17:02:23 <mboddu> #startmeeting RELENG (2017-09-21)
17:02:23 <zodbot> Meeting started Thu Sep 21 17:02:23 2017 UTC.  The chair is mboddu. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:02:23 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
17:02:23 <zodbot> The meeting name has been set to 'releng_(2017-09-21)'
17:02:23 <mboddu> #meetingname releng
17:02:23 <zodbot> The meeting name has been set to 'releng'
17:02:23 <mboddu> #chair dgilmore nirik tyll sharkcz masta pbrobinson pingou puiterwijk maxamillion mboddu Kellin
17:02:23 <mboddu> #topic init process
17:02:23 <zodbot> Current chairs: Kellin dgilmore masta maxamillion mboddu nirik pbrobinson pingou puiterwijk sharkcz tyll
17:02:31 <mboddu> threebean: haha
17:02:39 <threebean> :)
17:02:47 <nirik> morning
17:02:50 <puiterwijk> hi
17:03:22 <maxamillion> .hello maxamillion
17:03:23 <zodbot> maxamillion: maxamillion 'Adam Miller' <maxamillion@gmail.com>
17:03:44 * relrod waves
17:06:03 <mboddu> Sorry to the people who have to deal with 2 meetings at the same time :(
17:06:13 <Kellin> howdy
17:06:18 <maxamillion> mboddu: it happens
17:07:10 <mboddu> So, let's get started. I have a couple of open floor items which might take some time, so if anyone has anything, please go ahead and talk about it first and then I will pick up my topics
17:07:22 <mboddu> before that, let's get an update on alt arches
17:07:32 <mboddu> #topic Alternative Architectures updates
17:07:47 <mboddu> sharkcz: how is it going with s390x?
17:10:05 <puiterwijk> mboddu: I think there's no working composes for s390x yet due to sshfs being unstable, NFS being not allowed, and me being pedantic about how to work around the network instability between RDU2 and PHX2
17:10:08 <puiterwijk> RDU*
17:10:41 <mboddu> puiterwijk: So, do you have a plan or still figuring it out?
17:11:36 <puiterwijk> mboddu: the only plan that other people aren't blocking is making me very, very unhappy, so I'm thinking if there's any way I can make myself less unhappy with it.
17:12:10 <puiterwijk> So.... I have a plan *and* I'm still figuring it out?
17:12:32 <maxamillion> puiterwijk++
17:12:45 * mboddu thinks he should cut off puiterwijk's internet and let him relax and think :D
17:12:56 <mboddu> puiterwijk: okay, that's good news :)
17:14:45 <mboddu> #info There are no working composes for s390x due to sshfs being unstable and NFS is not allowed. puiterwijk is working on it
17:15:04 <mboddu> thanks puiterwijk for the update
17:15:13 <puiterwijk> mboddu: well, that's not entirely correct
17:15:16 <puiterwijk> #undo
17:15:16 <zodbot> Removing item from minutes: INFO by mboddu at 17:14:45 : There are no working composes for s390x due to sshfs being unstable and NFS is not allowed. puiterwijk is working on it
17:15:33 <puiterwijk> #info There are no working composes for s390x because the network is unstable and NFS is not allowed. puiterwijk is working on it
17:15:45 <puiterwijk> sshfs is only unstable because the *network* is unstable
17:15:57 <mboddu> puiterwijk: thanks for the correction :)
17:16:08 <mboddu> #topic Open Floor
17:16:26 <mboddu> So, does anyone have anything to share before I start?
17:16:51 <maxamillion> I have a quick note
17:17:07 <maxamillion> I'm going to be working on the pungi+bodhi stuff starting this afternoon, so hopefully we'll have that "soon"
17:17:16 <maxamillion> dustymabe needed to hand it off, so I'm picking that up
17:18:31 <maxamillion> that's it
17:18:33 <maxamillion> :)
17:19:34 <mboddu> #info maxamillion is picking up the pungi+bodhi work from dustymabe
17:19:59 <mboddu> maxamillion: ^ is that true, or will Dusty still work on it?
17:20:19 <maxamillion> mboddu: that's true
17:20:40 <mboddu> maxamillion: okay, thanks for the update
17:20:42 <maxamillion> mboddu: dusty is under water with other work and needed to pass it off completely, so I'll be taking it from here
17:20:52 <mboddu> maxamillion: okay
17:21:30 <mboddu> So, I've got a couple of things
17:22:26 <mboddu> To start with, compose issues and making the composes faster
17:23:07 <mboddu> nirik: Are the netapp issues still persistent?
17:23:25 <nirik> no, everything is back to normal now that I know of.
17:23:53 <nirik> moving back to NFSv4.0 seems to have worked around the slowdown issue.
17:24:01 <nirik> they are still investigating it.
17:24:05 <Kellin> so then why are there still timeouts (per our conversation this morning mboddu )
17:24:21 <nirik> timeouts on what exactly?
17:24:35 <mboddu> Kellin: not timeouts, mounting issues
17:25:02 <mboddu> nirik: ^ where it says it was unable to access certain logs or something
17:25:13 <nirik> well, if you can get me specifics I can look.
17:25:34 <nirik> There was an issue with an aarch64 builder I fixed yesterday...
17:27:58 <mboddu> nirik: I am trying to find one, but you had to remount to fix the issue, which you said you had already done before
17:28:47 <nirik> ok, if it was an aarch64 builder, I did remount on one yesterday...
17:29:05 <nirik> but in any case, please let me know if you see that and I can look more. we don't have to hold the meeting to do it now.
17:29:46 <mboddu> Sep 07 15:09:41 <+mboddu>	nirik: I think you fixed an issue something like this one - https://koji.fedoraproject.org/koji/taskinfo?taskID=21704357
17:30:13 <maxamillion> Sep 7th? .. that was forever ago ;)
17:30:41 <mboddu> maxamillion: and I guess it repeated again yesterday
17:30:59 <Kellin> I noted to the folks that are working on it that any speed-up/fix work is blocked by https://taiga.fedorainfracloud.org/project/acarter-fedora-docker-atomic-tooling/epic/809
17:30:59 <puiterwijk> mboddu: "PermissionError: [Errno 13] Permission denied"
17:31:05 <puiterwijk> That is something entirely different...
17:31:16 <Kellin> or rather, to the folks requesting it
17:31:51 <maxamillion> mboddu: ah ok
17:33:15 <mboddu> puiterwijk: yes, but unmounting and remounting it again fixed the issue which is what I guess happened in yesterday's case as well
17:33:28 <puiterwijk> Okay. But that's not a timeout
17:33:38 <mboddu> Nope, it's not timeout related
17:34:01 <mboddu> nirik just had to unmount and remount a few times; probably different issues but with the same solution
17:34:20 <mboddu> Anyway, I guess it's not consistent enough to hog the meeting time
17:35:08 <nirik> well, my theory about that issue is:
17:35:16 <mboddu> All, I wanted to know whether there are still any storage issues (^ I guessed it's still storage related) and whether there is anything we can do to speed up the composes
17:35:46 <nirik> the machine boots, eth0 comes up, it does the nfs mount, but it should be using eth1 for that, so it gets a ro mount via eth0. So, just umount/mount after eth1's routes are up seems to fix it.
17:36:44 <nirik> mboddu: well, I think we could update pungi on rawhide composer to the new one + merge threebean's pungi config to enable profiling...
17:36:58 <nirik> that should help us at least see what parts of gather and a few other things are slow.
17:37:11 <mboddu> nirik: okay, my bad, I thought it was storage related
17:37:40 <mboddu> nirik: sure, I am waiting on the new pungi to merge threebean's PR
17:37:42 <Kellin> so for clarification - the issue is related to a ro-mount at boot time; so it only pops up if the machine in question has rebooted?
17:37:55 <nirik> Kellin: that's my theory, yes.
17:38:19 <nirik> it could be an incorrect theory. ;)
17:38:40 <puiterwijk> We could find that out when it happens again though.
17:40:13 <Kellin> could we force a reboot to test it more immediately?
17:42:03 <puiterwijk> During freeze, when we're trying to get an RC? That doesn't sound too smart to do now. I'd suggest trying it after we got an accepted Beta compose, or just if it happens on its own
17:42:03 <nirik> sure, but not sure when a good time for that is...
17:42:18 <nirik> right, I would hate to interfere with rc or nightlies
17:42:40 <mboddu> Kellin: I agree with puiterwijk and nirik here, we have to wait until freeze is up
17:43:46 <maxamillion> same
17:43:47 <mboddu> Okay, so, is there anything that we can do to speed up the compose process?
17:43:55 <maxamillion> (fwiw)
17:44:32 <Kellin> not without metrics to determine the best place to attack first; so getting the pungi measuring code in is step 0
17:46:04 <threebean> sorry I'm late.  has anyone brought up the baby compose idea yet?
17:46:06 <maxamillion> +1
17:46:26 <Kellin> threebean: is that the "run x86_64" alone then the rest idea?
17:46:30 <threebean> baby compose idea -> add a second nightly compose that runs at the same time as the current nightly.
17:46:40 <threebean> yeah - limit the baby compose to only x86_64 and disable all spins.
17:46:54 <threebean> it'll complete more quickly, and give qa something to do preliminary testing on.
17:47:09 <Kellin> so to that point, I have to clarify one thing
17:47:11 <threebean> this way they don't find issues "at the end of the work day" after a 12 hour compose, when devs are already checking out for the day.
17:47:13 <maxamillion> I like it, is this something that QA sees value in?
17:47:21 <threebean> haven't mentioned it to them yet.
17:47:24 <threebean> maxamillion: ^^
17:47:27 <maxamillion> rgr
17:47:35 <maxamillion> I'm +1
17:47:38 <Kellin> we do a nightly compose; what is stopping QA from using the freeze + installing updated packages on their own?
17:48:16 <Kellin> if what we're testing is "does this new version of a package work", isn't that just as testable using the freeze candidate + updated packages as it would be spinning a whole new compose?
17:48:17 <nirik> I'm not sure that would help too much...
17:48:32 <maxamillion> Kellin: upgrades vs freshly composed images have proven to not be equal in the past and can at times produce different bugs (rpm transactions causing side effects)
17:48:33 <nirik> Kellin: if you don't need media or a tree/repo, sure
17:48:35 <puiterwijk> Kellin: the fact that updated packages would result in a different system. Like, "does it boot directly after install"
17:49:23 <puiterwijk> Basically, what Adam and Kevin said
17:49:50 <nirik> so this x86_64 only compose would just stay in a compose area? not be synced out to mirrors, etc?
17:49:56 * threebean nods
17:50:05 <mboddu> nirik: yes
17:50:15 <threebean> just a branched nightly that we throw away after ~48 hours.  its only purpose is to get a jump on testing the real nightly.
17:50:15 <Kellin> what is the lifetime durability of it?
17:50:24 <Kellin> ^^question answered
17:50:29 * nirik would prefer to see if we can find low hanging fruit in the main compose... more composes just means more stuff to deal with... I don't know that it would help.
17:50:44 <threebean> fair, yeah.
17:50:48 <nirik> but might be worth asking qa...
17:51:07 <mboddu> nirik: I am trying to find a sweet spot here for releng, infra and qa
17:51:14 <nirik> also, adamw wanted targeted composes for new packages in a small set. (kinda a smaller idea of this)
17:51:24 <Kellin> mboddu: there's not really a sweet spot; it's just a huge ask with limited resources to actually do the work
17:51:39 <nirik> ie, new anaconda, kernel, lorax, etc land, we immediately do a quick compose to see if they are viable
17:51:46 <mboddu> nirik: yep, which is still on my plate but I am planning to use loopabull for it
17:51:47 <Kellin> at best here - we're trying to find the least painful way to let QA test a little earlier
17:52:18 <Kellin> mboddu: is that tied to the meetings we had a month or two ago?
17:52:23 <maxamillion> nirik: +1
17:52:29 <mboddu> Kellin: yes, it is
17:52:33 <maxamillion> threebean: isn't that what ODCS is meant to do? ^^^^
17:55:27 <threebean> in a broad sense, that's what ODCS is meant to do.
17:55:40 <maxamillion> rgr
17:55:44 <threebean> in a narrow sense, no.  we tooled it for now to only really support doing module composes.
17:55:52 <maxamillion> ah ok
17:56:02 <threebean> would be easy to expand it to handle this.  but it's not ready out of the box (and we're still in security audit with puiterwijk).
17:56:14 <maxamillion> cool
17:56:21 <maxamillion> puiterwijk++ for security audit
18:00:10 <maxamillion> alright, so anything else on the topic?
18:00:20 <maxamillion> anything actionable in the near term or do we need to wait for ODCS and extend it?
18:00:22 <mboddu> Okay, so currently the temporary solution is threebean's idea of baby composes
18:00:23 <mboddu> ?
18:01:17 <maxamillion> which would get QA something more quickly than it is currently
18:01:24 <mboddu> maxamillion: yes
18:01:30 <maxamillion> than the situation is currently*
18:01:41 <nirik> well, I want to look at the profile info. ;)
18:01:48 <maxamillion> I would maybe propose the baby compose as a temporary solution until ODCS can take that over
18:01:51 <nirik> and perhaps we will see things we can change there.
18:01:55 <maxamillion> nirik: +1 profile info
18:01:58 <threebean> +1
18:02:34 <mboddu> #info Once the new pungi is released we will gather info using profiling; based on that, we will look into speeding up the composes
18:02:34 <nirik> also, there may be some things we can do on the builder side... we need to get arm composes working on buildvm-armv7's
18:02:47 <nirik> currently they are running on the old armv7 hw, which is slower.
18:03:25 <nirik> and we might also look at what's in koji compose channels... and make more/reassign builders so composes aren't waiting on having builders.
18:04:12 <Kellin> maxamillion: expand OCDS acronym please
18:04:29 <Kellin> maxamillion: err, ODCS
18:05:01 <mboddu> #info infra will take a look into armv7 builders which are running on old armv7 hardware.
18:05:23 <mboddu> Anything else?
18:05:25 <puiterwijk> Kellin: On Demand Compose Service
18:06:47 <maxamillion> Kellin: sorry
18:06:50 <maxamillion> distracted
18:06:52 <maxamillion> Kellin: what puiterwijk said
18:07:04 <mboddu> Okay, thanks for joining guys
18:07:09 <mboddu> #endmeeting
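
Note on the read-only mount theory (17:35:46): nirik's theory is that a builder gets a read-only NFS mount via eth0 at boot, and that a umount/mount after eth1's routes are up fixes it. puiterwijk suggested checking the next time it happens. A minimal sketch of such a check, assuming the standard Linux /proc/mounts layout (device, mount point, filesystem type, options); the function name is hypothetical and this is not an existing releng or infra tool:

```python
def read_only_nfs_mounts(mounts_text):
    """Return the mount points of NFS filesystems that are mounted read-only.

    mounts_text is expected to look like the contents of /proc/mounts:
    one mount per line, whitespace-separated fields, options comma-separated.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        # Only plain NFS client mounts; this deliberately ignores e.g. "nfsd".
        if fs_type in ("nfs", "nfs4") and "ro" in options.split(","):
            found.append(mount_point)
    return found
```

On an affected builder, one could pass it the text read from /proc/mounts; a non-empty result right after boot would be consistent with the theory, though it would not by itself show which interface the mount went over.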