17:01:26 <Kellin> #startmeeting RELENG (2018-02-15)
17:01:26 <zodbot> Meeting started Thu Feb 15 17:01:26 2018 UTC.  The chair is Kellin. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:01:26 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
17:01:26 <zodbot> The meeting name has been set to 'releng_(2018-02-15)'
17:01:26 <Kellin> #meetingname releng
17:01:26 <zodbot> The meeting name has been set to 'releng'
17:01:26 <Kellin> #chair dgilmore nirik tyll sharkcz masta pbrobinson pingou puiterwijk maxamillion mboddu Kellin
17:01:26 <zodbot> Current chairs: Kellin dgilmore masta maxamillion mboddu nirik pbrobinson pingou puiterwijk sharkcz tyll
17:01:38 <nirik> morning
17:01:43 <Kellin> #topic RawHide Issues
17:03:24 <dustymabe> .hello2
17:03:25 <zodbot> dustymabe: dustymabe 'Dusty Mabe' <dustymabe@redhat.com>
17:03:46 <Kellin> lots of heat coming on with regard to Rawhide - it seems to me like the issues are being addressed, could we info share a bit to get us on the same page?
17:04:04 <dustymabe> Kellin +1
17:04:09 <dustymabe> a sync is good
17:04:40 <dustymabe> a couple of the buildinstalls have succeeded so far
17:04:45 <dustymabe> for today's rawhide run
17:04:57 <dustymabe> so that means we are past the dep issues (thanks kevin)
17:04:59 <Kellin> I am trying to monitor the run; the files don't change mod times when updates come in (guessing it keeps the filehandle open)
17:05:19 <nirik> just waiting for today's to see what we hit.
17:05:21 <Kellin> it's been running for 11 hours ish - any idea on how long a compose is running these days normally?
17:05:24 <dustymabe> Kellin: you could tail -f ?? or refresh in the browser
17:05:36 <nirik> it was stalled on arm composes (as often is the case)
17:05:52 <nirik> so the time is the time it takes + however many hours it was stalled
17:06:05 <dustymabe> nirik: the arm stuff is a real thorn
17:06:06 <Kellin> nirik: do we know why the arm composes stall?
17:06:26 <nirik> https://bugzilla.redhat.com/show_bug.cgi?id=1504264
17:06:39 <nirik> I have some things to try, but keep not having time to get to them
17:07:18 <nirik> those things:
17:07:37 <nirik> trying to reinstall a host with f27 instead of rhel7.4 and see if that helps.
17:07:52 <nirik> trying to install the buildvm-armv7 vm on iscsi or other storage
17:08:23 <dustymabe> how many arm machines do we have?
17:08:28 <dustymabe> armv7
17:08:28 <Kellin> nirik: so a kernel issue?
17:09:08 <Kellin> seems like 3 according to the ticket dustymabe
17:09:11 <nirik> There are 25 buildvm-armv7's... but only 3 are in the compose channel
17:09:30 <dustymabe> seems like a low ration
17:09:32 <nirik> could be a kernel issue sure
17:09:32 <dustymabe> ratio*
17:09:42 <dustymabe> what are the other 22 being used for?
17:09:43 <nirik> not really no.
17:09:48 <nirik> building packages?
17:09:54 <dustymabe> I see
17:10:08 <nirik> the compose ones just run compose jobs... once a day for an hour or two
17:10:16 <nirik> and otherwise are idle
17:10:41 <dustymabe> how much do they cost? could I just buy like 2 or 3 more and add them?
17:11:05 <nirik> they are vm's... running on aarch64 moonshot cartridges.
17:11:24 <nirik> adding more isn't going to help... if they hang, we still need a human to deal with that
17:11:27 <Kellin> nirik: is it worthwhile to try talking to Laura to see if we can get a kernel team member to take a dive into it?
17:11:51 <dustymabe> nirik: i think puiterwijk had found out some issues on why machines were filling up
17:11:55 <nirik> I've asked the arm folks to look... they were not able to reproduce it.
17:11:58 <nirik> but I can ask again
17:12:00 <dustymabe> he opened a PR upstream to oz
17:12:17 <nirik> this is not filling up. it's hanging. ;) I don't think it's the same issue.
17:12:17 <dustymabe> so this is a full kernel lockup?
17:12:17 <Kellin> we have a reproducer - maybe see if there's a debuginfo we can turn on to give them ?
17:12:32 <dustymabe> k, sorry
17:12:44 <nirik> debuginfo on what? how? it's hung...
17:13:47 * Kellin had a kernel team member help him troubleshoot something once, they turned on an option or two so they could see issues...just thought it'd help if they could see the lead up
17:14:16 <nirik> I'm happy to entertain more ideas on how to track it down... but I think I tried all the obvious ones. ;)
17:14:40 <dustymabe> nirik: do you get a stacktrace?
17:14:49 <dustymabe> or any logs
17:15:07 <nirik> no. see bug
17:16:39 <dustymabe> nirik: thanks
17:16:54 <Kellin> hmm
17:17:06 <Kellin> so why use the LPAE extension if it's failing?
17:17:13 <Kellin> (assuming I read this correctly)
17:17:42 <dustymabe> i also am guessing strace didn't show anything on those procs.
17:17:45 <nirik> without lpae the vm's see only ~4gb memory, and take about 5x longer to do things.
17:18:03 <nirik> strace hangs when it hits the D (io wait) processes
17:18:14 <dustymabe> yep. was afraid of that
17:18:59 <sharkcz> afaik it would need a kdump to get a memory dump and kernel people could analyze it
17:19:49 <nirik> except I don't know if kdump works on armv7 or in fedora (it's not very used/tested)
17:20:26 <sharkcz> true ...
17:20:32 <nirik> anyhow, I don't think we are going to solve it here today. ;)
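Since strace blocks on the D (io wait) processes mentioned above, one way to at least see which processes are stuck is to read their state from /proc without attaching to them. The following is a minimal sketch, assuming a Linux host with /proc mounted; the function names are illustrative and nothing here comes from the bug report.

```python
import os


def parse_state(stat_line):
    """Return (pid, comm, state) from the text of one /proc/<pid>/stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the first '(' and the last ')' instead of splitting on spaces.
    open_idx = stat_line.index("(")
    close_idx = stat_line.rindex(")")
    pid = int(stat_line[:open_idx])
    comm = stat_line[open_idx + 1:close_idx]
    state = stat_line[close_idx + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_state(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or stat was unreadable
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

This only reports which processes are blocked; it does not explain why, so it complements rather than replaces a memory dump.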
17:20:53 <Kellin> so setting aside arm for a second
17:21:17 <Kellin> were there any other issues between the fourth of February and today other than Lorax and libevent?
17:21:26 <nirik> yes, lots.
17:21:34 <nirik> pykickstart changed and broke anaconda
17:21:52 <nirik> something broke appliance-creator
17:22:08 <nirik> probably other stuff I have forgotten.
17:22:22 <nirik> oh, anaconda was broken by a python-meh update
17:23:12 <Kellin> ok - thanks for the info
17:23:30 <nirik> I suppose we could start keeping track better...
17:23:36 <dustymabe> nirik: Kellin +++++!
17:23:38 <nirik> just to see if there's any trends
17:23:39 <dustymabe> 1
17:23:54 <nirik> but it's usually just updating stuff that breaks other stuff
17:23:56 <dustymabe> I would love to have a space for each compose where we talk about what the problems are
17:23:58 <Kellin> nirik: I have proposed something to do that
17:24:06 <nirik> Kellin: oh?
17:24:08 <Kellin> nirik: it's a matter of getting it prioritized
17:24:30 <nirik> well, we could just make a text file in git somewhere and add to it... I don't think anything fancy is needed.
17:24:32 <Kellin> yeah - basically a console/dashboard for releng work.  where we can see things like how often rawhide has failed, reasons for the failure, trends
17:24:48 <nirik> that sounds like it will take a while to make.
17:25:05 <Kellin> Yeah - but we need something to manage the input streams
17:25:08 <dustymabe> right
17:25:10 <Kellin> there's a lot of things to manually watch
17:25:12 <dustymabe> so here is what I am thinking
17:25:21 <dustymabe> randy just added the composes tab to bodhi
17:25:24 <dustymabe> https://bodhi.stg.fedoraproject.org/composes/
17:25:26 <Kellin> getting it into a dashboard would make releng more efficient
17:25:27 <pbrobinson> nirik: (sorry for late reply, was AFK), did you have any luck with any of the suggestions I had with the v7 issues?
17:25:38 <nirik> pbrobinson: I have had 0 time units to try any of them
17:25:43 <dustymabe> which will give us a page to see pungi composes that were started by bodhi
17:26:00 <nirik> but I do want to. ;)
17:26:00 <dustymabe> but it would be even better if it linked to a dashboard of pungi composes
17:26:05 <dustymabe> and that dashboard had a space for comments
17:26:10 <dustymabe> for each compose
17:26:22 <Kellin> dustymabe: I don't think that belongs in bodhi per se - but that's a larger discussion
17:26:40 <dustymabe> Kellin: right. I told him it would be better if we linked to a dashboard that was specifically for pungi composes
17:26:45 <dustymabe> but you get the idea
17:27:11 <dustymabe> it would make it so 5 different people don't investigate why the compose was broken
17:27:25 <Kellin> dustymabe: Yeah.  it's on the roadmap - I think we can table this topic and move to next?
17:27:32 <dustymabe> +1
17:28:01 <nirik> in the meantime I might make a gobby document or something. ;)
17:28:25 <Kellin> #info RawHide failures attributable to Lorax, pykickstart changes breaking anaconda, broken appliance-creator, python-meh update broke anaconda, dependencies on libevent package
17:28:34 <dustymabe> nirik: a real short term hack would be to create a new pagure repo that automatically gets a new issue created when a compose fails
17:28:42 <dustymabe> the discussion can flow from there
17:28:59 <nirik> automatically how?
17:29:07 <dustymabe> fedmsg + pagure API
17:29:13 <nirik> I guess.
17:29:26 * dustymabe will stop talking about this now
17:29:51 <nirik> yeah, there's tons of ways... but yeah, moving on...
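The "fedmsg + pagure API" idea above could look roughly like the following minimal sketch. It assumes the compose message carries `status`, `compose_id` and `location` fields and that a failed compose is reported with the status `DOOMED`; the repo name `compose-failures` and the token are hypothetical placeholders. Receiving the fedmsg message is left out, since that depends on how the consumer is deployed.

```python
import urllib.parse
import urllib.request

# Assumptions (not from the meeting): repo name, token, and message layout
# are illustrative placeholders.
PAGURE_API = "https://pagure.io/api/0"
TRACKER_REPO = "compose-failures"   # hypothetical tracking repo
FAILED_STATUSES = {"DOOMED"}        # assumed status value for a failed compose


def issue_for_message(msg):
    """Return (title, body) for a failed-compose message, else None."""
    status = msg.get("status")
    if status not in FAILED_STATUSES:
        return None
    compose_id = msg.get("compose_id", "unknown compose")
    location = msg.get("location", "(no location in message)")
    title = f"Compose failed: {compose_id}"
    body = f"Status: {status}\nLogs: {location}\n\nDiscuss the failure here."
    return title, body


def build_request(title, body, token, repo=TRACKER_REPO):
    """Build, but do not send, the POST that opens a new pagure issue."""
    data = urllib.parse.urlencode(
        {"title": title, "issue_content": body}
    ).encode()
    return urllib.request.Request(
        f"{PAGURE_API}/{repo}/new_issue",
        data=data,
        headers={"Authorization": f"token {token}"},
        method="POST",
    )
```

Sending the issue would then be a matter of passing the result of `build_request(...)` to `urllib.request.urlopen`. Keeping the message filtering separate from the HTTP call makes it possible to test the filter without touching pagure.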
17:30:22 <Kellin> #topic need method for distributing urgent fixes... urgently
17:30:29 <Kellin> #link https://pagure.io/releng/issue/5886
17:31:20 <Kellin> I don't expect us to solve it - but what do we need to do for motion on this?
17:31:39 <Kellin> something like this, I almost think we should ask for funding for a FAD
17:31:55 <Kellin> fly everyone to one place, sit, figure out the plan, and then go home and implement
17:31:59 <nirik> IMHO we don't need it anymore. Pushes are fast enough
17:32:20 <Kellin> were pushes a lot slower when this got filed?
17:32:58 <nirik> yeah, sometimes much longer back in mash days when it was doing things one at a time.
17:33:16 <nirik> all streams pushed today in 3hours. updates and updates-testing.
17:33:20 <dustymabe> ehh. I think there should be a FAD in speeding up composes
17:33:23 <dustymabe> but that's me
17:33:46 <dustymabe> of course, what I'm talking about is more about pungi than bodhi
17:33:47 <Kellin> nirik: there were very few updates today
17:33:55 <nirik> speeding up would be nice always...
17:34:16 <nirik> number of updates doesn't really matter since pungi gathers everything and makes the repos each time
17:34:35 <Kellin> we have open taiga epics/cards for pungi speed
17:34:55 <nirik> there's some ideas around making drpms faster...
17:35:09 <Kellin> everyone okay if I poke this tagging mattdm and see if he wants us to fold this into that issue?
17:35:17 <Kellin> which team would work on the drpm speed issue?
17:35:40 <nirik> it's a createrepo_c thing... so that upstream I guess?
17:36:59 <Kellin> so would it be fair to A) ping mattdm asking if this can be rolled in with the pungi speed issue, B) open up an issue for drpm speed, and C) suggest we close this issue once the relevant taiga issues are created?
17:37:34 <nirik> sure.
17:40:31 <dustymabe> +1
17:41:45 <Kellin> #info Group agrees this is a three-part issue.  Need to sync with mattdm.
17:42:05 <Kellin> #action Kellin to tag mattdm on pagure issue to get more forward motion on this issue.
17:42:41 <Kellin> #topic Refine cleaning up packages with broken deps
17:42:50 <Kellin> #link https://pagure.io/releng/issue/6877
17:43:08 <Kellin> so did we come up with anything for this?
17:43:12 <Kellin> or did this die on devel list
17:47:29 <dustymabe> Kellin: looks like no comment?
17:47:43 <dustymabe> are we close to being done with tickets?
17:48:58 * nirik hasn't looked or thought about that one for a while, not sure what to do
17:50:55 * dustymabe has something for open floor
17:51:08 <Kellin> ok, no comment I guess
17:51:18 <Kellin> #info No new information to contribute
17:51:22 <Kellin> #topic Open Floor
17:52:22 <Kellin> dustymabe: you're up
17:53:56 <dustymabe> Kellin: i'm just wondering when we're going to ship new pungi versions on our composer machines
17:54:04 <dustymabe> i.e. is there any plan to
17:54:15 <dustymabe> there have been two releases since the current version we are using
17:54:19 <Kellin> dustymabe: IIRC, that's being discussed over on a pagure issue
17:54:23 <nirik> yeah...
17:54:27 <dustymabe> link?
17:54:31 <nirik> IMHO we need to get a working compose first
17:54:37 <dustymabe> sorry I missed it
17:54:39 <Kellin> dustymabe: https://pagure.io/releng/issue/7227
17:54:47 <nirik> then we need some pungi-fedora config changes before the new one will do any good.
17:54:53 <nirik> but yes, we should definitely update.
17:55:01 <dustymabe> nirik: ok
17:55:10 <dustymabe> do we know any plans for the branching strategy?
17:55:25 <Kellin> you mean for modular, for f28 or?
17:55:33 <dustymabe> IMHO it would be great if we got a successful compose and also put in any pungi updates before we branch
17:55:43 <dustymabe> for f28
17:56:10 <nirik> yeah, agreed
17:57:25 <dustymabe> cool
17:57:40 <dustymabe> i'm really going to be on board in the next week trying to get things in order
17:58:51 <Kellin> so...
17:59:02 <Kellin> call me crazy, but we branch every release, why are we trying to re-invent the strategy?
17:59:09 <Kellin> is there something that changes each time we do it?
17:59:59 <dustymabe> we're not really re-inventing the strategy as much as trying to minimize work
18:00:13 <dustymabe> so if 10 things are broken today and we branch today
18:00:18 <nirik> well, this time there might be a proposal to change how we do it so it's much easier.
18:00:26 <dustymabe> then we need to fix 10 broken things in 2 different places for each of them
18:00:30 <nirik> yeah
18:00:34 <nirik> there is always that
18:02:12 <dustymabe> ok i need to go eat
18:02:27 <dustymabe> thanks for running the meeting Kellin, good to see you back and hope you feel better
18:02:40 <Kellin> thanks
18:03:12 <nirik> thanks Kellin
18:03:39 <Kellin> #info Dustymabe wants to explore strategies for ensuring successful compose and pungi updates prior to branching to minimize work
18:03:45 <Kellin> Thanks for attending all
18:03:49 <Kellin> #endmeeting