17:01:26 #startmeeting RELENG (2018-02-15)
17:01:26 Meeting started Thu Feb 15 17:01:26 2018 UTC. The chair is Kellin. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:01:26 Useful Commands: #action #agreed #halp #info #idea #link #topic.
17:01:26 The meeting name has been set to 'releng_(2018-02-15)'
17:01:26 #meetingname releng
17:01:26 The meeting name has been set to 'releng'
17:01:26 #chair dgilmore nirik tyll sharkcz masta pbrobinson pingou puiterwijk maxamillion mboddu Kellin
17:01:26 Current chairs: Kellin dgilmore masta maxamillion mboddu nirik pbrobinson pingou puiterwijk sharkcz tyll
17:01:38 morning
17:01:43 #topic RawHide Issues
17:03:24 .hello2
17:03:25 dustymabe: dustymabe 'Dusty Mabe'
17:03:46 lots of heat coming on with regard to Rawhide - it seems to me like the issues are being addressed, could we info share a bit to get us on the same page?
17:04:04 Kellin +1
17:04:09 a sync is good
17:04:40 a couple of the buildinstalls have succeeded so far
17:04:45 for today's rawhide run
17:04:57 so that means we are past the dep issues (thanks kevin)
17:04:59 I am trying to monitor the run; the files don't change mod times when updates come in (guessing it keeps the filehandle open)
17:05:19 just waiting for todays to see what we hit.
17:05:21 it's been running for 11 hours ish - any idea on how long a compose is running these days normally?
17:05:24 Kellin: you could tail -f ?? or refresh in the browser
17:05:36 it was stalled on arm composes (as often is the case)
17:05:52 so the time is the time it takes + however many hours it was stalled
17:06:05 nirik: the arm stuff is a real thorn
17:06:06 nirik: do we know why the arm composes stall?
17:06:26 https://bugzilla.redhat.com/show_bug.cgi?id=1504264
17:06:39 I have some things to try, but keep not having time to get to them
17:07:18 those things:
17:07:37 trying to reinstall a host with f27 instead of rhel7.4 and see if that helps.
17:07:52 trying to install the buildvm-armv7 vm on iscsi or other storage
17:08:23 how many arm machines do we have?
17:08:28 armv7
17:08:28 nirik: so a kernel issue?
17:09:08 seems like 3 according to the ticket dustymabe
17:09:11 There are 25 buildvm-armv7's... but only 3 are in the compose channel
17:09:30 seems like a low ration
17:09:32 could be a kernel issue sure
17:09:32 ratio*
17:09:42 what are the other 22 being used for?
17:09:43 not really no.
17:09:48 building packages?
17:09:54 I see
17:10:08 the compose ones just run compose jobs... once a day for an hour or two
17:10:16 and otherwise are idle
17:10:41 how much do they cost? could I just buy like 2 or 3 more and add them?
17:11:05 they are vm's... running on aarch64 moonshot cartridges.
17:11:24 adding more isn't going to help... if they hang, we still need a human to deal with that
17:11:27 nirik: is it worthwhile to try talking to Laura to see if we can get a kernel team member to take a dive into it?
17:11:51 nirik: i think puiterwijk had found out some issues on why machines were filling up
17:11:55 I've asked the arm folks to look... they were not able to reproduce it.
17:11:58 but I can ask again
17:12:00 he opened a PR upstream to oz
17:12:17 this is not filling up. it's hanging. ;) I don't think it's the same issue.
17:12:17 so this is a full kernel lockup?
17:12:17 we have a reproducer - maybe see if there's a debuginfo we can turn on to give them ?
17:12:32 k, sorry
17:12:44 debuginfo on what? how? it's hung...
17:13:47 * Kellin had a kernel team member help him troubleshoot something once, they turned on an option or two so they could see issues...just thought it'd help if they could see the lead up
17:14:16 I'm happy to entertain more ideas on how to track it down... but I think I tried all the obvious ones. ;)
17:14:40 nirik: do you get a stacktrace?
17:14:49 or any logs
17:15:07 no. see bug
17:16:39 nirik: thanks
17:16:54 hmm
17:17:06 so why use the LPAE extension if it's failing?
17:17:13 (assuming I read this correctly)
17:17:42 i also am guessing strace didn't show anything on those procs.
17:17:45 without lpae the vm's see only ~4gb memory, and take about 5x longer to do things.
17:18:03 strace hangs when it hits the D (io wait) processes
17:18:14 yep. was afraid of that
17:18:59 afaik it would need a kdump to get a memory dump and kernel people could analyze it
17:19:49 except I don't know if kdump works on armv7 or in fedora (it's not very used/tested)
17:20:26 true ...
17:20:32 anyhow, I don't think we are going to solve it here today. ;)
17:20:53 so setting aside arm for a second
17:21:17 were there any other issues between the fourth of February and today other than Lorax and libevent?
17:21:26 yes, lots.
17:21:34 pykickstart changed and broke anaconda
17:21:52 something broke appliance-creator
17:22:08 probably other stuff I have forgotten.
17:22:22 oh, anaconda was broken by a python-meh update
17:23:12 ok - thanks for the info
17:23:30 I suppose we could start keeping track better...
17:23:36 nirik: Kellin +++++!
17:23:38 just to see if there's any trends
17:23:39 1
17:23:54 but it's usually just updating stuff that breaks other stuff
17:23:56 I would love to have a space for each compose where we talk about what the problems are
17:23:58 nirik: I have proposed something to do that
17:24:06 Kellin: oh?
17:24:08 nirik: it's a matter of getting it prioritized
17:24:30 well, we could just make a text file in git somewhere and add to it... I don't think anything fancy is needed.
17:24:32 yeah - basically a console/dashboard for releng work. where we can see things like how often rawhide has failed, reasons for the failure, trends
17:24:48 that sounds like it will take a while to make.
17:25:05 Yeah - but we need something to manage the input streams
17:25:08 right
17:25:10 there's a lot of things to manually watch
17:25:12 so here is what I am thinking
17:25:21 randy just added the composes tab to bodhi
17:25:24 https://bodhi.stg.fedoraproject.org/composes/
17:25:26 getting it into a dashboard would make releng more efficient
17:25:27 nirik: (sorry for late reply, was AFK), did you have any luck with any of the suggestions I had with the v7 issues?
17:25:38 pbrobinson: I have had 0 time units to try any of them
17:25:43 which will give us a page to see pungi composes that were started by bodhi
17:26:00 but I do want to. ;)
17:26:00 but it would be even better if it linked to a dashboard of pungi composes
17:26:05 and that dashboard had a space for comments
17:26:10 for each compose
17:26:22 dustymabe: I don't think that belongs in bodhi per se - but that's a larger discussion
17:26:40 Kellin: right. I told him it would be better if we linked to a dashboard that was specifically for pungi composes
17:26:45 but you get the idea
17:27:11 it would make it so 5 different people don't investigate why the compose was broken
17:27:25 dustymabe: Yeah. it's on the roadmap - I think we can table this topic and move to next?
17:27:32 +1
17:28:01 in the mean time I might make a gobby document or something. ;)
17:28:25 #info RawHide failures attributable to Lorax, pykickstart changes breaking anaconda, broken appliance-creator, python-meh update broke anaconda, dependencies on libevent package
17:28:34 nirik: a real short term hack would be to create a new pagure repo that automatically gets a new issue created when a compose fails
17:28:42 the discussion can flow from there
17:28:59 automatically how?
17:29:07 fedmsg + pagure API
17:29:13 I guess.
17:29:26 * dustymabe will stop talking about this now
17:29:51 yeah, there's tons of ways... but yeah, moving on...
17:30:22 #topic need method for distributing urgent fixes... urgently
17:30:29 #link https://pagure.io/releng/issue/5886
17:31:20 I don't expect us to solve it - but what do we need to do for motion on this?
17:31:39 something like this, I almost think we should ask for funding for a FAD
17:31:55 fly everyone to one place, sit, figure out the plan, and then go home and implement
17:31:59 IMHO we don't need it anymore. Pushes are fast enough
17:32:20 were pushes a lot slower when this got filed?
17:32:58 yeah, sometimes much longer back in mash days when it was doing things one at a time.
17:33:16 all streams pushed today in 3hours. updates and updates-testing.
17:33:20 ehh. I think there should be a FAD in speeding up composes
17:33:23 but that's me
17:33:46 of course, what I'm talking about is more about pungi than bodhi
17:33:47 nirik: there were very few updates today
17:33:55 speeding up would be nice always...
17:34:16 number of updates doesn't really matter since pungi gathers everything and makes the repos each time
17:34:35 we have open taiga epics/cards for pungi speed
17:34:55 there's some ideas around making drpms faster...
17:35:09 everyone okay if I poke this tagging mattdm and see if he wants us to fold this into that issue?
17:35:17 which team would work on the drpm speed issue?
17:35:40 it's a createrepo_c thing... so that upstream I guess?
17:36:59 so would it be fair to A) ping mattdm asking if this can be rolled in with the pungi speed issue, B) open up an issue for drpm speed, and C) suggest we close this issue once the relevant taiga issues are created?
17:37:34 sure.
17:40:31 +1
17:41:45 #info Group agrees this is a three-part issue. Need to sync with mattdm.
17:42:05 #action Kellin to tag mattdm on pagure issue to get more forward motion on this issue.
17:42:41 #topic Refine cleaning up packages with broken deps
17:42:50 #link https://pagure.io/releng/issue/6877
17:43:08 so did we come up with anything for this?
17:43:12 or did this die on devel list
17:47:29 Kellin: looks like no comment?
17:47:43 are we close to being done with tickets?
17:48:58 * nirik hasn't looked or thought about that one for a while, not sure what to do
17:50:55 * dustymabe has something for open floor
17:51:08 ok, no comment I guess
17:51:18 #info No new information to contribute
17:51:22 #topic Open Floor
17:52:22 dustymabe: you're up
17:53:56 Kellin: i'm just wondering when we're going to ship new pungi versions on our composer machines
17:54:04 i.e. is there any plan to
17:54:15 there have been two releases since the current version we are using
17:54:19 dustymabe: IIRC, that's being discussed over on a pagure issue
17:54:23 yeah...
17:54:27 link?
17:54:31 IMHO we need to get a working compose first
17:54:37 sorry I missed it
17:54:39 dustymabe: https://pagure.io/releng/issue/7227
17:54:47 then we need some pungi-fedora config changes before the new one will do any good.
17:54:53 but yes, we should definitely update.
17:55:01 nirik: ok
17:55:10 do we know any plans for the branching strategy?
17:55:25 you mean for modular, for f28 or?
17:55:33 IMHO it would be great if we got a successful compose and also put in any pungi updates before we branch
17:55:43 for f28
17:56:10 yeah, agreed
17:57:25 cool
17:57:40 i'm really going to be on board in the next week trying to get things in order
17:58:51 so...
17:59:02 call me crazy, but we branch every release, why we are trying to re-invent the strategy?
17:59:09 is there something that changes each time we do it?
17:59:59 we're not really re-inventing the strategy as much as trying to minimize work
18:00:13 so if 10 things are broken today and we branch today
18:00:18 well, this time there might be a proposal to change how we do it so it's much easier.
18:00:26 then we need to fix 10 broken things in 2 different places for each of them
18:00:30 yeah
18:00:34 there is always that
18:02:12 ok i need to go eat
18:02:27 thanks for running the meeting Kellin, good to see you back and hope you feel better
18:02:40 thanks
18:03:12 thanks Kellin
18:03:39 #info Dustymabe wants to explore strategies for ensuring successful compose and pungi updates prior to branching to minimize work
18:03:45 Thanks for attending all
18:03:49 #endmeeting
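
Post-meeting note: the "fedmsg + pagure API" short-term hack raised under the RawHide Issues topic (a repo that gets a new issue whenever a compose fails) could look roughly like the sketch below. It is only an illustration under stated assumptions, not something agreed in the meeting. Assumed here: the compose message is a dict carrying `status`, `compose_id` and `location` keys, a failed compose reports the status `DOOMED`, and the tracker lives in a hypothetical repo called `example-compose-tracker`. The request targets Pagure's `new_issue` endpoint with its `title` and `issue_content` form fields. Subscribing to the message bus and actually sending the request are left out.

```python
import urllib.parse
import urllib.request

# Assumption: statuses that should open a discussion issue.
FAILED_STATUSES = {"DOOMED"}

# Hypothetical tracking repo; no such repo is named in the meeting.
PAGURE_NEW_ISSUE_URL = (
    "https://pagure.io/api/0/example-compose-tracker/new_issue"
)


def issue_for_compose(msg):
    """Return (title, body) for a failed compose message, else None.

    `msg` is assumed to be the payload of a compose status message,
    with `status`, `compose_id` and `location` keys.
    """
    status = msg.get("status")
    if status not in FAILED_STATUSES:
        return None
    compose_id = msg.get("compose_id", "unknown compose")
    location = msg.get("location", "(no location in message)")
    title = "Compose {} ended with status {}".format(compose_id, status)
    body = (
        "Logs: {}\n\n"
        "Discuss the cause here so the failure is investigated once."
    ).format(location)
    return title, body


def build_request(title, body, api_token):
    """Build, but do not send, the POST request that opens the issue."""
    data = urllib.parse.urlencode(
        {"title": title, "issue_content": body}
    ).encode("utf-8")
    return urllib.request.Request(
        PAGURE_NEW_ISSUE_URL,
        data=data,
        headers={"Authorization": "token " + api_token},
        method="POST",
    )
```

A consumer would call `issue_for_compose()` on each incoming compose message and, when it returns a pair, pass the result of `build_request()` to `urllib.request.urlopen()`. Keeping the message-to-issue mapping as a pure function means the wording of the issue can be checked without touching the bus or Pagure.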