16:03:26 #startmeeting RELENG (2019-04-11)
16:03:27 Meeting started Wed Apr 10 16:03:26 2019 UTC.
16:03:27 This meeting is logged and archived in a public location.
16:03:27 The chair is mboddu. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:03:27 Useful Commands: #action #agreed #halp #info #idea #link #topic.
16:03:27 The meeting name has been set to 'releng_(2019-04-11)'
16:03:27 #meetingname releng
16:03:27 The meeting name has been set to 'releng'
16:03:27 #chair nirik tyll sharkcz masta pbrobinson pingou puiterwijk maxamillion mboddu dustymabe ksinny jednorozec
16:03:27 Current chairs: dustymabe jednorozec ksinny masta maxamillion mboddu nirik pbrobinson pingou puiterwijk sharkcz tyll
16:03:27 #topic init process
16:03:43 morning
16:04:07 nirik: Morning, how goes?
16:04:17 busy busy.
16:05:02 hi, /me is somewhat here
16:05:24 nirik: Okay, I want to get us started as soon as possible, I have about 5 things
16:05:29 * mboddu waves at sharkcz
16:05:50 #topic #8258 Add ignatenkobrain to cvsadmin
16:05:57 #link https://pagure.io/releng/issue/8258
16:06:14 nirik: Can you do this as I dont have permissions to do so?
16:06:50 yes, but should we just do it? in the past we asked current members to vote... but I guess there's not too many active people anymore.
16:07:24 nirik: Okay, I didn't know about voting, so who all has to vote?
16:07:56 well, it was everyone in the group... but I guess things have changed. A lot of those people aren't active in releng much, and it means different things than it used to.
16:08:11 so I guess I'd be ok to just do it if we are ok with it.
16:08:23 no objections from me
16:08:34 nirik: I am okay with it
16:08:51 * nirik will go do it.
16:09:02 #info nirik will add ignatenkobrain to cvsadmin group. We got +3 votes for it.
16:09:06 Just for the record :)
16:10:03 #topic #8270 Container Base Image Release process improvement
16:10:09 https://pagure.io/releng/issue/8270
16:10:11 #link https://pagure.io/releng/issue/8270
16:10:19 cverna: Are you around?
16:10:47 I am +1 on the idea, but I dont want everyone doing this at any time they want.
16:11:12 I'm happy for more automation...
16:11:25 but yeah, I don't want to make it manual for more people...
16:11:29 We can start with RelEng folks and you and any other community people in the container SIG that you want to make the release
16:11:40 But not all
16:11:51 well, can we just try and automate it all?
16:12:16 nirik: Yes, definitely automation, but someone has to call it
16:12:25 It should be running a script
16:12:35 why? why not just every day?
16:12:48 (at least to ours)
16:13:01 If no one is testing things, then sure, daily or bi-weekly
16:13:54 * mboddu thought the container sig tests those containers and asks releng to release them
16:14:26 If thats not the case, then we can just cron it up
16:16:39 it's possible it should be cron'ed daily, but only pushed publicly (quay.io, registry.fp.o, dockerhub) on a smaller schedule?
16:16:59 yeah, could be...
16:17:36 I think it comes down to who/how is going to notice breakage (is there automated mail? a web page with status?) - and what testing is done
16:17:40 otaylor: what would the "cron'ed daily" do?
16:18:53 mboddu: go through all the steps but the public push - so if something happens that breaks the image creation, it can be fixed asap before the next public push
16:18:56 otaylor: Good point on "notice breakage", probably we can start with sending an email to the rel-eng@ list when things break
16:19:14 mboddu: It would basically be "CI" - it may or may not be useful - just putting it out there
16:20:01 otaylor: We already create the images every day, the ticket is just about pushing those images publicly
16:20:24 mboddu: OK, I scanned through the ticket, should have read more carefully :-)
16:20:43 otaylor: np :)
16:21:23 otaylor: But it doesnt mean that we are getting those images daily, for ex: the 28 and 29 containers have been failing for a few days now
16:21:44 Due to a bug and Peter is going to look into it
16:22:00 Once he is back from his vacation
16:22:44 Here's what I think:
16:23:51 Automate the entire pushing process and make a cron job to push to the different registries at regular intervals (probably 2 weeks) and send an email to the rel-eng@ list if the push fails
16:24:04 nirik: ^ what do you think?
16:24:32 Also, its only possible if no one is testing the nightly images, if someone is testing them, then we probably should eliminate the cron job
16:24:37 I will talk to cverna about it
16:25:06 * cverna around
16:25:25 yeah, depends on whats desired I guess.
16:25:29 mboddu: those doing the testing could set a mark somewhere (greenwave?), so it still could be cron-ed
16:25:59 sharkcz: Sure, thats possible
16:26:06 I would say "no new process if it's not automated" :-)
16:26:23 cverna: So, when you create those requests to release the base containers, is someone testing them before making the request?
16:26:27 if it is just pushing a button, why would we not want to allow members of the container SIG to do it?
16:26:43 we do not have CI currently :(
16:27:05 well, avoiding anyone pushing a button is good IMHO.
16:27:10 see https://pagure.io/fedora-ci/general/issue/47
16:27:22 cverna: Then why cant we just use a cron job to push them?
16:27:22 in case they forget, or they press it at the wrong time, or...
16:27:32 Docker Hub provides us with good CI
16:27:37 * cverna will look for a PR
16:28:21 https://github.com/docker-library/official-images/pull/5673
16:28:47 currently the process is based on me filing a ticket to releng
16:29:09 so anyone could really do it to be honest
16:29:30 cverna: Yes, cant we just use a cron job to push the images? Rather than someone (incl me) just running the scripts?
16:29:31 then I think mohan just runs a script
16:30:22 a cronjob would be good but we need the release pipeline to fail early if the compose is broken (ie missing arch)
16:30:33 otherwise we will just publish the same thing
16:31:09 sure, it would need to check and fail if no new ones
16:31:24 cverna: We can add a check for that, if today's push == the same on the registry then skip
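
A minimal sketch of the "skip if already published" check described above, assuming skopeo is available on the machine running the push; the image references below are illustrative, not the actual release tooling:

    #!/usr/bin/env python3
    # Sketch: compare the candidate image digest against what the public
    # registry already serves, and skip the push when they match.
    import json
    import subprocess
    import sys

    def digest(image_ref):
        # `skopeo inspect` prints image metadata as JSON, including "Digest"
        out = subprocess.check_output(["skopeo", "inspect", image_ref])
        return json.loads(out)["Digest"]

    # Illustrative references; a real job would iterate over releases/arches
    candidate = "docker://candidate-registry.fedoraproject.org/fedora:30"
    published = "docker://registry.fedoraproject.org/fedora:30"

    if digest(candidate) == digest(published):
        print("registry already has this image, skipping push")
        sys.exit(0)
    print("new image content, pushing...")

Because registry digests are content-addressed, matching digests mean the public registry already serves exactly this image.
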
16:32:01 anyway I think the main point is to try to get the 2 processes to be common
16:32:11 then we can decide who triggers it :)
16:33:25 I think it would be great if we just do the cron, but I am definitely up for unifying the script
16:33:28 sure, we just prefer as close to 0 as we can get. ;)
16:33:34 +1
16:34:27 #info We are going to unify the process and *probably* make a cron job to push the images to different registries
16:34:31 cverna: ^ is that okay?
16:34:32 I'll work on unifying the script and we can try to set up a cron job
16:34:37 works for me
16:34:42 cverna: +1
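
For reference, the kind of cron entry this plan amounts to, assuming a wrapper script at the illustrative path below and that the rel-eng list address is as shown; cron mails any output of a failing run to the MAILTO address. Note cron cannot express "every two weeks" directly, so the fortnight cadence would need a guard inside the script:

    # /etc/cron.d/container-release -- illustrative, not deployed anywhere
    MAILTO=rel-eng@lists.fedoraproject.org
    # Run the unified push script Mondays at 06:00 UTC; the script should
    # exit non-zero (producing mail) when the push or its checks fail.
    0 6 * * 1 root /usr/local/bin/push-container-images.sh
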
16:35:59 #topic Compose Failures
16:36:15 nirik: ^ Your favorite topic :P
16:36:27 you betcha
16:36:36 * nirik didn't look today
16:36:40 I wanted to bring this up because of the random compose failures while building images
16:36:58 nirik: rawhide has the same issue, unable to access a package
16:37:23 did someone say rawhide gating ?
16:37:26 :P
16:37:31 branched failed due to zchunk, though I am not entirely sure if its zchunk or some access issue
16:37:43 so, on that...
16:37:44 cverna: rawhide gating wouldn't have solved this problem
16:38:01 we updated rawhide composer at least... and I think we got the newer createrepo_c...
16:38:14 mboddu: ah :P
16:38:15 I cant recall the problem on that, perhaps we need to downgrade again?
16:38:59 downgrading things and keeping them old is... very bad for our setup. It becomes hard to recall why, or it just gets excluded until the end of time and we never get new ones until we notice
16:39:09 nirik: But rawhide compose went past that and both of them are using the same createrepo_c
16:39:34 so when you say "branched failed due to zchunk" what do you mean?
16:40:16 DEBUG util.py:439: error: Curl error (56): Failure when receiving data from the peer for https://kojipkgs.fedoraproject.org/compose/branched/Fedora-30-20190410.n.0/compose/Everything/armhfp/os/repodata/452f16fddf4d39da87a2db15ad4c3774ff80ebf2fe99e04934ed829009b93d83-primary.xml.zck [] (https://kojipkgs.fedoraproject.org/compose/branched/Fedora-30-20190410.n.0/compose/Everything/armhfp/os/repodata/452f16fddf4d39da87a2db15ad4c3774ff80ebf2fe99e04934ed829009b93d83-primary.xml.zck).
16:40:16 DEBUG util.py:439: koji-override-0 5.7 MB/s | 36 MB 00:06
16:40:19 DEBUG util.py:439: Cannot download 'https://kojipkgs.fedoraproject.org/compose/branched/Fedora-30-20190410.n.0/compose/Everything/armhfp/os': Yum repo downloading error: Downloading error(s): repodata/452f16fddf4d39da87a2db15ad4c3774ff80ebf2fe99e04934ed829009b93d83-primary.xml.zck - Cannot download, all mirrors were already tried without success.
16:41:03 ok...
16:41:10 ^ I am not sure if its really a zchunk problem or the random "cannot download" issues we are hitting lately
16:41:35 that looks like the zchunk thing, but I thought it was fixed.
16:41:46 While rawhide went past that and failed at:
16:42:03 DEBUG util.py:439: Unable to create appliance : Unable to download from repo : Cannot download Packages/d/dhcp-common-4.4.1-11.fc31.noarch.rpm: All mirrors were tried
16:42:29 yeah.
16:42:43 I really wish I knew what was going on there or how to prevent it.
16:43:05 nirik: Yeah, yesterday you said you have some idea on how to debug it
16:43:11 Any chance you remember how?
16:43:32 Its really annoying since the scratch image builds are just working fine
16:43:46 well, I had an idea, but it's not going to help I don't think.
16:44:02 one thing we did for s390x was to add multiple topurls for koji.
16:44:16 then it thought they were multiple places to get things and would try them all.
16:44:27 but I don't think this works for image builds... only packages.
16:44:49 for image builds it passes in one url
16:44:58 nirik: Hmmm, do you think we should check with the curl guys?
16:45:10 I dont even know if they can help
16:45:11 despite the fact that kojipkgs.fedoraproject.org is actually 2 machines.
16:45:17 Since there is no way to reproduce it
16:46:18 well... perhaps we could get pungi to pass something to get more debugging from those livemedia tasks?
16:47:04 they are also always cleaned up by the time I look
16:47:07 which is annoying
16:47:22 Perhaps we could bump the time for kojid to clean up failed builds?
16:48:11 nirik: +1 to both
16:48:46 ok, for the first I guess we should talk to lsedlar... and/or open a ticket?
16:49:11 I can do it
16:49:23 Can you make the kojid changes?
16:49:54 I can look into it yeah...
16:50:09 I'd say if we can bump it to a day that might be good.
16:50:34 nirik: +1
16:50:56 fwiw there is a koji plugin that can prevent deletion of failed tasks that meet certain conditions
16:51:52 #info mboddu will file a ticket against pungi to add more debugging for livemedia tasks and nirik will look at increasing the time before kojid starts cleaning up
16:52:01 mizdebsk: oh? thats interesting... I should really look into plugins more... theres a bunch more I don't know about
16:52:26 https://docs.pagure.org/koji/plugins/#save-failed-tree-plugin
16:52:43 mizdebsk: +1
16:53:07 also... huh. I see log_timestamps seems interesting.
16:53:58 looks like the failed buildroot lifetime is hard coded to 4 hours
16:54:18 oh, no, thats just the default
16:54:48 ok
16:55:44 so, I can bump that to 24 hours?? or 12 might be enough?
16:55:56 nirik: +1
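
A sketch of the kojid side of both ideas, assuming the option and plugin names match the koji docs linked above (failed_buildroot_lifetime is in seconds, defaulting to 4 hours); the save_failed_tree plugin also needs enabling on the hub per those docs:

    # /etc/kojid/kojid.conf -- relevant fragments only, values illustrative
    [kojid]
    # keep failed buildroots for 24 hours instead of the 4 hour default,
    # so failed livemedia tasks can still be inspected
    failed_buildroot_lifetime = 86400
    # enable the save_failed_tree builder plugin mizdebsk linked
    plugins = save_failed_tree
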
16:56:44 #topic Open Floor
16:57:10 #info F30 Final Freeze starts next week, on the 16th of Apr 2019
16:57:28 * mboddu also had one other thing for open floor but forgot
16:58:18 today's koji maintenance could be a good opportunity to deploy koji hub policy changes - for permission to manipulate package lists
16:59:09 mizdebsk: Yes, thats the one, thanks
16:59:21 +1
16:59:47 mizdebsk: can you push a commit for that, or were we still needing to decide anything on it?
17:00:08 also, I liked your suggestion of not doing srpms on s390x builders....
17:00:51 policy for package lists is already in ansible.git, it only needs to be enabled in production
17:01:06 after f28 goes eol we will also need to make all epel6/7 builds go to specific rhel7 builders... but we can do that when f28 eol hits.
17:01:09 By removing "if env == staging"
17:01:14 ah ha. great.
17:01:15 mboddu, yes, exactly
17:02:05 mizdebsk: Just make sure we are not doing something in prod thats not supposed to happen when you remove it
17:02:25 * nirik had one other thing, but is trying to remember it now
17:02:28 If thats the case, then just add the package lists policy for prod with exactly what we want in prod
17:02:30 thanks mboddu mizdebsk
17:03:40 nirik: Haha, probably mizdebsk can help, he is also able to read minds :P
17:04:30 sorry, not this time :)
17:04:56 I thought mizdebsk was also asking for cvsadmin/koji admin somewhere recently, but I can't find it. ;) I was going to +1 that.
17:05:09 nirik: I am +1 with that
17:06:17 i did not ask for that, but if you think it would be useful then i'm happy to help by becoming koji admin
17:06:33 Okay, we went 5 min past the scheduled time
17:06:57 Any more discussion, please take it to the #fedora-releng channel
17:07:01 Thanks everyone for joining
17:07:08 #endmeeting
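
As a closing note on the package-list policy item: the sketch below shows the general shape of a staging-only guard in an ansible-templated hub config and a koji "package_list" hub policy. The rule body is made up for illustration and is not the actual content of ansible.git:

    {# hub.conf.j2 fragment -- illustrative only #}
    {% if env == "staging" %}
    [policy]
    # who may add/remove packages from tags; real rules live in ansible.git
    package_list = has_perm cvsadmin :: allow
    {% endif %}

Removing the {% if %}/{% endif %} guard (or widening it to cover production) is the "enable in production" step discussed at 17:00:51.
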