16:03:26 #startmeeting RELENG (2019-04-11)
16:03:27 Meeting started Wed Apr 10 16:03:26 2019 UTC.
16:03:27 This meeting is logged and archived in a public location.
16:03:27 The chair is mboddu. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:03:27 Useful Commands: #action #agreed #halp #info #idea #link #topic.
16:03:27 The meeting name has been set to 'releng_(2019-04-11)'
16:03:27 #meetingname releng
16:03:27 The meeting name has been set to 'releng'
16:03:27 #chair nirik tyll sharkcz masta pbrobinson pingou puiterwijk maxamillion mboddu dustymabe ksinny jednorozec
16:03:27 Current chairs: dustymabe jednorozec ksinny masta maxamillion mboddu nirik pbrobinson pingou puiterwijk sharkcz tyll
16:03:27 #topic init process
16:03:43 morning
16:04:07 nirik: Morning, how goes?
16:04:17 busy busy.
16:05:02 hi, /me is somewhat here
16:05:24 nirik: Okay, I want to get us started as soon as possible, I have about 5 things
16:05:29 * mboddu waves at sharkcz
16:05:50 #topic #8258 Add ignatenkobrain to cvsadmin
16:05:57 #link https://pagure.io/releng/issue/8258
16:06:14 nirik: Can you do this as I dont have permissions to do so?
16:06:50 yes, but should we just do it? in the past we asked current members to vote... but I guess there's not too many active people anymore.
16:07:24 nirik: Okay, I didn't know about voting, so who all has to vote?
16:07:56 well, it was everyone in the group... but I guess things have changed. A lot of those people aren't active in releng much, and it means different things than it used to.
16:08:11 so I guess I'd be ok to just do it if we are ok with it.
16:08:23 no objections from me
16:08:34 nirik: I am okay with it
16:08:51 * nirik will go do it.
16:09:02 #info nirik will add ignatenkobrain to cvsadmin group. We got +3 votes for it.
16:09:06 Just for the record :)
16:10:03 #topic #8270 Container Base Image Release process improvement
16:10:09 https://pagure.io/releng/issue/8270
16:10:11 #link https://pagure.io/releng/issue/8270
16:10:19 cverna: Are you around?
16:10:47 I am +1 on the idea, but I dont want everyone doing this at any time they want.
16:11:12 I'm happy for more automation...
16:11:25 but yeah, I don't want to make it manual for more people...
16:11:29 We can start with RelEng folks and you and any other community people in the container SIG that you want to make the release
16:11:40 But not all
16:11:51 well, can we just try and automate it all?
16:12:16 nirik: Yes, definitely automation, but someone has to call it
16:12:25 It should be running a script
16:12:35 why? why not just every day?
16:12:48 (at least to ours)
16:13:01 If no one is testing things, then sure, daily or bi-weekly
16:13:54 * mboddu thought the container sig tests those containers and asks releng to release them
16:14:26 If thats not the case, then we can just cron it up
16:16:39 it's possible it should be cron'ed daily, but only pushed publicly (quay.io, registry.fp.o, dockerhub) on a smaller schedule?
16:16:59 yeah, could be...
16:17:36 I think it comes down to who/how is going to notice breakage (is there automated mail? a web page with status?) - and what testing is done
16:17:40 otaylor: what would the "cron'ed daily" do?
16:18:53 mboddu: go through all the steps but the public push - so if something happens that breaks the image creation, it can be fixed asap before the next public push
16:18:56 otaylor: Good point on "notice breakage", probably we can start with sending an email to the rel-eng@ list when things break
16:19:14 mboddu: It would basically be "CI" - it may or may not be useful - just putting it out there
16:20:01 otaylor: We already create the images every day, the ticket is just about pushing those images publicly
16:20:24 mboddu: OK, I scanned through the ticket, should have read more carefully :-)
16:20:43 otaylor: np :)
16:21:23 otaylor: But it doesnt mean that we are getting those images daily, for ex: the 28 and 29 containers have been failing for a few days now
16:21:44 Due to a bug and Peter is going to look into it
16:22:00 Once he is back from his vacation
16:22:44 Here's what I think:
16:23:51 Automate the entire pushing process and make a cron job to push to the different registries at regular intervals (probably 2 weeks) and send an email to the rel-eng@ list if the push fails
16:24:04 nirik: ^ what do you think?
16:24:32 Also, its only possible if no one is testing the nightly images, if someone is testing them, then we probably should eliminate the cron job
16:24:37 I will talk to cverna about it
16:25:06 * cverna around
16:25:25 yeah, depends on whats desired I guess.
16:25:29 mboddu: those doing the testing could set a mark somewhere (greenwave?), so it still could be cron-ed
16:25:59 sharkcz: Sure, thats possible
16:26:06 I would say "no new process if it's not automated" :-)
16:26:23 cverna: So, when you create those requests to release the base containers, is someone testing them before making the request?
16:26:27 if it is just pushing a button, why would we not want to allow members of the container SIG to do it?
16:26:43 we do not have CI currently :(
16:27:05 well, avoiding anyone pushing a button is good IMHO.
16:27:10 see https://pagure.io/fedora-ci/general/issue/47
16:27:22 cverna: Then why cant we just use a cron job to push them?
16:27:22 in case they forget, or they press it at the wrong time, or...
16:27:32 Docker Hub provides us with good CI
16:27:37 * cverna will look for a PR
16:28:21 https://github.com/docker-library/official-images/pull/5673
16:28:47 currently the process is based on me filing a ticket to releng
16:29:09 so anyone could really do it to be honest
16:29:30 cverna: Yes, cant we just use a cron job to push the images? Rather than someone (incl me) just running the scripts?
16:29:31 then I think mohan just runs a script
16:30:22 a cronjob would be good but we need the release pipeline to fail early if the compose is broken (ie missing arch)
16:30:33 otherwise we will just publish the same thing
16:31:09 sure, it would need to check and fail if no new ones
16:31:24 cverna: We can add a check for that, if today's push == the same on the registry then skip
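
A minimal sketch of the "skip if already published" check described above, assuming skopeo is available on the machine running the push; the image references below are illustrative, not the actual release tooling:

    #!/usr/bin/env python3
    # Sketch: compare the candidate image digest against what the public
    # registry already serves, and skip the push when they match.
    import json
    import subprocess
    import sys

    def digest(image_ref):
        # `skopeo inspect` prints image metadata as JSON, including "Digest"
        out = subprocess.check_output(["skopeo", "inspect", image_ref])
        return json.loads(out)["Digest"]

    # Illustrative references; a real job would iterate over releases/arches
    candidate = "docker://candidate-registry.fedoraproject.org/fedora:30"
    published = "docker://registry.fedoraproject.org/fedora:30"

    if digest(candidate) == digest(published):
        print("registry already has this image, skipping push")
        sys.exit(0)
    print("new image content, pushing...")

Because registry digests are content-addressed, matching digests mean the public registry already serves exactly this image.
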
16:32:01 anyway I think the main point is to try to get the 2 processes to be common
16:32:11 then we can decide who triggers it :)
16:33:25 I think it would be great if we just do the cron, but I am definitely up for unifying the script
16:33:28 sure, we just prefer as close to 0 as we can get. ;)
16:33:34 +1
16:34:27 #info We are going to unify the process and *probably* make a cron job to push the images to different registries
16:34:31 cverna: ^ is that okay?
16:34:32 I'll work on unifying the script and we can try to set up a cron job
16:34:37 works for me
16:34:42 cverna: +1
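
For reference, the kind of cron entry this plan amounts to, assuming a wrapper script at the illustrative path below and that the rel-eng list address is as shown; cron mails any output of a failing run to the MAILTO address. Note cron cannot express "every two weeks" directly, so the fortnight cadence would need a guard inside the script:

    # /etc/cron.d/container-release -- illustrative, not deployed anywhere
    MAILTO=rel-eng@lists.fedoraproject.org
    # Run the unified push script Mondays at 06:00 UTC; the script should
    # exit non-zero (producing mail) when the push or its checks fail.
    0 6 * * 1 root /usr/local/bin/push-container-images.sh
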
16:35:59 #topic Compose Failures
16:36:15 nirik: ^ Your favorite topic :P
16:36:27 you betcha
16:36:36 * nirik didn't look today
16:36:40 I wanted to bring this up because of the random compose failures while building images
16:36:58 nirik: rawhide has the same issue, unable to access a package
16:37:23 did someone say rawhide gating ?
16:37:26 :P
16:37:31 branched failed due to zchunk, though I am not entirely sure if its zchunk or some access issue
16:37:43 so, on that...
16:37:44 cverna: rawhide gating wouldn't have solved this problem
16:38:01 we updated rawhide composer at least... and I think we got the newer createrepo_c...
16:38:14 mboddu: ah :P
16:38:15 I cant recall the problem on that, perhaps we need to downgrade again?
16:38:59 downgrading things and keeping them old is... very bad for our setup. It becomes hard to recall why, or it just gets excluded until the end of time and we never get new ones until we notice
16:39:09 nirik: But rawhide compose went past that and both of them are using the same createrepo_c
16:39:34 so when you say "branched failed due to zchunk" what do you mean?
16:40:16 DEBUG util.py:439: error: Curl error (56): Failure when receiving data from the peer for https://kojipkgs.fedoraproject.org/compose/branched/Fedora-30-20190410.n.0/compose/Everything/armhfp/os/repodata/452f16fddf4d39da87a2db15ad4c3774ff80ebf2fe99e04934ed829009b93d83-primary.xml.zck [] (https://kojipkgs.fedoraproject.org/compose/branched/Fedora-30-20190410.n.0/compose/Everything/armhfp/os/repodata/452f16fddf4d39da87a2db15ad4c3774ff80ebf2fe99e04934ed829009b93d83-primary.xml.zck).
16:40:16 DEBUG util.py:439: koji-override-0 5.7 MB/s | 36 MB 00:06
16:40:19 DEBUG util.py:439: Cannot download 'https://kojipkgs.fedoraproject.org/compose/branched/Fedora-30-20190410.n.0/compose/Everything/armhfp/os': Yum repo downloading error: Downloading error(s): repodata/452f16fddf4d39da87a2db15ad4c3774ff80ebf2fe99e04934ed829009b93d83-primary.xml.zck - Cannot download, all mirrors were already tried without success.
16:41:03 ok...
16:41:10 ^ I am not sure if its really a zchunk problem or the random "cannot download" issues we are hitting lately
16:41:35 that looks like the zchunk thing, but I thought it was fixed.
16:41:46 While rawhide went past that and failed at:
16:42:03 DEBUG util.py:439: Unable to create appliance : Unable to download from repo : Cannot download Packages/d/dhcp-common-4.4.1-11.fc31.noarch.rpm: All mirrors were tried
16:42:29 yeah.
16:42:43 I really wish I knew what was going on there or how to prevent it.
16:43:05 nirik: Yeah, yesterday you said you have some idea on how to debug it
16:43:11 Any chance you remember how?
16:43:32 Its really annoying since the scratch image builds are just working fine
16:43:46 well, I had an idea, but it's not going to help I don't think.
16:44:02 one thing we did for s390x was to add multiple topurls for koji.
16:44:16 then it thought they were multiple places to get things and would try them all.
16:44:27 but I don't think this works for image builds... only packages.
16:44:49 for image builds it passes in one url
16:44:58 nirik: Hmmm, do you think we should check with the curl guys?
16:45:10 I dont even know if they can help
16:45:11 despite the fact that kojipkgs.fedoraproject.org is actually 2 machines.
16:45:17 Since there is no way to reproduce it
16:46:18 well... perhaps we could get pungi to pass something to get more debugging from those livemedia tasks?
16:47:04 they are also always cleaned up by the time I look
16:47:07 which is annoying
16:47:22 Perhaps we could bump the time for kojid to clean up failed builds?
16:48:11 nirik: +1 to both
16:48:46 ok, for the first I guess we should talk to lsedlar... and/or open a ticket?
16:49:11 I can do it
16:49:23 Can you make the kojid changes?
16:49:54 I can look into it yeah...
16:50:09 I'd say if we can bump it to a day that might be good.
16:50:34 nirik: +1
16:50:56 fwiw there is a koji plugin that can prevent deletion of failed tasks that meet certain conditions
16:51:52 #info mboddu will file a ticket against pungi to add more debugging for livemedia tasks and nirik will look at increasing the time before kojid starts cleaning up
16:52:01 mizdebsk: oh? thats interesting... I should really look into plugins more... theres a bunch more I don't know about
16:52:26 https://docs.pagure.org/koji/plugins/#save-failed-tree-plugin
16:52:43 mizdebsk: +1
16:53:07 also... huh. I see log_timestamps seems interesting.
16:53:58 looks like the failed buildroot lifetime is hard coded to 4 hours
16:54:18 oh, no, thats just the default
16:54:48 ok
16:55:44 so, I can bump that to 24 hours?? or 12 might be enough?
16:55:56 nirik: +1
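
A sketch of the kojid side of both ideas, assuming the option and plugin names match the koji docs linked above (failed_buildroot_lifetime is in seconds, defaulting to 4 hours); the save_failed_tree plugin also needs enabling on the hub per those docs:

    # /etc/kojid/kojid.conf -- relevant fragments only, values illustrative
    [kojid]
    # keep failed buildroots for 24 hours instead of the 4 hour default,
    # so failed livemedia tasks can still be inspected
    failed_buildroot_lifetime = 86400
    # enable the save_failed_tree builder plugin mizdebsk linked
    plugins = save_failed_tree
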
16:56:44 #topic Open Floor
16:57:10 #info F30 Final Freeze starts next week, on the 16th of Apr 2019
16:57:28 * mboddu also had one other thing for open floor but forgot
16:58:18 today's koji maintenance could be a good opportunity to deploy koji hub policy changes - for permission to manipulate package lists
16:59:09 mizdebsk: Yes, thats the one, thanks
16:59:21 +1
16:59:47 mizdebsk: can you push a commit for that, or were we still needing to decide anything on it?
17:00:08 also, I liked your suggestion of not doing srpms on s390x builders....
17:00:51 policy for package lists is already in ansible.git, it only needs to be enabled in production
17:01:06 after f28 goes eol we will also need to make all epel6/7 builds go to specific rhel7 builders... but we can do that when f28 eol hits.
17:01:09 By removing "if env == staging"
17:01:14 ah ha. great.
17:01:15 mboddu, yes, exactly
17:02:05 mizdebsk: Just make sure we are not doing something in prod thats not supposed to happen when you remove it
17:02:25 * nirik had one other thing, but is trying to remember it now
17:02:28 If thats the case, then just add the package lists policy for prod with exactly what we want in prod
17:02:30 thanks mboddu mizdebsk
17:03:40 nirik: Haha, probably mizdebsk can help, he is also able to read minds :P
17:04:30 sorry, not this time :)
17:04:56 I thought mizdebsk was also asking for cvsadmin/koji admin somewhere recently, but I can't find it. ;) I was going to +1 that.
17:05:09 nirik: I am +1 with that
17:06:17 i did not ask for that, but if you think it would be useful then i'm happy to help by becoming koji admin
17:06:33 Okay, we went 5 min past the scheduled time
17:06:57 Any more discussion, please take it to the #fedora-releng channel
17:07:01 Thanks everyone for joining
17:07:08 #endmeeting
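
As a closing note on the package-list policy item: the sketch below shows the general shape of a staging-only guard in an ansible-templated hub config and a koji "package_list" hub policy. The rule body is made up for illustration and is not the actual content of ansible.git:

    {# hub.conf.j2 fragment -- illustrative only #}
    {% if env == "staging" %}
    [policy]
    # who may add/remove packages from tags; real rules live in ansible.git
    package_list = has_perm cvsadmin :: allow
    {% endif %}

Removing the {% if %}/{% endif %} guard (or widening it to cover production) is the "enable in production" step discussed at 17:00:51.
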