15:32:56 #startmeeting RELENG (2015-06-15)
15:32:56 Meeting started Mon Jun 15 15:32:56 2015 UTC. The chair is dgilmore. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:32:56 Useful Commands: #action #agreed #halp #info #idea #link #topic.
15:33:07 #meetingname releng
15:33:07 The meeting name has been set to 'releng'
15:33:07 #chair dgilmore nirik tyll sharkcz bochecha masta pbrobinson pingou maxamillion
15:33:07 Current chairs: bochecha dgilmore masta maxamillion nirik pbrobinson pingou sharkcz tyll
15:33:10 #topic init process
15:33:14 meeting time all
15:33:20 morning
15:33:26 morning, sorry I'm late
15:33:28 Hi there
15:33:47 * masta waves
15:33:49 howdy all
15:34:24 #topic #6158 Request to discuss Rel-Eng Project Planning Proposal
15:34:37 https://fedorahosted.org/rel-eng/ticket/6158
15:34:45 maxamillion: where are we here?
15:35:15 dgilmore: Fedora Infra team voted to table the Taiga work until Flock
15:35:40 * pbrobinson is here
15:36:10 well, depends on what you mean by table... we should still see if it will otherwise meet needs.
15:36:27 but we didn't want to work on packaging it up yet until we know that we are going to use it?
15:36:42 or figuring out how we can deploy it, etc.
15:37:17 maxamillion: is the dev instance back up again?
15:37:18 at least that was my understanding.
15:37:41 I did have a question on the dev instance tho... for maxamillion and threebean
15:37:50 dgilmore: it is, it has a DNS name also: http://taiga.cloud.fedoraproject.org
15:38:10 Hello
15:38:14 * threebean is here
15:38:14 maxamillion: awesome, last I knew it was down
15:38:18 * cydrobolt waves
15:38:28 so, that instance is one maxamillion made I think... ad-hoc?
15:38:37 dgilmore: yeah it was, threebean did magic to get things back in order
15:38:45 nirik: it is
15:38:48 and threebean also made another one that was a persistent one via ansible.
15:39:01 can we sync that over to the persistent one?
15:39:43 that would make sure we have it in ansible and everyone has access to it (right now just maxamillion does), and we would bring it up after reboots, etc.
15:39:59 sure.
15:40:09 to be clear - you're not talking about ansibilizing the taiga configs, right?
15:40:32 correct. Just the base instance... so we have other people's keys on it and our usual setup.
15:40:36 just keeping 1) a persistent disk in the cloud for it and 2) the definition of the node around in ansible.
15:40:42 cool, sounds good.
15:41:03 I'll do it. (I think I volunteered for this before already, just haven't gotten around to it.)
15:41:05 nirik: +1
15:41:24 use 'taiga.cloud.fedoraproject.org' to access it. it'll get a new ip, but just follow the dns entry around now.
15:41:31 cool. that other instance is shut off, but I can power it on, then you change dns to it, run ansible, and then manually rsync the stuff and it should be good. ;)
15:41:45 nirik: threebean: maxamillion: cheers for it all
15:42:46 anyone else have anything here?
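
(For illustration: a minimal sketch, not from the meeting, of the one-off manual sync step nirik describes above; the host names and paths are hypothetical, and the persistent node definition itself would live in the infrastructure ansible repo.)

    #!/usr/bin/python
    # Hypothetical one-off copy of the ad-hoc taiga instance's data over
    # to the persistent, ansible-managed node. Hosts/paths are made up.
    import subprocess

    SRC = "root@taiga-adhoc.example.org:/var/lib/taiga/"
    DST = "/var/lib/taiga/"

    # -a preserves ownership/permissions, -A/-X keep ACLs and xattrs;
    # --delete makes the persistent copy an exact mirror of the source.
    subprocess.check_call(["rsync", "-aAX", "--delete", SRC, DST])
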
15:43:43 #topic #6164 bodhi2 status update requested
15:43:52 https://fedorahosted.org/rel-eng/ticket/6164
15:44:05 lmacken: where does bodhi2 stand?
15:46:30 https://admin.stg.fedoraproject.org/bodhi2/ is there now, but no data yet
15:46:33 dgilmore: last I heard bodhi2 is up and running in stage (though I'll be honest, I don't entirely know the implications of that statement)
15:47:06 related to this I have another question. :)
15:47:07 yeah I thought bodhi2 was in stage and half working
15:47:52 I was working on ansiblizing releng04/relepel01 (our production bodhi1 masher hosts for updates).
15:48:19 However, if we are going to try and push for bodhi2 in production soon, should I just hold off on that and work on pushing bodhi2 masher stuff?
15:49:07 nirik: I think we need to have it in use before f23, so that we can more easily deliver all the different things needed as part of the changed updates process
15:49:44 ok, so let me not worry about bodhi1 anymore and try and press for bodhi2. ;)
15:49:55 nirik: :) yep
15:49:55 IMHO if we want to land it we need it in production a week before branching.
15:50:20 nirik: well, a week or two before change freeze, when we enable it
15:50:27 sounds reasonable
15:51:09 sure. ok.
15:51:18 will try and work with lmacken to solve any blockers.
15:51:21 let's move on?
15:51:46 nirik: yep
15:52:04 sorry, was taking care of the dog
15:52:06 not sure this is the right place, but please keep us (qa) in the loop on planned changes to bodhi
15:52:10 I need to file a bunch of tickets from the fad to track the last pieces
15:52:21 tflink: we will
15:52:30 bodhi2 ansible playbook written, it's deployed to stg... need to work on syncing up the prod db and getting the masher running next
15:52:35 dgilmore: thanks
15:53:05 lmacken: login seems horribly busted right now
15:53:09 lmacken: please let me know if I can help with ansible or standing up hosts, etc.
15:53:16 dgilmore: yeah, there's an issue with the proxy urls and stuff at the moment
15:53:23 okay
15:53:38 let's move on, and follow up again next week
15:53:42 #topic Secondary Architectures updates
15:53:43 #topic Secondary Architectures update - ppc
15:53:50 pbrobinson: how is ppc?
15:53:50 lmacken: perhaps a mail to releng/infra lists with what's going on? i.e., any blockers or plan...
15:53:50 we're looking reasonable here
15:54:08 I've been working to get the P8s into production
15:54:16 awesome
15:54:17 and then review the rest of the Power infra
15:54:31 build wise we're moving forward
15:54:39 if we are to support ppc64le in epel we will need some power8 vms
15:54:43 trying to get close to mainline in prep for mass rebuild
15:54:59 note that copr added ppc64le support.
15:55:03 dgilmore: already got the capacity, need to work with nirik to work out connectivity etc
15:55:16 nirik: yep, I know
15:55:53 pbrobinson: I should talk to you (doesn't have to be in meeting, but either way) about space on secondary arch nfs... we can probably grow those now and give you a bit more room.
15:56:22 nirik: yes, funny, it's on my todo list to ask about, esp. for the ppc mass rebuild
15:56:51 cool. It will require me figuring out how to grow volumes, but I am sure I can.
15:57:10 nirik: quite straightforward, single command
15:57:35 nirik: I wonder if we can change the secondary arch storage to get the benefits of dedupe etc across the secondaries
15:57:36 yeah, they all seem to be single commands, just -with -lots -of -options -to -them
15:58:02 dgilmore: put it in one big volume? sounds like a lot of work...
15:58:04 dgilmore: would need to merge them all into a single export/volume
15:58:11 nirik: perhaps
15:58:24 sadly netapp only dedupes within a single volume
15:58:44 might help us some
15:58:51 not 100% sure how much though
15:59:10 I think it would help quite a bit
15:59:26 there are tons of noarch builds
15:59:33 but there are implications and other possible issues too
15:59:49 yeah
16:00:06 we could look at a common location for staging of secondary arches
16:00:22 there are many possible wins, with some risks and cons
16:00:40 you could put all koji instances on a single volume within subdirectories / different exports and get dedupe benefits, but you have other issues etc you need to take into account
16:00:57 currently all secondary arch volumes save about 11-12% via dedupe
16:01:01 pbrobinson: right, that is kinda what I am thinking
16:01:16 primary saves 33%
16:01:22 nirik: but all noarch rpms are common across all secondaries
16:01:26 right
16:01:41 that is potentially a lot of savings
16:01:57 internally across 6 arches I think we get around 40%, it's not just noarch but src.rpm and a bunch of other things like text/graphics in binary rpms too
16:02:16 the netapps dedupe in 4K blocks
16:02:33 something to think on, I don't think we want to do anything now/before f23
16:02:42 nirik: agreed
16:02:57 it would likely also save a bunch on cloud/live/images too
16:03:00 nirik: right, I think it would be post f23
16:03:29 but something to look at, might be able to get better value out of our disk
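
(For context: a back-of-the-envelope sketch, not from the meeting, of the savings being discussed. The volume sizes are made up; only the quoted percentages, 11-12% per secondary volume and ~40% across six arches internally, come from the conversation above.)

    # Illustrative only: hypothetical per-arch volume sizes in TB.
    secondary_volumes_tb = [4.0, 4.0, 4.0]  # e.g. arm, ppc, s390 (made-up sizes)

    # Deduped separately, each volume saves roughly 12% (figure quoted above).
    separate = sum(v * (1 - 0.12) for v in secondary_volumes_tb)

    # Merged into one volume, shared noarch/src.rpm blocks can dedupe across
    # arches; ~40% was quoted for six arches internally, so assume a
    # conservative 30% for three.
    merged = sum(secondary_volumes_tb) * (1 - 0.30)

    print("separate volumes: %.1f TB used" % separate)  # -> 10.6 TB
    print("single volume:    %.1f TB used" % merged)    # -> 8.4 TB
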
16:03:50 #topic Secondary Architectures update - s390
16:03:57 dan is not here
16:04:00 sure, not opposed, just want us to figure out tradeoffs
16:04:01 here's a whitepaper on the de-dupe for those interested: https://www.netapp.com/us/system/pdf-reader.aspx?m=tr-3966.pdf
16:04:24 #topic Secondary Architectures update - arm
16:04:29 pbrobinson: how is aarch64?
16:04:39 we're looking pretty good, some cleanups from the perl merge
16:05:07 now we've got the gold linker working it's fixed a few things, notably the ghc mess from F-22 :-D
16:05:31 nice
16:06:02 working with nirik, smooge etc about moving some more builders from boston to PHX until we can get decent enterprise kit
16:06:13 cool
16:06:22 * nirik nods
16:06:34 we now also have a process for ARMv7 on aarch64
16:06:39 but it's rough as hell
16:07:00 I would like to get at least one aarch64 host in primary koji so that we can use it to make docker base images for 32 bit arm in primary koji for f23
16:07:20 so I'm working to get that closer to a standard KVM libvirt VM so we can do some testing for things like kernel and docker builds
16:07:52 dgilmore: yes, that's my plan, but it's butt ugly ATM, but I'm hoping to get it better soon
16:08:34 cool
16:08:44 anything else on arm?
16:08:48 nope
16:09:05 #topic FAD followup
16:09:23 I need to file tickets for outstanding deliverables
16:09:40 as well as do a writeup on what we did and achieved
16:09:46 I posted my writeup to my blog/planet.fp.o -> http://pseudogen.blogspot.com/2015/06/fedora-activity-day-release-engineering.html
16:10:00 I need to finish my writeups and post them
16:10:01 but mine was a little more from my perspective
16:10:32 there's a lot of work that happened that I wasn't involved in, so I tried to focus on things that I was familiar with and provide links elsewhere for more information
16:11:38 I personally think it was valuable on a number of paths
16:11:57 Yeah, it was a good FAD.
16:12:07 maxamillion: right, there were lots of breakout discussions
16:12:38 do we each need to make a writeup of our FAD contributions?
16:13:09 masta: that's the idea
16:13:11 masta: it would have been a good FAD if we had walked away with a working pungi that could be easily iterated on
16:13:19 pungi4*
16:13:32 masta: ideally everyone would do some writeups, yes
16:13:59 maxamillion: 2 hours right? :-P
16:14:05 * pbrobinson hides
16:14:13 pbrobinson: yeah, what a horrible failure that was
16:14:45 I believed it to be in better shape than it is
16:14:49 pbrobinson: I did not know how deep that rabbit hole went ... I was under the silly understanding that the code worked before it was thrown over the wall
16:14:59 .... little did I know ....
16:15:09 maxamillion: it's seriously not your fault, I expected you were being a little ambitious from experience..... but you'll never live it down anyway ;-)
16:15:44 pbrobinson: that's fine, I just want to make shit work
16:15:55 I had the same opinion as maxamillion
16:15:58 maxamillion: me too!
16:17:48 anything else people want to mention FAD wise?
16:18:12 other than me bitching aimlessly about pungi4? ... no not really
16:18:20 #topic Open Floor
16:18:25 okay let's move on
16:18:39 so I do now have a s390-koji01 and db...
16:18:47 nirik: cool :)
16:18:48 (in ansible/rhel7)
16:19:02 it needs some more work I think... there's a ticket on it.
16:19:04 how far from moving to it are we?
16:19:12 nirik: groovy, was going to ask about that; also can we configure it with the new secondary admin etc groups?
16:19:16 sharkcz wanted to implement some shared shadow koji thing
16:19:20 which I have no idea about. :)
16:19:25 ideally we will move all three secondary arch hubs to it
16:19:28 pbrobinson: already done. ;)
16:19:31 nirik: happy to work with you on that for both access and shadow
16:19:39 * nirik digs up the ticket
16:19:56 https://fedorahosted.org/fedora-infrastructure/ticket/4783
16:19:58 nirik: cool, can you give me details of those so I can review and add other users we'll need for arm/ppc
16:20:19 nirik: cool, will review
16:20:26 pbrobinson: sure thing. They are s390-koji01.qa.fedoraproject.org and db-s390-koji01.qa.fedoraproject.org
16:20:36 it has a db dump from a week or two ago in it.
16:20:53 we need to make sure everything is good, then schedule a migration.
16:20:58 nirik: brilliant, thanks
16:21:03 once we have this one all done the others should be really easy
16:21:11 just different names in templates and such
16:21:39 will be a big positive change
16:21:47 yeah. :)
16:21:54 nirik: YAY!!!
16:22:11 pbrobinson: I can spin up arm ones anytime too, but we might make sure we have everything set for s390 first...
16:22:18 but if arm is easier to do first that's fine too
16:22:42 nirik: I guess the hardest bit for ppc will be that the vms are on a ppc box
16:22:48 yeah.
16:22:50 nirik: yes, let's get s390 live and then look at arm etc
16:22:58 ppc may need some tweaking due to that...
16:23:23 dgilmore: nirik: yes, agreed, but I have ideas/plans to assist with that I'm working on for PPC in general
16:23:47 if we can get ppc so ansible can talk to the hypervisor and it uses libvirt, then we may be ok. ;)
16:24:09 nirik: that should be doable
16:24:31 nirik: the newer boxes do use libvirt afaik
16:24:38 nirik: yes, that's my plan as I go through the P8 bits, we should be able to get standard KVM/libvirt configs across all arches
16:24:42 yeah
16:24:48 cool
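
(As an aside: a minimal sketch, not from the meeting, of what "ansible can talk to the hypervisor and it uses libvirt" means in practice, using the libvirt Python bindings; the connection URI and host name are hypothetical.)

    import libvirt  # provided by the libvirt-python package

    # Hypothetical URI for a Power hypervisor reachable over ssh; once this
    # works, generic tooling (ansible, virt-install, ...) can manage guests
    # the same way on every arch.
    conn = libvirt.open("qemu+ssh://ppc-hyp01.example.org/system")
    for dom in conn.listAllDomains():
        state = "running" if dom.isActive() else "shut off"
        print("%s: %s" % (dom.name(), state))
    conn.close()
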
16:25:03 I also have some topics
16:25:30 nirik: pbrobinson: anything else here? or can we move on to tyll's topics?
16:25:38 any progress with the mass rebuild for F23? https://fedorahosted.org/rel-eng/ticket/6162
16:25:44 nothing else from me, go ahead
16:25:51 jkurik: please wait
16:25:51 nope, not from me
16:26:00 tyll: what do you have?
16:26:02 * jkurik is in queue
16:26:02 Is https://fedorahosted.org/rel-eng/ticket/6111 still scheduled to happen before F23?
16:26:21 tyll: I do not think so
16:26:39 we have not yet worked out a new solution for the CA situation
16:28:01 there is some headway made on moving the lookaside away from md5
16:28:21 I see, is it then maybe possible to do a flag day for the other three changes and do the client CA change later?
16:28:32 I would rather not
16:28:51 as that would need two flag days
16:29:20 but if it is like one flag day per release, it is not like there are a lot of changes that often
16:29:30 that being said, if the CA thing won't be done before F23, maybe it's fine to have one flag day for F23 and another one for F24?
16:29:44 open to talking about it
16:30:11 the last flag day we had for this type of thing was after the incident in 2008
16:30:27 it is not a common thing, and I prefer to keep it that way
16:30:55 what are the changes?
16:31:14 ah, I see.
16:32:43 tyll: do you have anything else?
16:32:44 iirc it will not be as intrusive - for example, people using default configs will not have to do that much except requesting a new client certificate, as they have to do every 6 months
16:33:22 tyll: the client configs will quite possibly have to change
16:33:35 it really depends on how exactly it is implemented
16:34:34 if we end up using a whole new CA, then it is much more intrusive
16:35:45 even with a new CA we can use two CAs for a migration phase
16:35:48 it doesn't seem to make sense to do it if the final implementation isn't even decided
16:36:10 tyll: maybe.
16:36:42 I quite strongly want to have a single set of changes requiring client side adjustments
16:37:07 I don't know that the first three things need any client side changes.
16:37:33 so can't we just do them and defer the one that's not yet implemented/decided?
16:37:47 if we can work out a way to seamlessly convert people as their certificates expire, then we can roll out the other changes at some flag day event
16:38:24 nirik: any CA change on the webserver side does require client changes
16:38:44 nirik: as does the change to sha512
16:38:46 if we do not invalidate the old certificates, we can accept both the old and the new CA on the server side
16:39:07 then people only need to get a new certificate after their old certificate expires
16:39:18 I have to go for 10 minutes to get my rubbish out
16:40:03 perhaps I am not clear on what exact steps are to be done for the first 2 changes.
16:40:05 for the well-known certificate for pkgs and koji we need to push new client configs that accept the new well-known CA, which might not be the case currently
16:40:16 The sha512 change I thought was all in fedpkg/upload.cgi.
16:40:42 but afaik fedpkg does not check the server certificate in most cases
16:41:58 tyll: could you perhaps add to the ticket a more verbose description of what happens for each change and the effect on clients?
16:44:01 nirik: yes, but there are actually several choices, depending on whether we want to require maintainers to do a "flag day" event or have a migration period; I believe I outlined a lot of the options in the meeting notes: http://meetbot.fedoraproject.org/fedora-meeting-1/2015-02-23/releng.2015-02-23-16.34.log.html#l-219
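
(For illustration: a minimal sketch, not from the meeting, of tyll's migration idea at 16:38:46, a server that requires client certificates but trusts certs signed by either the old or the new CA, so maintainers can be moved over as their certificates expire. The file names are hypothetical.)

    import ssl

    # Hypothetical file names; the point is that load_verify_locations()
    # calls accumulate, so both CAs are trusted during the transition.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLSv1_2)
    ctx.load_cert_chain(certfile="server.pem", keyfile="server.key")
    ctx.verify_mode = ssl.CERT_REQUIRED              # client cert required
    ctx.load_verify_locations(cafile="old-fedora-ca.pem")
    ctx.load_verify_locations(cafile="new-fedora-ca.pem")

Once every old certificate has expired, dropping the first load_verify_locations() call completes the migration without a flag day.
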
16:44:39 so the other topic is the status of fedorahosted trac and pagure for rel-eng
16:45:06 back
16:45:18 tyll: send all changes as pull requests in pagure
16:45:42 I was wondering if tickets will be migrated from trac to pagure as well if they require code changes, and if task items like "unblock pkg foo" will be tracked in trac
16:45:47 we are still using trac for tickets etc and there are no plans to change that yet
16:46:21 the only change for now is to use pagure for code and git
16:47:17 I see, but I guess issues related to code would be better in pagure then as well, to be able to easily reference them (given that pagure supports references to them like github does)
16:47:58 sure
16:48:57 anything else?
16:49:03 and what is the rule/workflow for merging pull requests? Can only you merge them?
16:49:04 any news about the mass rebuild for F23?
16:49:31 or is it like in infra, where one needs to give a +1 to a pull request and then anyone might merge it?
16:49:47 tyll: fedpkg checks the server cert when uploading and when checking if a file exists
16:49:49 tyll: at the moment pingou and I can. we need to set up a full review process and open it up a bit more
16:49:53 tyll: it downloads over http, though
16:50:30 dgilmore: ok, thx
16:50:40 jkurik: apparently it is supposed to happen tomorrow
16:51:00 that's what we had on the schedule, yeah
16:51:11 dgilmore: ok, thanks, just wanted to be sure releng knows of it :-)
16:51:12 jkurik: we have always started them on Fridays in the past, but when it got added to the schedule it was put on Tuesday
16:51:14 tyll: also, the way the code currently works, it will use the .fedora-server-ca.cert to validate the server cert
16:51:39 tyll: and if the server cert is not signed by that CA (because it is signed by a well-known CA, for example), then the validation will fail
16:51:56 bochecha: yeah. probably needs a code change
16:52:18 or we put .fedora-server-ca.cert in a well-known place
16:52:28 and ship the ca cert for what we use
16:52:30 dgilmore: moving to a well-known CA for the server would only require not using a .fedora-server-ca.cert file any more, which is quite trivial :)
16:52:51 bochecha: well, that depends
16:53:06 it's a client-side code change nevertheless, which means after that change older clients won't work, only the updated ones will
16:53:07 if we want to accept any well-known CA or just the one we are using
16:53:09 bochecha: the last time I checked, it ran curl with -k iirc, and maybe disabled some other cert check as well
16:53:14 dgilmore: ah, right
16:53:18 tyll: not anymore
16:53:22 tyll: that was fixed afaik
16:53:30 tyll: I redid the whole lookasidecache handling in pyrpkg :)
16:53:45 tyll: now it all uses pycurl nicely
16:53:54 bochecha: ah, ok
16:54:08 tyll: it's in pyrpkg-1.35
16:54:21 (which is in stable for everything except EPEL7, and is on its way there)
16:55:51 bochecha: wonderful :-)
16:56:24 does anyone have anything else?
16:56:37 we are 30 minutes over
16:56:59 * tyll has nothing left
16:57:17 #endmeeting
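
(For reference: a minimal sketch, not the actual pyrpkg code, of the pycurl-based server-certificate validation bochecha describes at 16:51:14; the URL and certificate path are hypothetical.)

    import pycurl
    from io import BytesIO

    # Hypothetical URL and cert path; validates the lookaside server's
    # certificate against the shipped Fedora server CA instead of curl -k.
    buf = BytesIO()
    c = pycurl.Curl()
    c.setopt(pycurl.URL, "https://pkgs.example.org/repo/pkgs/upload.cgi")
    c.setopt(pycurl.CAINFO, "/etc/pki/fedora-server-ca.cert")  # trust anchor
    c.setopt(pycurl.SSL_VERIFYPEER, 1)  # reject certs not signed by that CA
    c.setopt(pycurl.SSL_VERIFYHOST, 2)  # hostname must match the cert
    c.setopt(pycurl.WRITEDATA, buf)
    c.perform()  # raises pycurl.error if certificate validation fails
    print(c.getinfo(pycurl.RESPONSE_CODE))
    c.close()
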