15:00:09 #startmeeting Infrastructure (2020-03-12)
15:00:09 Meeting started Thu Mar 12 15:00:09 2020 UTC.
15:00:09 This meeting is logged and archived in a public location.
15:00:09 The chair is cverna. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:09 Useful Commands: #action #agreed #halp #info #idea #link #topic.
15:00:09 The meeting name has been set to 'infrastructure_(2020-03-12)'
15:00:10 #meetingname infrastructure
15:00:10 The meeting name has been set to 'infrastructure'
15:00:19 #chair nirik pingou smooge cverna mizdebsk mkonecny abompard
15:00:19 Current chairs: abompard cverna mizdebsk mkonecny nirik pingou smooge
15:00:19 #info Agenda is at: https://board.net/p/fedora-infra
15:00:25 #info About our team: https://docs.fedoraproject.org/en-US/cpe/
15:00:25 #topic aloha
15:00:32 morning everyone.
15:00:44 Hello o/
15:00:54 \o
15:00:58 .hello nphilipp
15:00:59 nils: nphilipp 'Nils Philippsen'
15:01:06 morning
15:02:04 #topic Next chair
15:02:04 #info magic eight ball says:
15:02:15 #info 2020-03-19 - smooge
15:02:15 #info 2020-03-26 - ???
15:02:45 anyone want to run the next meeting?
15:03:09 it is super easy, you just have to follow the instructions here --> https://board.net/p/fedora-infra
15:04:07 I suppose I can... now that it's a bit later in my morning due to dst...
15:04:41 thanks nirik
15:04:56 #info 2020-03-26 - nirik
15:05:12 #topic New folks introductions
15:05:20 #info This is a place where people who are interested in Fedora Infrastructure can introduce themselves
15:05:20 #info Getting Started Guide: https://fedoraproject.org/wiki/Infrastructure/GettingStarted
15:05:52 anyone new who would like to give a quick introduction?
15:07:12 seems no...
;(
15:07:39 yeah :(
15:08:01 #topic announcements and information
15:08:14 #info ops folks are doing a 30min ticket triage every day at 19UTC in #fedora-admin - please join
15:08:33 #info f32 beta freeze still in effect
15:08:35 #info CPE Sustaining team has daily standup (Monday-Thursday) at 15UTC in #fedora-admin - please join
15:08:44 #info Fedora Infrastructure will be moving in 2020-06 from its Phoenix AZ datacenter to one near Herndon VA. A lot of planning will be involved in this. Please watch out for announcements on changes.
15:09:16 #info Fedora Communishift will be moving to the new datacenter in April. Current downtime is expected to be from 2020-04-10 -> 2020-05-01. Please watch out for announcements on changes.
15:09:29 #info Taskotron will EOL in 2020-05
15:09:42 anything else?
15:10:37 ok let's move on then :)
15:10:43 #topic Oncall
15:10:44 #info https://fedoraproject.org/wiki/Infrastructure/Oncall
15:10:57 #info nirik is oncall 2020-03-05 -> 2020-03-12
15:10:57 #info smooge is oncall 2020-03-12 -> 2020-03-19
15:11:02 #info cverna is oncall 2020-03-19 -> 2020-03-26
15:11:03 #info ???? is oncall 2020-03-26 -> 2020-04-02
15:11:11 .takeoncallus
15:11:20 .oncalltakeus
15:11:20 smooge: Kneel before zod!
15:11:22 I think we are well covered for the next couple weeks
15:11:46 we can probably wait until next meeting to find someone for the week of the 26th
15:11:53 There were lots of small pings for various things I intercepted... nothing too noteworthy
15:12:37 and the rabbitmq outage :)
15:13:15 yeah, that was... not good.
15:13:42 as far as I can tell it was because 01 was in a bad state... after I rebooted it yesterday it's stayed fine.
15:13:53 but I am not sure what caused that state
15:14:39 cool, does this seem linked to when we update koji? If I remember correctly, last time we had this it was around a koji update time too?
15:14:48 but maybe I don't remember correctly
15:15:09 yeah, but... not sure how that could cause any problems on rabbitmq...
15:15:32 just a httpd restart on the koji hubs...
15:15:35 yeah, seems not really linked together
15:16:03 they might be, but not sure...
15:16:32 ok moving on
15:16:36 #topic Monitoring discussion [nirik]
15:16:36 #info https://nagios.fedoraproject.org/nagios
15:16:36 #info Go over existing out items and fix
15:16:42 lets see
15:17:22 the two down hosts are expected (one has an old ip, the other one is down so we can steal its network port for another machine)
15:17:43 regular datanommer ones.
15:17:56 fedoraplanet messages would explain why planet is not updated
15:17:56 regular swap low ones (hopefully fixed in new rhel kernel)
15:18:10 no, the messages stopped working long ago...
15:18:22 ha :(
15:18:29 37d 14h 29m 59s
15:18:36 there's a ticket on it.
15:18:41 needs someone to dig into it
15:18:54 ha yeah did not notice the time
15:18:57 the reason for not updating is very likely a stuck process in fetching blogs.
15:19:00 it's done that before.
15:19:37 hello
15:19:38 that's about it here, move on I think.
15:19:41 hey clime
15:19:47 nirik: hi
15:19:59 #topic backlog discussion
15:19:59 #info go over our backlog and discuss and determine priority
15:20:19 cverna: so, did we get any clarity from the 5 you posted?
15:20:21 So I sent an email with 5 tickets, I guess this is a good time to review these
15:20:37 let me find the link to the email
15:21:10 #link https://lists.fedoraproject.org/archives/list/infrastructure@lists.fedoraproject.org/thread/H4FBSGJGKK5ICNEDGQUDSM7N5HSJHF4J/
15:21:33 I summarized the discussion in the mail thread
15:22:30 .tickets 8455
15:22:44 .ticket 8455
15:22:45 cverna: Issue #8455: Move mailman to newer release of Fedora or CentOS - fedora-infrastructure - Pagure.io - https://pagure.io/fedora-infrastructure/issue/8455
15:23:02 Trouble: High, Gain: Medium
15:23:06 I think this might be a mini initiative...
15:23:16 but also, I think it's blocked on packaging work right now.
15:23:22 .ticket 8167
15:23:22 Trouble: Low, Gain: Medium
15:23:23 cverna: Issue #8167: Adding topic authorization to our RabbitMQ instances - fedora-infrastructure - Pagure.io - https://pagure.io/fedora-infrastructure/issue/8167
15:23:54 This is something we might want to do, prioritize
15:24:18 yeah. we do still need to update prod to the newer rabbitmq for this.
15:24:39 we could plan that after the freeze
15:25:23 .tickets 8035
15:25:35 * .ticket 8035
15:25:44 rhh
15:25:56 .ticket 8035
15:25:58 cverna: Issue #8035: A few final ansible secrets for kerneltest - fedora-infrastructure - Pagure.io - https://pagure.io/fedora-infrastructure/issue/8035
15:26:13 Trouble: Low, Gain: Low
15:26:28 lets ping on this one and see if it's still needed/wanted?
15:26:49 yes, I can do that
15:27:01 .ticket 7935
15:27:03 cverna: Issue #7935: Nightlies (Rawhide and Branched) not imported to PDC - fedora-infrastructure - Pagure.io - https://pagure.io/fedora-infrastructure/issue/7935
15:27:14 Trouble: High, Gain: Low
15:27:55 sounds like it started to work again, I don't think we will fix importing back the ones we have missed tho
15:28:20 yeah, not sure anyone knows how to do that...
15:28:45 .ticket 7919
15:28:47 cverna: Issue #7919: Fix fas fedmsg sending in openshift - fedora-infrastructure - Pagure.io - https://pagure.io/fedora-infrastructure/issue/7919
15:28:54 Trouble: Medium, Gain: Medium
15:29:41 so, should we label all these so we can sort them in the tracker?
15:29:53 yeah :)
15:30:06 you want me to make labels/label them? or you want to?
15:30:20 you want to do it? or I can. I don't mind
15:31:03 I don't care either. ;) how about I make the labels and you mark these?
15:31:19 sounds good
15:32:00 ok we have a topic for discussion
15:32:03 #topic how will production apps still on VMs and not moved to openshift be affected by the datacenter move?
- tflink
15:32:32 specifically, I'm interested in blockerbugs but this may apply to other apps
15:32:42 I guess that depends if the production app is in the Minimum Viable Fedora or not
15:33:08 we need blockerbugs to release fedora - it's an important part of the validation and release process
15:33:40 right.
15:33:50 so this depends on MVF as cverna mentioned.
15:33:59 does blockerbugs need to be running then?
15:34:24 note that the move is after f32 is out and before f33 is really started...
15:34:31 unless I'm mis-remembering the details of MVF, it depends on when the outage happens
15:34:57 so the general plan is this:
15:35:30 * we will be taking down some non essential servers, shipping them to iad2 (new datacenter)
15:35:48 during this time everything should be running with the possible exception of some staging stuff perhaps
15:36:12 * with those + new servers, we will get things ready at the new datacenter.
15:36:33 blockerbugs in early post-release is not super-critical, no
15:36:38 * very late may/early june we will switch over to that datacenter and ship all the rest of the stuff
15:36:46 makes it harder to run blocker review meetings and for non-experts to propose blockers.
15:36:59 during this time we have limited resources... this is the MVF (which I wanted to call degraded)
15:37:25 once servers arrive and are brought back into service we ramp back up to capacity.
15:37:38 I don't recall how we marked blockerbugs, let me look
15:37:47 smooge: do you recall?
15:37:56 sorry too many meetings
15:38:07 * nirik is ignoring that other one. ;)
15:38:51 we marked blockerbugs as non-essential
15:39:32 ok, couldn't recall.
15:39:58 so, should we add it? or is it ok if it's down for a few weeks in june?
15:40:03 so it should be back up before we do a F33 beta
15:40:10 I don't know if we have any resources to do so
15:40:28 we ended up adding some things yesterday already
15:40:32 as long as it's back by say branching, that's okay, i'd say.
15:41:18 we have to have pretty much everything back by mass rebuild
15:41:28 our plan is to have as much of the site up by mass rebuild
15:41:50 I think we should have all prod stuff up long before that.
15:42:09 branching is 2020-08-11
15:42:37 ooh, plans
15:42:38 mass rebuild is 2020-07-22
15:42:46 i love plans. i love the tinkling sound they make when they break
15:43:02 ha. and this one isn't complex... no sir, not at all.
15:43:07 adamw, yep
15:43:24 currently the plans have multiple
15:43:29 tflink: so, blockerbugs and resultsdb... any other services you want to mention?
15:43:46 nope, those are all
15:43:47 i am working on removing them as best as possible but it will be hard
15:43:57 * tflink was going to ask about resultsdb at openfloor, though
15:44:01 adamw: for openqa our new plan (after we determined we didn't want it to be down any)...
15:44:45 ok let's move to open floor then
15:44:48 whether there has been any progress on figuring out who's going to own resultsdb going forward, rather
15:44:50 was to ship some workers in... uh... may? sometime... and then we can switch it over to the new dc end of may/june with other services
15:44:58 #topic Open Floor
15:45:09 tflink: pingou was working on that, but he's... out this week
15:45:38 yeah so we don't know :(
15:46:10 ok, just wanting to avoid last minute confusion as much as possible :)
15:46:41 yeah, I hope we hear something soon.
15:46:47 anything else for open floor?
15:47:27 oh... one more thing on dc move...
15:47:57 look for a public set of tickets/timeline at some point. so we can hopefully get others to look over things and tell us where we missed something. ;)
15:48:35 * nirik has nothing more right now
15:49:56 ok thanks all for coming
15:50:01 #endmeeting