15:03:42 #startmeeting Infrastructure (2020-07-23)
15:03:42 Meeting started Thu Jul 23 15:03:42 2020 UTC.
15:03:42 This meeting is logged and archived in a public location.
15:03:42 The chair is siddharthvipul_. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:03:42 Useful Commands: #action #agreed #halp #info #idea #link #topic.
15:03:42 The meeting name has been set to 'infrastructure_(2020-07-23)'
15:03:48 .hello darknao
15:03:49 darknao: darknao 'Francois Andrieu'
15:03:50 * siddharthvipul cracks knuckles
15:03:50 #meetingname infrastructure
15:03:50 #chair nirik pingou smooge cverna mizdebsk mkonecny abompard siddharthvipul mobrien
15:03:50 #info Agenda is at: https://board.net/p/fedora-infra
15:03:50 #info About our team: https://docs.fedoraproject.org/en-US/cpe/
15:03:50 #topic aloha
15:03:50 The meeting name has been set to 'infrastructure'
15:03:50 mmm, zodbot knock knock
15:03:50 .ping
15:03:51 nirik: ^ help please :P
15:03:53 siddharthvipul: Error: Can't start another meeting, one is in progress.
15:03:54 #meetingname infrastructure
15:03:54 The meeting name has been set to 'infrastructure'
15:03:56 pong
15:04:04 morning
15:04:05 morning
15:04:08 #chair nirik pingou smooge cverna mizdebsk mkonecny abompard siddharthvipul mobrien
15:04:08 Current chairs: abompard cverna mizdebsk mkonecny mobrien nirik pingou siddharthvipul siddharthvipul_ smooge
15:04:10 hello
15:04:14 siddharthvipul_: I think it's the VPN that is down
15:04:19 #info Agenda is at: https://board.net/p/fedora-infra
15:04:28 #topic aloha
15:04:39 morning
15:04:47 Morning
15:04:51 sorry folks, I joined through irccloud.. as mboddu said the vpn (my znc is behind it) is broken
15:04:58 so I will handle the meeting from here
15:05:06 sounds good
15:05:21 .hello siddharthvipul1
15:05:22 siddharthvipul_: siddharthvipul1 'Vipul Siddharth'
15:05:38 I guess it's back?
15:05:43 awesome
15:05:44 * mboddu is sorta here as I am twiddling with other composes :D
15:06:15 nirik: tflink darknao \o welcome :D
15:06:49 #topic New folks introductions
15:06:50 #info This is a place where people who are interested in Fedora Infrastructure can introduce themselves
15:06:50 #info Getting Started Guide: https://fedoraproject.org/wiki/Infrastructure/GettingStarted
15:06:55 anyone new here?
15:08:10 alrighty, moving ahead then :P
15:08:31 #topic Next chair
15:08:32 #info 2020-07-30 - mboddu
15:08:41 It's me... :D
15:08:44 we need a volunteer for 2020-08-06
15:08:51 mboddu: for next ;)
15:09:08 I have never done it before so I can do it with some guidance
15:09:20 mobrien[m]: I would love to help (not that there is a lot to it)
15:09:22 it's easy. ;) and fun!
15:09:31 I agree :)
15:09:35 I love it
15:09:37 Cool. Put me down for it
15:09:44 #info 2020-08-06 - mobrien[m]
15:09:50 thank you mobrien[m] :)
15:10:04 we are 2 weeks ahead, let's move to the next topic
15:10:16 #topic announcements and information
15:10:17 #info CPE Sustaining EU-hours team has a Monday through Friday 30 minute meeting going through tickets at 0830 UTC in #centos-meeting
15:10:17 #info CPE Sustaining NA-hours team has a Monday through Friday 30 minute meeting going through tickets at 1800 UTC in #fedora-admin
15:10:17 #info Fedora Infrastructure will be moving in 2020-06 from its Phoenix Az datacenter to one near Herndon Va. A lot of planning will be involved on this. Please watch out for announcements on changes.
15:10:18 #info Fedora Communishift move has started but will take longer than expected. Current estimate for bringing back into production is TBD
15:10:21 #info https://fedoraproject.org/wiki/Infrastructure/2020-post-datacenter-move-known-issues has a list of known items from post datacenter move
15:10:24 #info CFP for Nest with Fedora is now officially open!
https://communityblog.fedoraproject.org/nest-with-fedora-cfp-open/
15:10:32 please take 2 minutes to read the announcements and if you have something, please share
15:11:03 mobrien[m]: Next meeting will be my first run as well, welcome to the newbie party :D
15:11:26 mboddu: :D
15:12:05 more toddlers are running
15:12:18 I mean if I can do it, anyone can do it :P mobrien[m] even the toddler in town
15:12:36 haha, 2 meanings
15:12:52 pingou: kids grow very quickly, just a couple of weeks ago they were crawling and now they are running :P
15:13:12 ha
15:13:14 haha
15:13:31 and soon they will be breaking everything
15:13:41 mobrien[m]: lol :D
15:14:01 * siddharthvipul nods..don't give me access
15:14:02 and hopefully they don't find scissors or knives anytime soon ...
15:14:25 cutting down services... I like it :P
15:14:51 okie dokie, what's next
15:14:54 #topic Oncall
15:14:54 #info https://fedoraproject.org/wiki/Infrastructure/Oncall
15:15:05 #info siddharthvipul is oncall for 2020-07-16 -> 2020-07-23
15:15:05 #info mboddu is oncall for 2020-07-23 -> 2020-07-30
15:15:19 .oncalltakeus
15:15:19 mboddu: Error: You don't have the alias.add capability. If you think that you should have this capability, be sure that you are identified before trying again. The 'whoami' command can tell you if you're identified.
15:15:27 * mboddu has to log in
15:15:30 we need a volunteer for 2020-07-30 to 2020-08-06
15:15:43 mboddu: sure thing.. I will remind you in open floor to retry
15:16:14 I will volunteer for this too. Same as before I will need some guidance
15:16:19 .oncalltakeus
15:16:19 mboddu: Kneel before zod!
15:16:30 Okay, that worked
15:16:36 mobrien[m]: :D
15:16:41 cool
15:16:50 #info mobrien[m] is oncall for 2020-07-30 to 2020-08-06
15:17:01 thank you mobrien[m]
15:17:04 mobrien[m]++
15:17:24 #info Summary of last week: (from current oncall)
15:17:26 #info Summary of last week: (from current oncall)
15:17:32 okay, so it was me
15:17:47 in total I got just 2 pings and both were false alarms (kinda)
15:17:59 very peaceful (one of them was yesterday night 2am my time)
15:18:04 oh, I should mention: https://fedoraproject.org/wiki/Infrastructure/2020-post-datacenter-move-known-issues
15:18:16 in case folks ask about anything there that's not back
15:18:34 nirik: should I add it to announcements?
15:18:47 nirik: that's awesome.. thank you
15:18:56 sure, please do
15:19:01 mboddu: mobrien[m] ^ take note, will be useful while on call :P
15:19:20 oh it's already there
15:19:21 Yup, already bookmarked it
15:19:38 oh yeah, I already added it last week. never mind
15:19:41 okay, so that's all.. nothing from the current oncall. overall a stable week
15:20:02 #info calm and peaceful week
15:20:06 #topic Monitoring discussion [nirik]
15:20:13 ok, let's see...
15:20:35 #info https://nagios.fedoraproject.org/nagios
15:20:36 #info Go over existing out items and fix
15:20:51 we have a number of machines down, but most are ones that need a power cycle and we don't yet have access to PDUs at the new datacenter.
15:21:01 I'll nudge them again about that today
15:21:22 The usual low swap on some virthosts.
15:21:54 some misc small checks.
15:22:04 pingou: pagure01 is now at 90% on its disk. ;(
15:22:45 This one needs to be removed from nagios: pkgs01.iad2.fedoraproject.org Check for fedmsg-hub proc
15:23:32 so basically it's kinda noisy, but nothing urgent...
15:23:53 if folks want to work on cleaning things up, let me know and I'm happy to help provide pointers.
15:23:59 that's it, we can move on...
15:24:10 #topic Data-Center Move update
15:24:32 nirik: stats please :P if something is new
15:24:51 so, we have most of stg virthosts up now... I am going to make a proxy01.stg today if I get time... and then we need to deploy openshift, then we can deploy noggin. ;)
15:25:08 exciting 8.*
15:25:13 we still have a number of machines to bring up also, mostly openqa alternative arch boxes...
15:25:13 *.*
15:25:14 nirik: ouch for pagure01, though we need to migrate it to rhel8, so maybe 1 stone 2 birds
15:25:44 no word on rdu2, I'll try and move that forward today too...
15:26:15 pingou: yep. +1
15:26:47 hopefully we can have stg back up in the next few weeks...
15:27:33 so before moving ahead, a little bit of context. I noticed a few tickets around proxies (that needed just a restart) and thought I could have done that instead of mboddu or nirik wasting time on this.. pingou recommended this section's revival.
15:27:40 #topic: Learn about Fedora's proxy setup
15:27:59 so can anyone (I am guessing nirik) teach a little about how Fedora's proxy is set up
15:28:17 sure!
15:28:18 * mboddu is also interested
15:28:42 :)
15:28:46 so, we have a number of proxy servers, set up all over the world. Some donated hardware, some in aws, some in our datacenters
15:29:15 we have a dns setup that gives different 'views' to different regions.
15:29:36 so, for example NA (north america) gets ips for the proxies in NA...
15:30:13 each of those proxies is connected back to our main datacenter via a vpn (openvpn)
15:30:41 They each run httpd and then behind that haproxy and varnish...
15:31:12 requests come in, hit httpd and then it talks to haproxy and/or varnish and they in turn reach over the vpn to the service and proxy it back to the user
15:31:58 additionally, for static content, each proxy has its own static files it serves. These are typically "built" on sundries01 server and synced to all the proxies.
15:32:08 for example, docs, getfedora.org, etc...
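[editor's note] The regional DNS 'views' and the split between locally served static content and requests proxied over the VPN, as nirik describes them above, can be pictured with a small toy model. This is only a sketch for readers of the log: the region codes, hostnames, and path prefixes are hypothetical placeholders, not Fedora's real inventory or configuration, and the real work is done by DNS, httpd, haproxy, and varnish rather than by Python.

```python
from typing import Dict, List

# Hypothetical data: which proxies a client in each region is handed by DNS.
REGION_VIEWS: Dict[str, List[str]] = {
    "NA": ["proxy-na-1.example.org", "proxy-na-2.example.org"],
    "EU": ["proxy-eu-1.example.org"],
}

# Hypothetical fallback for regions without a dedicated view: every proxy.
DEFAULT_VIEW: List[str] = sorted(
    host for hosts in REGION_VIEWS.values() for host in hosts
)

# Hypothetical prefixes for content that is synced to each proxy's local disk.
STATIC_PREFIXES = ("/docs/", "/static/")


def proxies_for_region(region: str) -> List[str]:
    """Return the proxies a client in `region` would be pointed at."""
    return REGION_VIEWS.get(region, DEFAULT_VIEW)


def handle_request(path: str) -> str:
    """Say how a proxy would answer: from local static files or via the VPN."""
    if path.startswith(STATIC_PREFIXES):
        return "static"
    return "proxied"
```

In this model a North American client only ever sees the NA proxies, and a request for synced content never has to cross the VPN, which is the caching benefit mentioned in the discussion.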
15:32:33 this allows users to hit the proxies near them that might have things cached...
15:32:56 this last week we had a problem with varnish on some of the proxies growing in memory and causing an OOM kill. ;(
15:33:29 Does that make sense? any questions ?
15:33:42 nirik: does the apprentice group have access to sundries?
15:34:06 yep
15:34:09 oh, haproxy also can do HA... ie, it can talk to say 2 backends for an app, so if one is down, it just uses the other one transparently.
15:34:13 yeah, should have
15:34:33 oh, and the proxies also run our mirrorlist service.
15:34:43 must be something wrong with my sshconfig
15:34:46 will check it
15:35:08 nirik: Thanks nirik for the explanation, it does help a lot
15:35:17 thanks nirik
15:35:17 that gets data every hour from mirrormanager and serves it out to users... that's the thing that answers for mirrors.fedoraproject.org, probably our most critical service
15:35:19 so nirik, in order to help with this, how much access is needed? I know it's fairly more complicated to get access for this but just wondering
15:35:21 nirik++
15:35:42 I generally look at the nagios alerts and check journalctl for what's going wrong, but I don't know exactly how they are set up, this helps
15:35:45 nirik++
15:36:03 well, there was a suggestion (I think from mobrien[m] ?) to make a playbook that could restart services... that would be nice and we could add more access for that.
15:36:07 well, nirik this was awesome! I love this part and I think we should do more of this
15:36:18 siddharthvipul: yeap. All for it. ;)
15:36:23 .ticket 9161
15:36:24 mobrien[m]: Issue #9161: Create ansible playbook to restart failed services on proxy hosts - fedora-infrastructure - Pagure.io - https://pagure.io/fedora-infrastructure/issue/9161
15:36:52 mobrien[m]: I am interested in helping :) (if it's fine to work on it next week and you are not in a hurry)
15:36:53 I think that could even be just a generic 'restart failed services' on a host playbook...
might be useful other places too
15:37:09 ah, easy!
15:37:19 (or I think so) :P
15:37:28 siddharthvipul: sure. We can do it together if you like
15:37:35 mobrien[m]: deal :D
15:40:11 does anyone have any questions?
15:40:28 we don't have a lot after this so I am going to leave it at this for a while.. in case someone has something
15:41:00 nirik I was thinking that at first too where the host could be a var passed at run time but rbac doesn't allow for that
15:41:20 yeah, true. hum
15:42:19 I was thinking you could list all the proxies as the host in this playbook and use the --limit flag to choose the specific one when you run
15:43:43 sure.
15:43:47 that works
15:45:15 last 15 minutes for open floor
15:45:17 #topic Open Floor
15:46:35 oh, I was going to see about bringing up packages...
15:46:44 but I guess we can continue on the list.
15:46:46 * smooge puts away the shovel
15:46:57 I was working on burying it
15:47:20 well, the old one is dead and buried.
15:47:26 but we have _2_ new ones...
15:47:42 which I don't think will work out too well if we deploy both
15:47:58 ah ok.. the new ones I don't worry about.. when you said bringing it up I thought I was going to have to start digging again
15:48:07 odd hits you get app A, even hits you get app B.
15:48:34 red-blue testing at its finest
15:49:17 anyway... I don't have anything else to derail this meeting with
15:51:33 I wanted to try and contribute to the Fedora website as a first timer, but there weren't many easyfix tickets I could attempt. Would it be possible for anyone to open new issues if they noticed anything :)?
15:51:50 https://pagure.io/fedora-web/websites
15:52:39 gchang: there is a bunch of work going on in fedora websites
15:52:51 even an outreachy student is working on it right now
15:53:06 cc: relrod ^ :)
15:53:22 * relrod looks up
15:54:01 gchang: yeah, happy to help get you set up to contribute.
As siddharthvipul said we have quite a bit going on in websites atm, so good time to jump in :)
15:54:14 sweet!
15:54:25 On a similar note, we should try and add easy_fixes to https://pagure.io/fedora-infrastructure/issues as well
15:54:46 yeah, it's often hard, we should try
15:55:01 mboddu: a hackfest for 2-3 hours where we decide easyfix or find small issues that can be marked easyfix?
15:55:16 nitpicks that usually won't be seen in fedora-infrastructure issues?
15:55:41 the problem then becomes that no one fixes them and they sit there mocking us in tickets. ;)
15:56:19 Well, they will be there tagged as easy fixes for any new apprentice to look at
15:56:50 Makes it easier for newcomers to help us
15:57:03 Or we can add them to a different repo, under fedora-infra
15:58:08 I think it needs to be a flow... if something sits for a while we should fix it and file something new.
15:58:34 because a years old easyfix might not be very easy to fix anymore.
15:58:57 but we can try and figure something out. ;)
15:58:58 less than 2 minutes to go :)
15:59:56 I think that's a good idea. Have easy fixes with a time limit for one of us to do it after a certain date.
16:00:26 I see, interesting
16:00:28 I like the idea
16:00:32 Yeah, I like the idea
16:00:39 okay, so closing the meeting :)
16:00:42 thank you all for coming
16:00:53 loved this meeting.. we were on time and covered a lot
16:00:57 #endmeeting
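[editor's note] Ticket 9161 asks for an Ansible playbook, and the discussion settled on listing every proxy as a host and narrowing the run with `--limit`. The block below is not that playbook; it is a minimal Python sketch of the same two ideas (pick a subset of a fixed host list, then restart whatever systemd reports as failed), written under stated assumptions. The host names are hypothetical, and `restart_failed_units` assumes it runs on a systemd host with enough privileges to restart units.

```python
import subprocess
from typing import Iterable, List, Optional

# Hypothetical host list; the real proxies live in the Ansible inventory.
ALL_PROXIES = ["proxy01.example.org", "proxy02.example.org", "proxy03.example.org"]


def select_hosts(all_hosts: Iterable[str], limit: Optional[Iterable[str]] = None) -> List[str]:
    """Mimic the idea behind --limit: keep only the named hosts, or all if no limit."""
    hosts = list(all_hosts)
    if not limit:
        return hosts
    wanted = set(limit)
    unknown = wanted - set(hosts)
    if unknown:
        raise ValueError(f"not in the fixed host list: {sorted(unknown)}")
    return [host for host in hosts if host in wanted]


def parse_failed_units(output: str) -> List[str]:
    """Take the first column (the unit name) from each non-empty line of
    `systemctl list-units --state=failed --plain --no-legend` output."""
    units = []
    for line in output.splitlines():
        fields = line.split()
        if fields:
            units.append(fields[0])
    return units


def restart_failed_units(dry_run: bool = True) -> List[str]:
    """List failed units on the local host and restart them unless dry_run is set."""
    result = subprocess.run(
        ["systemctl", "list-units", "--state=failed", "--plain", "--no-legend"],
        capture_output=True, text=True, check=True,
    )
    units = parse_failed_units(result.stdout)
    for unit in units:
        if not dry_run:
            subprocess.run(["systemctl", "restart", unit], check=True)
    return units
```

Keeping the host list fixed and only narrowing it is what sidesteps the rbac concern raised in the meeting, since the caller can never target a host outside the list. `restart_failed_units()` defaults to a dry run, so it only reports what it would restart.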