18:00:04 #startmeeting Infrastructure (2016-05-12)
18:00:04 Meeting started Thu May 12 18:00:04 2016 UTC. The chair is nirik. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:04 Useful Commands: #action #agreed #halp #info #idea #link #topic.
18:00:04 The meeting name has been set to 'infrastructure_(2016-05-12)'
18:00:04 #meetingname infrastructure
18:00:04 #topic aloha
18:00:04 #chair smooge relrod nirik abadger1999 lmacken dgilmore threebean pingou puiterwijk pbrobinson
18:00:04 The meeting name has been set to 'infrastructure'
18:00:04 Current chairs: abadger1999 dgilmore lmacken nirik pbrobinson pingou puiterwijk relrod smooge threebean
18:00:04 #topic New folks introductions / Apprentice feedback
18:00:14 * pingou here
18:00:23 any new folks around who would like to give a short one line introduction?
18:00:29 * puffi here
18:00:29 or apprentices with questions or comments?
18:00:56 * nb is not new but says hi :)
18:01:00 I've mentioned in the past the possibility of doing some "job shadowing". I'm thinking either on a day where we specifically plan to carry out specific tasks that would work well being shadowed, or as planned outage tasks come up we can mark them somewhere as tasks that will be "streamed", where we as apprentices can watch how the more seasoned admins are doing specific tasks, be it updating ansible or....thoughts? :)
18:01:03 * cverna here
18:01:41 puffi: I see you added that as a discussion item... shall we wait and talk about it there?
18:01:42 * lousab here
18:01:53 nirik: sure, sorry
18:02:03 no worries. ;)
18:02:55 ok, let's move on to status/info dump...
18:03:07 #topic announcements and information
18:03:07 #info Fedora 24 Beta is out the door! - everyone
18:03:07 #info Thus we are no longer in freeze - everyone
18:03:07 #info Added memory to mailman01 to hopefully prevent alerts - kevin
18:03:07 #info Going to migrate the last of the lists to mailman01 this week - abompard
18:03:09 #info most NFS netapp mounts are now moved to nfsv4 - kevin
18:03:10 #info Planning to migrate comps and spin-kickstarts to pagure on friday - kevin
18:03:11 .hello coremodule
18:03:12 #info Koschei has been split into separate backend and web machines - mizdebsk
18:03:12 coremodule: coremodule 'Geoffrey Marr'
18:03:18 anything else folks would like to add or discuss there?
18:03:23 #info new pkgdb2 release - pingou
18:03:27 #info new mdapi release - pingou
18:03:27 hey coremodule - welcome
18:03:32 #info new pkgdb-cli release - pingou
18:03:43 and hopefully a new pagure release coming soon :)
18:03:45 pingou: does this solve those retire issues some people have hit?
18:03:49 awesome stuff :)
18:03:56 nirik: retire issue?
18:04:12 I thought there was some issue, but now I cannot recall details, so nevermind.
18:04:29 nirik, you were thinking of retread issues? :)
18:04:32 nirik: it solves the new package request w/ invalid monitoring_status flag issue
18:04:52 but a retirement issue doesn't ring a bell
18:05:01 nirik, Thanks, glad to be here.
18:05:05 nirik: oh yes, the 500 errors that get retried again and again
18:05:08 if I can find it, I'll mention it...
18:05:15 nirik: yeah, the new pkgdb-cli fixes that
18:05:28 threebean++ for the patch
18:05:38 coremodule: care to give a short one line intro of yourself? are you more interested in sysadmin or application development or both? :)
18:06:45 Sure, I'm Geoff, I work with Red Hat on Fedora as a QA guy. I enjoy both sysadmin stuff and app dev, but currently seem to do more sysadmin than dev work. Just thought I'd swing by and see what's going on today.
18:07:09 Whoops, three lines. :/
18:07:18 cool. ;) welcome again and do feel free to chime in with questions or comments anytime.
18:07:25 ok, let's move on to discussion items.
18:07:37 #topic Beta release day issues - kevin
18:07:52 so, tuesday was beta release day... it didn't go as smoothly as they usually do. ;(
18:08:15 The main problem was mirrormanager hadn't crawled the beta content, so it wasn't able to direct people to downloads correctly.
18:08:29 There are several things I think we can do... I can open tickets on them.
18:09:02 1. Make sure mirrormanager has access to the pre-open content... ie, it can read the content when it's staged, not just when it's open/released.
18:09:48 2. we have mm fedmsg triggered, but the alpha/beta/final content doesn't have any... we could either just make it run always once a day, always run it manually after staging, or come up with a fedmsg?
18:10:30 anyone have preferences there?
18:11:06 MM is already fedmsg triggered iiuc
18:11:25 each category is crawled after the repos are generated
18:11:40 yes, but staging milestone content is manually done by dgilmore
18:11:46 there's no fedmsg there.
18:11:54 so as long as the Fedora category is updated in the few hours before it should crawl everything
18:12:14 well, on tuesday it was still crawling at release time.
18:12:23 it took an hour or two after release before it was done
18:12:37 so something triggered the crawl then
18:12:48 yeah, might have been an updates push
18:12:59 probably
18:13:17 it was staged on thursday/friday of the previous week, but with the dir closed...
18:13:30 so perhaps if we solve that it would solve the second part
18:13:56 from what I understood from adrianr that is likely yes
18:14:43 ok. I can make a ticket on it and we can try and sort it. ;) Should that be an infra ticket or a mm github ticket?
18:15:11 let's keep it in infra I'd think
18:15:32 ok. can file.
18:15:47 #action nirik to file a ticket to make sure mm umdl has access to pre-bitflip content
18:15:54 so ideally PDC should send the "beta is here" fedmsg?
18:16:12 lmacken: that could be one way yeah... well, 'beta is staged'
18:16:27 it's not really 'here' until release day
18:16:56 ah, yeah
18:17:09 ok, anything more on this one? or shall we move along?
18:17:51 #topic Fedorahosted migration / EOL - kevin
18:18:08 I just wanted to mention this again and progress/steps that are happening...
18:18:29 cverna++
18:18:40 I gathered a list of projects that used trac so far this year... I will be trying to contact them and ask them what needs they might have from pagure or the like
18:18:45 \o
18:18:54 yeah, we need the importer, but cverna is rocking along on that. ;)
18:18:58 cverna++
18:18:59 nirik: Karma for cverna changed to 2 (for the f23 release cycle): https://badges.fedoraproject.org/tags/cookie/any
18:19:36 nirik, if people just need ticketing/documentation, does pagure fit that or do we have something else?
18:19:39 yeah now we need a new release of pagure to start using the importer
18:19:47 cverna: I am planning on moving spin-kickstarts tomorrow... I guess I can use what we have now on the importer?
18:20:06 smooge: it should do that.
18:20:30 cverna: for the ticket times/updates to tickets times?
18:20:30 nirik: nope, we need a couple of PRs in pagure
18:20:52 .hello linuxmodder
18:20:53 linuxmodder: linuxmodder 'Corey W Sheldon'
18:20:54 (late)
18:20:59 smooge: there's a docs repo and its issues are pretty good next to trac. ;)
18:21:06 welcome linuxmodder. :)
18:21:09 nirik, I was just wanting to make sure it wasn't using a tool not meant for that
18:21:21 as in trac wasn't really good about it but we used it because....
18:21:25 was in another sig mtg sorry
18:21:31 * linuxmodder reading scroll
18:21:32 nirik: no this is fixed actually, there are some solved PRs which need to be deployed on the live pagure instance
18:21:46 smooge: there may be some things people need that aren't there yet, but I want to gather them from existing projects and see if we can add them to pagure.
18:21:58 nirik: even got the tags imported now
18:22:06 cverna: awesome. ;)
18:22:16 pingou: when might we see a new pagure release? ;)
18:22:51 pingou: also, in pagure's pagure repo, could we have a tag like 'fedorahosted' or something to mark issues that are for the fedorahosted migration? so we can easily see them?
18:23:45 has anyone considered adding http redirects from fedorahosted to pagure for projects that migrate?
18:23:48 nirik: looking at fixing the tests on el7 and then I'm hoping to make one
18:24:07 on pagure topic pingou can we make non ff pushes blocked by default ?
18:24:14 mizdebsk: we could, we have done it for projects that went elsewhere in the past.
18:24:22 linuxmodder: there is an option that projects can activate
18:24:31 but I don't think I'll make it default
18:24:47 pingou: so, possibly tomorrow? :) before I migrate spin-kickstarts?
18:24:54 nirik: fingers crossed
18:24:57 pingou, I meant more globally to prevent what seems to have been the culprit in the fedora-mktg repo issue
18:24:59 ok, cool.
18:25:11 nirik: example of the import with fedocal http://52.50.242.190/fedocal/issues
18:25:17 linuxmodder: like I said, fedora-mktg can activate it if they want
18:25:20 pingou: in fedorahosted I think we did have deny non ff default.
18:25:46 it was, and hence why I think it should be on pagure as well, for repo sanity
18:25:48 nirik: yup
18:25:57 pingou, nb did for that repo
18:26:16 cool :)
18:26:28 nirik: at least one spin-kickstarts committer would have broken things all the time with ff enabled
18:26:34 pingou, if it WAS the default why not keep it ?
18:27:12 pingou: one other thing we might want to look at... is scaling out. How hard might it be to scale pagure over more instances? Or, would it be an idea to make a second one and just sync to it as a warm spare? or should we not worry yet?
18:27:14 linuxmodder: because it wasn't on pagure
18:27:23 dgilmore: yeah, definitely will set it for all my repos. ;)
18:27:48 nirik: it can be set per branch (all is default)
18:27:54 so that feature branches can be rebased
18:27:57 all my forks are that way even on gh
18:29:30 * nirik prefers deny non ff, but if it's been defaulted to not deny it I can see why pingou wouldn't want to change it.
18:29:50 we did have some hosted projects that requested it be allowed for their projects.
18:30:05 which makes me think that the approach I took might not be the best, need to check that
18:30:20 sure, can look into it.
18:30:48 personally I've always been opt-in to non ff but that's me
18:31:16 oh, one other fedorahosted/trac user that came up: freemedia. It was sort of poorly fit into trac. We need to figure out what to do with it...
18:31:20 that way myself in a drunken code moment or a new contrib can't so easily blow up my repo
18:31:38 we could modify pagure to handle it, but perhaps we want to just do something else better. Not sure, but everyone can think on it.
18:31:53 nirik, talk was to move it to pagure already but finding its host files was the issue
18:32:08 re: freemedia ^
18:32:17 host files?
18:32:23 which I recently joined and began helping with
18:32:36 pagure won't work for it currently. it does a bunch of stuff pagure doesn't do right now...
18:32:43 the source files (where what is seen is hosted from)
18:32:53 oh, you mean the php form part?
18:32:55 like?
18:33:06 and the html stuff yeah
18:33:15 it's on batcave yes?
18:33:24 yes, all that's in the ansible repo
18:33:29 seems so at least from the SOP doc
18:34:20 things like: filing tickets with email on cc with no pagure accounts, not sure if the api allows creating the tickets the way it does now, might be others.
18:34:45 we don't need to solve this now, but just wanted to get people thinking about it.
18:35:42 I think that's all I had on this now, I'll be contacting projects and asking them for more feedback and to file issues on things they need.
18:36:05 nirik, can't seem to resolve batcave even on bastion, what am I missing?
18:36:17 it's batcave01.phx2.fedoraproject.org
18:36:29 was missing the phx2 bit thanks
18:36:37 #topic Apprentice job shadowing - puffi
18:36:46 I've mentioned in the past the possibility of doing some "job shadowing". I'm thinking either on a day where we specifically plan to carry out specific tasks that would work well being shadowed, or as planned outage tasks come up we can mark them somewhere as tasks that will be "streamed", where we as apprentices can watch how the more seasoned admins are doing specific tasks, be it updating ansible or....thoughts? :)
18:36:49 puffi: you still here? care to drive this?
18:36:51 too slow. ;)
18:37:21 so you are looking at a screencast type thing? or ?
18:37:42 I like this idea.
18:37:51 puffi, how about a sign on and vnc kinda thing even ?
18:37:54 I think it might be nice to have some screencasts available of common things we do...
18:38:17 like have the apprentice do it on say stg while the mentor does it on prod
18:38:23 nirik: The technology we use I hadn't thought about and is probably the easiest part to figure out. But yes, secure VNC session or screencast etc
18:38:33 linuxmodder: exactly
18:38:33 +1 for screencast
18:38:33 some things don't lend themselves to this... where you are doing multiple things at once in different places...
18:38:46 but some would just fine
18:39:28 and what about having some server test machine to do it?
18:39:30 nirik, how hard / slow would it be to set up say 'training honeypots' for such things to have us apprentices test on behind something on stg with a faux network
18:39:34 * smooge can see an hour long youtube of me beating my head against a table swearing about data
18:39:36 nirik: Plus it can then be recorded and new apprentices can view them from the infra wiki
18:39:46 smooge: heh
18:39:49 lol
18:40:18 puffi: meet.jit.si w/ shared screen?
18:40:23 puffi, not all of that could be public tho, maybe on a vpn only accessible host
18:40:28 linuxmodder: well, somewhat difficult. For many things since everything is open source you can just run your own at home or in a cloud instance, etc.
18:40:29 it's actually only 10 minutes but he blacked out after that and the camera kept rolling
18:40:48 pingou: that's a thought too.
18:41:11 https://meet.jit.si/ has a screen sharer, not sure how good it is.
18:41:37 well maybe just a 'this is the approved way' via vnc and then shared access to a privately hosted instance like you mentioned, maybe do a hybrid thing
18:41:50 over 15 people it blows
18:41:51 puffi: so, perhaps you could keep an eye out for tickets you think might work well with this and we can try it out?
18:42:16 nirik: worked fine for me when using it with Ralphg
18:42:18 -g
18:42:24 I can try and see if I can do some screencasts of common stuff soon, like ansible playbook making a new machine or something...
18:42:25 I'll gladly be a guinea pig for those tests
18:42:25 nirik: I was thinking to start with more run of the mill tasks you do on a daily basis, from acking an alert etc
18:42:34 nirik: exactly
18:43:01 puffi, like easyfix type stuff?
18:43:17 puffi: yeah, although that's not too exciting. Login to nagios, tactical overview, thing that alerted, acknowledge, enter 'ack' in the comments, submit. ;)
18:43:29 (you wonder why I know that by heart? :)
18:43:44 for those things that are easy but worded in a way that seems like wtf that isn't an easy ticket
18:44:04 lol
18:44:12 nirik: Right, the ack wouldn't be too exciting, but you could walk through your thought process / troubleshooting steps for some of the common alerts, gives us an idea of where things are etc
18:44:14 that mundane eh?
18:44:37 like that logic flow puffi
18:45:08 puffi: sure... we could also try and be better about mentioning that kind of thing in irc...
18:45:17 ie, instead of 'got that alert' 'fixed'
18:45:40 do something like 'oh, that's that issue X we saw before, I just logged into the machine and restarted httpd to clear it'
18:45:47 nirik: Exactly, BETA release day would have been a great example, but it was probably too hot in the kitchen for irc chit chat ;)
18:46:07 yeah, we really were under the gun to get things working, no time then.
18:46:26 We can talk about it after the fact tho. ;)
18:46:55 Since mirrormanager wasn't redirecting downloaders to mirrors, we set up a redirect to point them to the master mirrors (which we know have the content)
18:47:10 which unfortunately took us a while due to various mistakes. ;(
18:47:14 nirik: sure, maybe after such an event we can have an autopsy session or similar
18:47:20 * pingou called away
18:47:49 yep. I did have a topic earlier in the meeting; I could have gone into more detail there.
18:48:14 anyhow, for this how about you all point out things you might like to see as a screencast or the like as we hit them...
18:48:27 nirik: Sounds good
18:48:37 and I will try and do some common ones too to test tools (will need folks to look at them and see what they need!)
18:48:39 well for more advanced apprentices maybe not, but some of us would have lost it 15 mins in indeed
18:49:46 ok, anything else on this or shall we move on?
18:49:59 nirik, could we consider more of an 'office hours' like docs does, or more of them
18:50:24 well, last we talked about this we added...
18:50:27 I think we can take anything else to channel or list
18:50:31 #topic Apprentice office hours
18:50:37 this section to our meetings. ;)
18:50:56 So, any apprentices looking for things to do, or with questions or whatever, feel free to chime in...
18:51:05 we could also look over the easyfixes
18:51:32 I'd love to learn more of what is on what subnets and the like (like my missing phx2 part)
18:51:38 Glad to help wherever I can, as far as apprentice work goes...
18:51:58 that lack of understanding what is where has been a killer for me
18:52:07 https://fedoraproject.org/easyfix/
18:52:17 and most times when I have time everyone is sawing logs
18:52:19 linuxmodder: ah, I was going to talk about our various datacenters today...
18:52:26 but I think we are low on time.
18:52:32 so might push it to next time.
18:52:42 nirik, in channel works for me if it's good for you
18:52:52 it can be confusing... as a lot of things just got added over time.
18:53:03 seeing that
18:54:09 I can do a quick overview today perhaps and then more detailed next time...
18:54:30 #topic Learn about: Our various datacenters - kevin
18:54:45 so, really quickly... we have a number of datacenters around the world.
18:55:12 Our biggest one, that has most of our machines in it, we call 'PHX2'
18:55:20 it's in/near phoenix, az
18:55:45 yay
18:56:08 In that datacenter machines are in various domains: phx2 (our main normal network), qa (a more isolated network for qa and community stuff), ppc/arm/s390 (secondary arch stuff)
18:56:36 we also have all our management stuff (serial consoles, drac, etc) in a mgmt.fedoraproject.org zone there.
18:57:23 Next we have 2 datacenters that we have 3-4 machines at:
18:57:27 what isn't in phx2 then?
18:57:33 ibiblio, and osuosl
18:57:33 and is there a phx1 still?
18:58:02 ibiblio is in north carolina, and osuosl is in corvallis, oregon
18:58:03 no
18:58:10 etherpad is in osuosl right?
18:58:13 there's a phx1 still, but we have no machines at it. :)
18:58:27 yeah, gobby is there.
18:58:32 it's a cold spare I assume?
18:58:52 just a datacenter we used to have machines at... everything was migrated long ago.
18:58:55 is there some map/chart of datacenters? hosts grouped by net/domain etc?
18:59:09 (happily before I was involved... smooge recalls the horror of that move ;)
18:59:11 that would be helpful indeed ^
18:59:36 that bad was it?
18:59:46 not enough time to go over it
19:00:32 then there's a number of smaller datacenters where we have usually 1 machine only... this includes: tummy.com (colorado), dedicatedsolutions, bodhost (england), internetx01 (germany), host1plus (germany), coloamerica (california) and possibly some I forget.
19:00:34 it wasn't the worst I have dealt with. but it wasn't a picnic
19:00:59 There's no map, but there is a very useful file on batcave01: /var/log/virthost-lists.out
19:01:13 this lists every virthost and every guest that's on those.
19:01:30 so for example you can see:
19:01:32 bodhost01.fedoraproject.org:proxy07.fedoraproject.org:running:1
19:01:49 that says that proxy07 is a guest on bodhost01... so you know that it's at that datacenter.
19:02:21 or that dedicatedsolutions has:
19:02:27 dedicatedsolutions01.fedoraproject.org:mirrorlist-dedicatedsolutions.fedoraproject.org:running:1
19:02:28 dedicatedsolutions01.fedoraproject.org:proxy11.fedoraproject.org:running:1
19:02:36 two vms running there.
19:02:37 etc
19:02:49 I guess we are over time now, but wanted to try and do an overview...
19:02:52 that's useful indeed
19:03:06 nirik: thanks
19:03:12 #topic Open Floor
19:03:17 any items for open floor real quick?
19:03:20 nirik, smooge would either of you be opposed to one of us making such a map ?
19:03:33 linuxmodder: not at all, someone did a long time ago...
19:03:41 very useful..thanks nirik
19:03:50 it just gets out of sync pretty quick
19:03:58 so I have no objections
19:04:09 Yes, agreed. That would be very helpful.
19:04:09 i was going to mention an idea for easyfix: create script to generate host maps from ansible inventory and/or virthost-lists.out
19:04:15 it would stay in sync this way
19:04:27 sure, that could work. ;)
19:04:34 smooge, I can offer to try to reduce that sync delay
19:04:39 we do have an ansible 'datacenter' variable
19:05:16 ok, if nothing else will close out in a minute...
19:05:20 another thing related to apprentices: i'm not really looking for new things to do, but i'm in apprentice group basically just to get access to some machines related to things i care about
19:05:25 yeah.. between that and the logic in the script that makes the virthost list, plus looking at the non-virthosts, a map could be built
19:05:34 well now that i kinda have a better brainmap of the space expect to see me doing more
19:05:36 how about having some other group (maybe the existing "sysadmin") that would give people access to various places (without sudo)?
19:06:01 sysadmin-journeyman
19:06:10 mizdebsk - there's a lot you can do without access
19:06:44 if you have something specific you want to work on ppl usually are eager to review patches when they have time
19:06:48 there's a lot i can do by myself without bothering admins if i have shell access
19:06:51 mizdebsk: we could, would prefer not to do sysadmin as it's not a shell group I don't think
19:07:01 well I see what mizdebsk is asking.. he is already in apprentice but isn't fulfilling that role. His role is higher than that but less than sudo-master
19:07:08 * nirik nods.
19:07:17 nirik, we have plenty of sysadmin groups we could repurpose :)
19:07:29 sure, we can come up with something.
19:07:37 plus clean out some of them
19:07:47 mizdebsk: perhaps you could file a ticket on that and also one for a map easyfix (or linuxmodder if you wanted to do that one)?
19:07:58 will do
19:07:59 so we don't forget to do it. ;)
19:08:29 ok, thanks for coming everyone. do continue in #fedora-admin, #fedora-noc and #fedora-apps.
19:08:32 #endmeeting
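
Editor's note: the host-map easyfix idea from Open Floor could start from something like the rough sketch below. It assumes virthost-lists.out holds one colon-separated record per line in the order virthost, guest, state, plus a fourth field, as in the three example lines quoted in the datacenter overview. The meeting does not say what the fourth field means, so the sketch ignores it. The function names parse_virthost_list and format_host_map are hypothetical and not part of any existing Fedora infrastructure script.

```python
from collections import defaultdict

# The three example records quoted in the meeting; a real run would read
# /var/log/virthost-lists.out on batcave01 instead (assumed layout).
SAMPLE = """\
bodhost01.fedoraproject.org:proxy07.fedoraproject.org:running:1
dedicatedsolutions01.fedoraproject.org:mirrorlist-dedicatedsolutions.fedoraproject.org:running:1
dedicatedsolutions01.fedoraproject.org:proxy11.fedoraproject.org:running:1
"""


def parse_virthost_list(text):
    """Group (guest, state) pairs by virthost.

    Assumes 'virthost:guest:state:<fourth field>' records; the fourth
    field is ignored because its meaning is not given in the meeting.
    """
    hosts = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = line.split(":")
        if len(fields) < 3:
            continue  # skip anything that does not match the assumed layout
        virthost, guest, state = fields[0], fields[1], fields[2]
        hosts[virthost].append((guest, state))
    return dict(hosts)


def format_host_map(hosts):
    """Render the grouping as an indented plain-text map, sorted by name."""
    lines = []
    for virthost in sorted(hosts):
        lines.append(virthost)
        for guest, state in sorted(hosts[virthost]):
            lines.append(f"  {guest} ({state})")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_host_map(parse_virthost_list(SAMPLE)))
```

Because the map is regenerated from the file each time, it would not drift out of sync the way a hand-drawn one does. Non-virtual hosts and the ansible 'datacenter' variable mentioned above are not covered by this sketch.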