20:00:54 #startmeeting infrastructure
20:00:54 Meeting started Thu Feb 17 20:00:54 2011 UTC. The chair is CodeBlock. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:54 Useful Commands: #action #agreed #halp #info #idea #link #topic.
20:01:02 Guess I'm doing this, this week. :(
20:01:11 #topic roll call
20:01:15 * averi is around
20:01:18 * ricky
20:01:21 * CodeBlock
20:01:22 * sijis is around
20:01:25 * ianweller waves
20:01:27 #chair ricky
20:01:27 Current chairs: CodeBlock ricky
20:01:28 is here
20:01:31 yay
20:01:33 #chair smooge
20:01:33 Current chairs: CodeBlock ricky smooge
20:01:40 #meetingname infrastructure
20:01:40 The meeting name has been set to 'infrastructure'
20:01:52 #topic Agenda
20:02:14 ok those with one please be open about it. Mine seems to be to create conspiracy theories
20:02:34 lol
20:02:42 what - topics to discuss?
20:03:08 I am here to talk about blogs, please enlighten me when we are at it
20:03:28 :)
20:03:31 I have two - 1) Do we want to look into yubikey auth on more of our infra?, and 2) Shall we look at what else needs done to kill nagios2 with fire and move to nagios 3?
20:03:47 and mailing lists cleanup as well on hosted
20:03:59 Rebuild of publictest boxes
20:04:04 and Removal of old sysadmins
20:04:13 and Training program for sysadmins
20:04:17 ok
20:04:32 #topic 1) Do we want to look into yubikey auth on more of our infra?
20:04:39 CodeBlock, <-
20:05:04 Alright - well... now that mmcgrath sent me a yubikey, I'm kind of growing attached to it :)
20:05:12 I know we support it in FAS and on people01 for ssh
20:05:28 Any thoughts on expanding it and letting more of our infra accept yubikey auth?
20:06:03 well the question is how is it to be used.
20:06:31 I'd prefer to have actual two factor auth over just using it as is. (Not sure if I prefer just password over just yubikey)
20:06:47 make it two factor (eg a pass+yubiOTP) or one-factor (pass or yubiOTP). use it for sudo?
20:06:56 two factor I agree with
20:07:39 sudo...possibly. It can't be used for sudo on people01 right now. I have tried.
20:08:17 * dgilmore shows up
20:08:20 yeah that has been my only attempt also :)
20:08:29 For sudo, it'd need to be implemented in a way that doesn't require it, which is probably easily doable with PAM.
20:09:01 well to use a favorite Seth line: If its not required why use it?
20:09:36 But yeah. I was just curious if there was any interest in adding/allowing it throughout more of the infra. phuzion and I were talking about it (he has one too now), and we were just curious as to what the plan was
20:09:37 Because all sysadmin-main people have one (with maybe one exception?)
20:10:10 well the plan is "we need a plan"
20:10:23 what systems should it be needed on. which systems should it not. why
20:10:41 we did talk in the past of requiring sysadmin-main people to have to use it
20:10:46 i would think db* servers would probably need it
20:10:50 especially on pt boxes
20:11:04 what is it meant to help protect. what is it not used for
20:12:20 smooge: mmcgrath was scared that a keylogger or a carelessly typed passwd on a pt box would give a non sysadmin-main person a sysadmin-main user's passwd
20:12:27 One option is required for sysadmin-main everywhere, another is just pt and other widely accessible machines.
20:12:34 dgilmore, I agree on it
20:13:01 smooge: so the idea was a way to require auth but make it more secure
20:13:26 * dgilmore uses his yubikey with bodhi
20:13:34 I'm almost inclined to like ricky's first option. Because it solves the issue of local keyloggers and such too
20:14:23 the only issue i see is that if say im traveling and only have my phone i cant fix things
20:14:36 because i dont have a working way to use yubikey
20:14:41 i'm not quite following - why would you want to use yubi on pt boxes?
20:14:44 though the chances of that are slim
20:14:51 dgilmore: wifi tether to a laptop ;)
20:15:19 sijis: Because we give sudo access on those to everybody
20:15:20 sijis: if i sudo on a pt box, and you have a keylogger you have my passwd
20:15:54 we should treat pt boxes as untrusted
20:16:05 perhaps even hostile
20:16:27 If we require it everywhere (for -main at least) as I said it solves the problem of local keylogging too. It's a one time password - use it once and it will never work again.
20:16:32 sijis: its easy to get pt box access. that includes sudo
20:16:49 gotcha.
20:16:53 we should treat every system as untrusted, and we should treat certain systems like pt/people/hosted as hostile
20:17:37 but I come from a shoot once, ask question later background
20:17:47 How hard is it to deploy yubikey auth (no matter how we do it)?
20:18:16 Pretty easy as is now
20:18:28 Just some global pam configs and that'd be all.
20:18:42 That's what I figured
20:18:43 alright
20:19:41 Well - it's just something to think about, we don't have to (and won't) come to a conclusion this meeting, but it's just something I was wondering the state of
20:19:58 oh sorry jumped the gun
20:20:46 I can send something to the list (still talking about yubikeys) and see what people think or something
20:21:31 as far as nagios 3 ... We have test nagios and zodbot on noc01.stg .. But nothing else that is on noc01....not sure what else needs tested. But ..
20:22:04 but nagios and zodbot work on el6, with no issues
20:22:08 nagios 3*
20:22:12 and our nagios configs
20:22:44 i volunteered to do testing once a plan was put together
20:22:48 for nagios
20:23:04 smooge upgraded stg a while back I believe
20:23:17 Well - our configs all seem to work - I'm not entirely sure how much more testing can/needs done
20:23:23 yes noc01.stg should be set up.
20:23:26 ok
20:23:31 marchant, you were going to make a testing plan
20:23:34 * dgilmore thinks we pull the trigger post freeze
20:23:35 actually there is something
20:23:39 dgilmore: agreed
20:23:39 rebuild as rhel6 and move
20:24:04 marchant, I do need a test plan as I want to use it for other upgrades
20:24:45 smooge: I did not realize you wanted me to create the plan
20:24:46 eg yes we see that the configs didn't barf on reload. thats a check mark. but usability, webshots, what we tested we need
20:24:59 marchant, oh well it can be pretty simple
20:25:06 baby-steps
20:25:24 we currently have nothing beyond "configs didn't barf on reload".
20:25:54 so a test that involved taking down things in stg to verify proper alerting would be sufficient?
20:25:55 so we need to go over what 8 things we want to test and just have that done
20:26:04 should be
20:26:21 CodeBlock, does that make sense?
20:26:27 Yeah that's fine
20:26:34 hi folks
20:26:39 hey skvidal :)
20:26:42 * skvidal just got networking back
20:26:44 sorry for being out
20:27:07 OK, I will work on a basic plan and perhaps email smooge with other questions?
20:27:15 marchant: ??
20:27:27 skvidal: didn't miss too much - talked about yubikey stuff, and now talking about nagios 3 upgrade
20:27:29 sounds good
20:27:31 ah ha
20:27:32 okay
20:28:03 marchant, then work with CodeBlock to see that those tests can be done and that we get notified.
20:28:15 understood
20:28:18 skvidal: can confirm that the xmpp alerts work
20:28:29 yes
20:28:29 then we just need to build a new noc01 on virthost02 and we be happy
20:28:30 yes I can
20:28:32 * skvidal glares :)
20:28:36 by the day that he woke up and thought that the entire world was ... yeah
20:28:50 * abadger1999 here now and reads back
20:29:32 skvidal: Oh come on - you love those days - waking up grabbing a cup of coffee, and crapping yourself when you see .. what ~2000 alerts? ;D
20:29:42 CodeBlock: that aren't TRUE!
20:29:53 :P
20:30:25 Alright - next topic?
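Editor's note on the yubikey topic from earlier in the meeting (the 20:16:27 remark that an OTP "will never work again" once used): the sketch below illustrates that one-time property only. It assumes each key emits a counter that always increases, which is a simplification of how Yubico OTP validation is generally described. The names OtpReplayGuard, accept and two_factor_ok are hypothetical and are not part of FAS, PAM or any Yubico library, and no decryption or real verification is modelled.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class OtpReplayGuard:
    """Remembers the highest counter seen per key (hypothetical helper)."""

    last_seen: Dict[str, int] = field(default_factory=dict)

    def accept(self, key_id: str, counter: int) -> bool:
        """Accept an OTP only if its counter is newer than any seen before."""
        previous = self.last_seen.get(key_id, -1)
        if counter <= previous:
            # A keylogged OTP replayed later lands here and is rejected.
            return False
        self.last_seen[key_id] = counter
        return True


def two_factor_ok(password_ok: bool, otp_ok: bool) -> bool:
    """Two-factor as discussed in the meeting: password AND OTP, not either one."""
    return password_ok and otp_ok
```

With a guard like this, an OTP captured on a pt box is useless once the legitimate login has consumed it, which is the property the keylogger concern relies on. The captured password is a separate matter and is why the two-factor variant was preferred.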
20:30:43 #topic Old sysadmin member removal
20:31:21 So smooge sent out a list of proposed people to remove, who haven't touched their access within...a certain time limit, that I forget (60 days?)
20:31:36 60 days.
20:31:45 I need to remove the people from sysadmin-cvs
20:32:10 but we should be ready to go
20:32:27 When are we looking at doing that - post freeze I'm assuming.
20:32:46 post freeze
20:33:08 Did you get my note that tibbs|h and some other cvsadmins should stay?
20:33:10 I will update after freeze to make sure I didn't miss something, redo and figure out how to do a mass mailing
20:33:10 (is there a gap between pre release freeze and real release freeze?)
20:33:38 * CodeBlock finds the SOP about freezes
20:33:46 ricky, yes. I tried to say above I am going to not remove them
20:33:51 but failed
20:34:00 there will be several freezes
20:34:09 Cool, thanks
20:34:35 1) freeze alpha (slushy) , 2) freeze beta (sort of solid), 3) freeze release (-40C)
20:34:51 smooge: how much time in between each?
20:35:33 * nirik arrives fashionably late.
20:35:54 usually about 2-3 weeks. So March 22 we will start beta freeze. April 26th? we will start final freeze
20:36:22 ok
20:36:33 #topic sysadmin training?
20:36:52 Ok I failed at this
20:36:52 CodeBlock: The freezes start two weeks before each release (alpha, beta, final)
20:37:06 abadger1999: ok
20:37:07 CodeBlock: The spacing in between just depends on when the releases are.
20:37:11 * nirik filed the tickets for the alpha release, BTW.
20:37:12 2 weeks before scheduled release
20:37:19 nirik thanks
20:37:24 nirik: Awesome!
20:37:25 so if a release gets delayed the freeze is extended
20:38:12 * dgilmore is pushing on time very hard
20:38:13 our goal this year. release on the same day as Ubuntu
20:38:24 haha
20:38:38 with Mageia, us and Ubuntu we will bring down the InterTubes
20:39:46 anyway.
I am hoping to get the writeup of what abadger1999 and skvidal have talked about for moving people from fi-newbie->fi-apprentice->fi-craftsman->fi-master->fi-pastmaster->fi-hiddenllamamaster
20:39:52 or some such thing
20:39:56 ah
20:39:56 okay
20:40:01 so here's my whole nefarious plan
20:40:11 I have an fi-apprentice group made
20:40:25 I have not added it to the $fas_group yet b/c of the freeze
20:40:40 and then add the acl to the puppet repo
20:40:51 so those folks can clone
20:40:54 but not commit to the repo
20:41:44 that's it
20:41:50 I was just waiting for the freeze
20:41:55 ok
20:41:59 but I'm happy to do it now
20:42:02 if y'all are cool w/it
20:42:21 skvidal: is that all they get is puppet01, /git/puppet clone access?
20:42:22 I don't know how the acls are setup -- they won't get access to the private repo, correct?
20:42:38 ok so very late to the party am I
20:42:55 CodeBlock: ssh access to hosts
20:42:57 abadger1999: no
20:43:01 abadger1999: sysadmin-main only
20:43:05 Excellent
20:43:16 skvidal: which hosts?
20:43:21 skvidal, ssh access to *all* hosts?
20:43:29 not _quite_ all
20:43:41 I think ricky had a reasonable point
20:43:41 basically -noc + ro puppet?
20:43:53 and maybe not giving access to the xen/virthost boxes
20:43:58 but the rest is okay
20:44:13 anyone think that sounds bad?
20:44:15 skvidal: so basically.. -noc + ro puppet. :P
20:44:26 CodeBlock: fine, be that way! :)
20:44:39 ;)
20:44:50 skvidal, I would not give sudo right away to -noc and puppet01
20:45:01 skvidal: id say no virt hosts, no builders but otherwise ok
20:45:02 skvidal, if you meant that with 'access'
20:45:24 actually that sounds pretty much sysadmin-noc
20:45:30 dgilmore: a fair point - I agree about keeping people out of releng
20:45:33 dgilmore, skvidal: fasXX, dbXX?
20:45:37 signXX?
20:45:41 other high security boxes?
20:45:42 CodeBlock: sign == releng
20:45:44 cannot happen
20:45:54 smooge: here's my problem with sysadmin-noc
20:46:02 no I am slow typing
20:46:05 I see the diff
20:46:07 1. that gives them access to modify the noc systems and that's just an issue
20:46:22 2. I hate the idea that the first way in is to work on nagios - that just seems odd to me
20:47:01 that's really all
20:47:06 we are looking at a subset of systems we feel are ok for starters: publictest, people, hosted?, collab?, smtp-mm?
20:47:17 bastion and puppet
20:47:19 CodeBlock: db02 (whichever db server has fas atm), If we're not keeping people off the builders/kojihub then I don't think db03 is a problem, db01 is probably not an issue if we give people access to the app servers.
20:47:26 sounds right to me
20:47:56 we will be keeping people off of the builders/releng/fas/db/sign
20:48:16 okay, then db03 can be included... I think db01 would still be okay.
20:48:21 okay
20:48:28 but equally, no need for me to bikeshed :-)
20:48:36 smooge: which of those listed will they have sudo on? pt? what else
20:48:52 pt would be it
20:48:54 ok
20:49:07 sudo on?
20:49:09 if that.
20:49:09 s/included/included in the keeping people off list/
20:49:11 why would they have sudo on them?
20:49:12 woah
20:49:17 the whole point of this is READ ONLY
20:49:21 sudo != READ ONLY
20:49:25 skvidal: +1
20:49:32 ok, ok
20:50:27 a couple of things for people to know. we have had a pretty small set of people on puppet. this has meant people have had poor permissions on various directories
20:50:34 * dgilmore is with skvidal no sudo for you
20:51:03 * phuzion is here for the last couple of minutes
20:51:11 smooge: 'poor permissions'? - what does that mean?
20:51:18 smooge: we have some unprotected things?
20:51:21 * ricky has noticed over the months world-readable private clones every once in a while
20:51:22 o+r on private
20:51:27 I check and chmod/notify people every once in a while
20:51:41 I also made sure that precautions for private repo are in all SOPs that tell you to clone it.
20:51:52 can we make the o+r a git hook?
20:52:06 It wouldn't really work as a hook - it needs to be on clone
20:52:21 I looked at whether git did any umask stuff a while back and found nothing, but it could have changed now.
20:52:23 and that is usually via the umask a person has
20:53:37 okay
20:53:45 then a couple of options
20:54:07 1. no access to puppet - just cloneable access to the public git tree
20:54:12 We have been really good about sharing with go+r but that may not be a good idea with a larger tree
20:54:17 apparently my weechat session felt like dying.
20:54:20 2. we cron job the private repo with a mallet
20:54:54 Apparently cron job is now a verb.
20:54:55 well its not the /srv/git/private as much as ~smooge/private
20:55:06 smooge: like I said - with a mallet
20:55:20 now my private hurts
20:55:30 smooge: that sounds like a personal problem. :(
20:55:43 ok anyway...
20:56:00 our time is coming to a close
20:56:05 we also need to go through puppet and make sure its clean
20:56:07 thanks for dying, weechat. You're awesome. I woke up this morning thinking "You know. I hope weechat dies today."
20:56:21 CodeBlock_, can we end up with my two items please?
20:56:23 we have not published it before because it hasn't been 'checked'
20:56:36 we are down to 4 minutes
20:57:00 averi: have smooge #topic seeing as I'm currently fighting with weechat and can't anymore.
20:57:02 so ricky I would like you and some others to go over /puppet/ and see what we can do
20:57:19 averi, what are your topics and can they be dealt with in 2 minutes?
20:57:35 * skvidal is tired of this
20:57:37 before we go on
20:57:45 I'd like to suggest we stop having our meetings here
20:57:45 smooge, mostly what we decided to do with blogs and inactive lists on hosted
20:57:48 this is stupid
20:57:53 and I'm tired of being kicked out of this channel
20:58:00 and having to mess up our whole conversation/logging flow
20:58:06 I agree
20:58:08 skvidal: +1
20:58:16 I agree and I'm in the following meeting!
20:58:16 I'm going to look for another room
20:58:18 that's available
20:58:26 We need to allocate 1.5-2 hours.
20:58:29 yes
20:58:29 we do
20:58:38 yeah
20:58:52 I could ask about moving the cloud meeting to an earlier time if you want.
20:58:58 we're just not as efficient as the late mmcgrath :-)
20:59:04 2 hour meetings - fun.
20:59:06 skvidal, #fedora-csi is open
20:59:09 I'm fine with #fedora-admin if you all are
20:59:21 Not like much goes on in there during meetings anyway.
20:59:21 * sijis is ok with that
20:59:26 ricky: +1
20:59:31 ricky: +1
20:59:34 except when we get someone with random drivebys
20:59:40 and then we can't stay on task
20:59:40 Will there still be logs?
20:59:43 or log the channel effectively
20:59:53 gholms: There will be zodbot there too.
21:00:00 I am moving to #fedora-csi
21:00:09 #endmeeting
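Editor's note on the private-clone permissions discussion (20:51:21 to 20:55:06): the "cron job with a mallet" idea could look roughly like the sketch below. It is a minimal illustration under assumptions, not the team's actual script. The function names, the idea of scanning a tree of home directories, and the clone directory name "private" (taken from the ~smooge/private example) are all hypothetical choices for the sake of the example.

```python
import os
import stat
from pathlib import Path


def world_accessible_dirs(root: Path, dirname: str = "private"):
    """Yield directories called `dirname` under `root` that grant any access to 'other'."""
    for current, dirs, _files in os.walk(root):
        for name in dirs:
            if name != dirname:
                continue
            path = Path(current) / name
            # S_IRWXO covers the read, write and execute bits for 'other'.
            if path.stat().st_mode & stat.S_IRWXO:
                yield path


def lock_down(path: Path) -> None:
    """Equivalent of `chmod o-rwx`: clear the 'other' bits, keep user and group bits."""
    mode = stat.S_IMODE(path.stat().st_mode)
    path.chmod(mode & ~stat.S_IRWXO)
```

Run periodically over the home directories, this would find clones that were created under a permissive umask and close them. Clearing the bits on the top-level clone directory is enough to stop other users from traversing into it, though it does not replace setting a restrictive umask before cloning, as the SOPs mentioned at 20:51:41 presumably describe.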