19:00:01 #startmeeting Infrastructure (2011-07-21)
19:00:01 Meeting started Thu Jul 21 19:00:01 2011 UTC. The chair is nirik. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:01 Useful Commands: #action #agreed #halp #info #idea #link #topic.
19:00:01 #meetingname infrastructure
19:00:01 The meeting name has been set to 'infrastructure'
19:00:01 #topic Robot Roll Call
19:00:01 #chair smooge skvidal codeblock ricky nirik abadger1999
19:00:01 Current chairs: abadger1999 codeblock nirik ricky skvidal smooge
19:00:05 * skvidal is here
19:00:07 * CodeBlock waves
19:00:14 Here
19:00:44 * athmane here
19:00:49 * nirik waits a minute more for folks.
19:01:50 #topic New folks introductions and apprentice tasks/feedback
19:02:06 Any new folks like to say hi? Or apprentices have questions/comments/tasks to talk about?
19:03:04 * nirik listens to the sound of silence. ;)
19:03:16 good song. :P
19:03:24 ok, do feel free to chime in on list or in our regular channels anytime...
19:03:30 #topic Phoenix on-site work recap/summary
19:03:51 So, smooge and I were out at phx the other week. Just thought I would summarize what all we did...
19:04:05 hola
19:04:20 We set up a new backup box and tape drive. We will be transitioning to this one from our existing one in the coming weeks/months.
19:04:46 We pulled old machines out and sent some to the great place in the sky for old hardware.
19:05:37 we sent the machines to colorado?
19:05:47 We took 5 of the newest/most useful looking boxes and stuck them in a new rack as 'junk01' to 'junk05'. We were thinking we might be able to use these as a testbed for things we want to see if they might someday work for us.
19:05:52 ha. ;)
19:06:11 We moved all the qa machines to a qa rack and network.
19:06:32 we took inventory and tried to make sure all management/serial/power was set up and known
19:06:55 we were going to rack new machines, but they didn't arrive in time... so they will be added as we can get someone there to rack them for us.
19:07:20 smooge: can you think of anything I left out? I probably did miss some things...
19:07:34 you broke the backups?
19:07:35 I got sick
19:07:42 oh - you meant things you did intentionally...
19:08:12 yeah, I have no idea how backups broke. ;( We must have nudged the tape drive somehow... it was in a weird state.
19:08:22 yeah..
19:08:25 but that should be fixed now.
19:08:34 we learned that various boxes have only one power supply
19:08:40 even if they have multiple plugs
19:08:51 oh, I have some pics, need to see if they came out at all.
19:09:04 nirik: hmm - can we put them into the infra-hosts repo? :)
19:09:15 my 'love' of ppc found new depths
19:09:38 sure, if they are usable. ;) It wasn't easy getting far enough away to get anything... so they might be out of focus, etc.
19:09:43 I wish I had another week out there so I could get the rest of the hardware
19:10:18 I think it was a very productive trip... we got a lot done/cleaned up/etc
19:11:09 which brings us to the next topic...
19:11:17 #topic QA network setup brainstorming
19:12:12 so, we have 2 racks in a qa network... this includes some qa test boxes, a virtual host that has autoqa01, autoqa01.stg and bastion-comm01 on it, along with the junk boxes and the secondary arch stuff
19:12:36 qa folks have expressed interest in monitoring and a puppet or puppet-like setup.
19:12:57 how separate do we want the qa setup from our main setup?
19:13:17 so we separate monitoring too?
19:13:45 * skvidal thinks separate monitoring is overkill.
having said that, we have a lot of legacy in our existing nagios layout
19:13:47 athmane: that's a question, yeah. It seems a pain to me to have more nagios... but it would allow them to monitor their own stuff without ours
19:14:25 I think we could possibly get them to use bcfg2 there, as a testbed. They have many fewer machines than we do.
19:14:38 I'm not sure how useful bastion-comm01 is.
19:15:12 I guess we wanted it separate from our bastion for access there.
19:16:42 hmm
19:16:45 anyone have thoughts or ideas? I guess I am leaning toward no separate monitoring, stick bcfg2 on bastion-comm01 and make it the config host there... then move bastion-comm01 and virthost-comm01 out of our puppet.
19:17:03 i think we do one of 2 things
19:17:21 fully integrate it, which means move it back to a fedora network
19:17:25 or fully separate
19:17:34 yes, I don't see the need to separate monitoring, so I agree with skvidal
19:18:04 * nirik nods. we could easily just add their hosts to our puppet, but then qa folks would need access to our puppet. ;) (which I don't really know if it's a big issue or not)
19:18:18 it's not, until it is
19:18:30 separate config is good for security imho (qa net is more like a lab)
19:19:40 Did we ever update to the new version of nagios?
19:19:44 We went the separate route since the boxes there can run stuff
19:19:45 abadger1999: yep.
19:19:48 k
19:20:06 abadger1999: we are on nagios3 now
19:21:02 ok, I'll gather more info and talk to qa folks and set it up one way or the other. If anyone has strong ideas on it, let me know soon...
19:21:19 Proposal sounds good... the only question I have is how long we'll run both bcfg2 and puppet
19:21:46 abadger1999: well, it would depend on how well it works out there... and then if it did, we would need some kind of transition plan.
19:21:53 Yeah.
19:21:54 abadger1999: and if we like it at all
19:22:26 I think this is a nice small group to test with...
19:22:29 Would we want a transition plan either way? If we do like it, migrate fi-main to bcfg2; if we don't like it, migrate fi-qa to puppet?
19:22:41 8 qa machines, 2 autoqa instances, a bastion and a virthost.
19:23:16 abadger1999: yeah. either migrate back to puppet there, or fold them into our puppet.
19:23:27 but if it's already separate, probably just migrate them back to their own.
19:23:35 Sounds like a plan.
19:24:09 #action nirik to continue talks with qa and move stuff
19:24:21 anything more on this?
19:24:44 #topic Outstanding RFR (Request for Resources)
19:24:53 I noticed we have a number of RFR's open.
19:24:58 oi
19:25:01 I added a list to the agenda email
19:25:20 many of them are old or in a state unknown to me. ;)
19:25:35 close
19:26:05 #1591 is two years old
19:26:08 yeah, if anyone wants to update them, please do. Otherwise I will look at closing...
19:26:09 and fpaste.org is running
19:26:23 pingou: yeah, but that one, it turns out, is active. ;)
19:26:30 herlo is going to be updating it.
19:26:50 I saw something about it but didn't get the issue
19:26:56 the fpaste.org folks are tired of running it, and want us to.
19:27:10 do we?
19:27:18 it's been finally packages up.
19:27:20 * StylusEater is late
19:27:27 packaged.
19:27:30 * nirik can't type
19:27:48 Have we taken over fpaste?
19:27:54 nope
19:28:10 not yet... it's unclear to me what the status of the domain is...
19:28:45 * pingou wonders what makes them tired (eg that wouldn't make us tired)
19:28:48 askbot is also active recently. Others I am not too clear on.
19:28:59 maximum 24-hour lifecycle. Is that enough?
19:29:33 pingou: spam, dealing with upkeep, paying for the instance that runs it, etc
19:29:41 ciphernaut: for what?
19:30:17 nirik, for anything/everything.
19:30:42 !
19:30:59 Related to RFR's: I am going to try and whip up a SOP page on the process around them...
19:31:04 ciphernaut, if you are referring to fpaste, we have found that works very well
19:31:23 it's a pastebin, not permanent hosting
19:31:44 most pastebins I've dealt with have 1 month or forever.. though if that's the majority of required usage, cool
19:31:58 well, that's all details we can tune later, right?
19:32:08 yep
19:32:13 true
19:33:14 Of the items, what should be at PHX2 and what outside (and where)?
19:33:19 for ask and paste I would like to try a new process: applicationname01.dev -> applicationname01.stg -> application01 (production). Create the group from the start that will work on it, etc.
19:33:57 smooge: yeah. That should be determined at least at the stg point in the process... should it be load balanced/cached or not.
19:35:00 I think both ask and paste are good to be as nice and separate as we can easily make them... ie, their own instance/db. I don't know how well clustering/replication will work for them, that's something we also need to find out.
19:36:30 anyhow, will try and update the RFR page, make a SOP and send it to the list for more comment.
19:36:51 so we can try and have a process for these.
19:37:27 anything else on RFR's? any others folks want to save/comment on?
19:37:31 yeah I like that process
19:37:40 nitrate sounds like another one for that
19:37:58 yeah, it's stalled in review... so not sure what's going to happen there.
19:38:04 nitrate is not yet packaged
19:38:42 perhaps step 0 of the rfr process should be: "get it packaged, then come back here" to avoid filing RFR's too far in advance.
19:39:11 hehehe that would make a lot of stuff easier on us.
19:39:17 we (qa team) still use the wiki for recording test results but i heard about a pilot project to use nitrate
19:39:34 nitrate looks cool from a quick glance...
19:39:49 * athmane forgets if for f15 or f16
19:40:02 anyhow, if nothing else on this, moving on...
19:40:17 #topic Hotfixes
19:40:28 I think dmalcolm and dgilmore were working on the python buildbot one.
19:40:32 So, we also have a pile of hotfixes built up over the last while.
19:40:58 abadger1999: ok, will ping them for status. ;)
19:41:17 https://fedorahosted.org/fedora-infrastructure/query?status=new&status=assigned&status=reopened&group=component&summary=~hotfix&order=priority
19:41:38 * nirik waits while hosted dies because we all clicked.
19:41:49 dead dead dead
19:41:56 nirik: :)
19:42:00 we need our own hosted just for us
19:42:12 :S
19:42:17 abadger1999: i need to work on that
19:42:29 it was part of why we moved some builders to be virtual hosts
19:42:56 Yeah, I did some testing/pingdom loading on fedorahosted's gitweb index, and it took over 16 seconds to load on average. That's just bad :(
19:43:13 anyhow, what are the chances we could get a pkgdb release update, supybot-koji, pinglists, mediawiki-116? ;)
19:44:00 dgilmore: huh... which ones are virtual?
19:44:44 I will be working on mediawiki-116/117
19:45:04 xb01 I thought was the virtual builder
19:46:09 smooge: not sure if the mediawiki bug was filed upstream yet... I guess ping ricky on it.
19:46:23 ok will do so
19:46:38 oooh long trac traceback
19:47:21 nirik: pkgdb update is something I want to do in the next month. It's probably third on my list of "not-a-fire" tasks, though.
19:47:35 abadger1999: ok, cool.
Would close a number of hotfixes. ;)
19:47:45 Seems that I always run into one freeze or another before getting them out :-)
19:48:09 yeah.
19:48:19 nirik: hehe. The only thing is a lot of hotfixes go in just after a release due to finding new bugs in the code :-)
19:48:29 yep. it's a never ending cycle. ;)
19:48:48 which brings us to the next topic:
19:48:54 #topic Upcoming Tasks/Items
19:49:34 basically the only things on my list right now are the freezes:
19:49:37 2011-08-01: mail fi-apprentice folks.
19:49:37 2011-08-02 - 16: Alpha change freeze
19:49:37 2011-08-09: Remove inactive fi-apprentice people.
19:49:37 2011-08-16: Fedora 16 Alpha
19:49:37 2011-09-06 - 20: Beta change freeze
19:49:38 2011-09-20: Fedora 16 Beta
19:49:40 2011-10-11 - 25: Final change freeze
19:49:43 2011-10-25: Fedora 16 release.
19:49:53 so, we have until the 2nd before our first freeze.
19:50:10 If anyone wants to work on/schedule things, please let me know.
19:50:33 more moving things to rhel6.
19:50:35 We should get the change freezes into the infra calendar
19:50:49 yeah, I keep meaning to, then getting distracted. ;(
19:50:59 nirik: app servers, proxies, hosted... what else is on the migrate-to-rhel6 list?
19:51:02 anyone interested in updating the calendars? :)
19:51:39 skvidal: last I looked we were just over 50% rhel6, so all the rest.
19:51:44 nirik: :)
19:51:47 smartass
19:51:58 I think ibiblio01 we can move over once we have an ibiblio02 we can migrate things to
19:52:06 infra calendar?
19:52:16 tummy01 might be a good one to remote re-install.
19:52:30 the value boxes might not be hard to migrate over.
19:52:48 looks like 64 hosts on 5server
19:53:11 once we have new machines racked up in phx2, we can move more things there.
19:53:13 http://fpaste.org/HTSO/
19:53:43 smooge: http://kevin.fedorapeople.org/infrastructure-*.ics
19:53:58 smooge: they are in the git infra repo too.
19:54:07 serverbeach1 should be doable and would be an interesting case to find out if the serverbeach boxes will be able to survive el6
19:54:35 skvidal: yeah. I was meaning to talk to them about a hardware refresh at the same time, but didn't get to that either. ;)
19:55:24 for many of the rhel5 instances, we need to move their host to rhel6/kvm before moving them.
19:55:28 nod
19:55:38 * skvidal grimaces at torrent
19:55:44 hmm
19:55:50 cnode01...
19:55:56 and dhcp02.c
19:55:59 not our problem soon
19:56:23 tummy01 and bodhost might be good ones to re-install as they don't have any critical stuff on them, I don't think. we could even leave the guests' lvm alone and bring them back up after the re-install
19:56:36 nod
19:57:07 also, what are we using serverbeach1 for?
19:57:24 bxen03 only has releng01 on it... once we get a new machine racked in phx2 in the build rack I can move that over and we can reinstall bxen03
19:57:28 smooge: a mirror istr
19:57:34 smooge: another download mirror I think is all.
19:57:40 that's not phx2.
19:57:56 sb1 has had a host of issues trying to make it be a virthost
19:58:24 another possibly good reason to ask about a hw refresh. ;)
19:58:33 nod
19:58:45 #topic Open Floor
19:58:59 running low on time, any other plans/ideas/dreams?
19:59:24 dreams
19:59:25 yes
19:59:36 anyone here looked at salt? http://saltstack.org/
19:59:48 I've been playing with it a bit and looking over the features in it
19:59:51 I glanced at it the other day... first I had heard of it.
20:00:00 it's more or less func + zeromq for the communication layer
20:00:07 fairly fascinating, actually.
20:00:08 all in python
20:00:23 and the devs definitely have a use case like ours in mind
20:01:06 cool. So it means clients listen to a bus for actions?
20:01:16 more or less, yes.
20:01:29 it means the clients don't need a port open
20:01:31 heheh I have that calendar in my system already. I will update the calendars this week
20:01:32 like we have right now with func
20:01:41 so it means one more port closed off
20:01:45 what is zeromq?
20:01:45 cool.
20:01:49 which is good
20:01:58 smooge: google is your friend :)
20:02:03 * nirik remembers lots of talk about message bussing a few years ago, but it never seems to have taken off.
20:02:20 there are a couple of things here that are interesting to me.
20:02:35 amcq or something :)
20:02:57 1. whether or not this adequately covers the functionality that func has been providing for us
20:03:18 2. I'm looking at whether I can port functionality like func-yum to it and have it all work the same (which would be amusing)
20:03:45 cool.
20:03:46 3. one of the things we wanted out of qpid/amqp is notifications/events as well - the question is if we can get there from here with zeromq
20:03:57 * nirik nods. That was my next question.
20:04:09 smooge: It's a library that implements easy-to-program buffered network sockets.. depending on who you ask, it's a lightweight message bus or nearly everything you need to make a message bus.
20:04:48 nirik: much to wonder about and play with...
20:04:59 anyway - just wanted to ask if anyone here already had experience
20:05:01 yeah, keeps things fun/interesting. ;)
20:05:02 want to use a couple of cloud instances to do so?
20:05:18 I read about it yesterday.. interesting to see if it's packaged etc?
20:05:22 smooge: right now I'm just dinking with it on guests on my laptop
20:05:36 smooge: there are pkgs - not in fedora - b/c of our zeromq ver
20:05:50 the authors of salt appear to be rpm-friendly people, though
20:06:20 salt looks interesting... do you see that potentially obsoleting func?
20:06:46 * nirik notes we are over time, but I don't think anyone else is scheduled to meet, so we should be able to just keep going. ;)
20:06:55 lmacken: it has a lot of the same functionality
20:07:04 lmacken: and it offers a very similar plugin infrastructure
20:07:21 lmacken: I talked to the lead dev - the reason it is similar is b/c he had investigated func before working on salt
20:07:29 lmacken: it's not accidental.
20:08:07 lmacken: he doesn't have the same modules, but a goodly number of func's modules are bound up with some xmlrpc-isms.
20:08:46 and the connect-out-only is useful for us.
20:08:54 which is the main thing pulling me toward it
20:09:07 also, that it doesn't require us to tie up qpidd on a systems-mgmt tool is nice
20:09:16 yeah, true
20:09:16 so qpidd can be used for other apps that need it w/o any conflict
20:09:44 speaking of our message bus vision, hopefully we'll pick up some momentum on that in the near future
20:10:13 lmacken: cool.
20:10:41 ok, anything else, or shall we call it a meeting?
20:11:28 that's all I have
20:12:07 cool. Thanks everyone. Let's get back to #fedora-admin and #fedora-noc. ;) Thanks for coming everyone...
20:12:10 #endmeeting
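
For reference, a rough illustration of the "connect-out-only" model discussed in the salt/zeromq conversation above: the master is the only side that binds a listening port, and managed hosts just make outbound connections, so no inbound port needs to be opened on them. This is a minimal sketch using pyzmq in Python, not salt's or func's actual code; the hostname master.example.org, port 5555, and the message format are made-up placeholders.

    import time
    import zmq

    MASTER_URL = "tcp://master.example.org:5555"   # hypothetical master address

    def master():
        # Only the master binds a listening port; it broadcasts commands
        # to whichever minions happen to be subscribed.
        ctx = zmq.Context()
        pub = ctx.socket(zmq.PUB)
        pub.bind("tcp://*:5555")
        time.sleep(1)        # crude pause so subscribers can connect before we publish
        pub.send_string("cmd: yum check-update")

    def minion():
        # The managed host only makes an outbound connection -- no open
        # inbound port, which is the property noted in the meeting.
        ctx = zmq.Context()
        sub = ctx.socket(zmq.SUB)
        sub.connect(MASTER_URL)
        sub.setsockopt_string(zmq.SUBSCRIBE, "cmd:")   # receive only command messages
        print("received:", sub.recv_string())

salt layers targeting, job returns, and authentication on top of this kind of transport, so the sketch only illustrates the firewall/port point, not the tool itself.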