20:01:24 #startmeeting infrastructure
20:01:24 Meeting started Thu Jan 13 20:01:24 2011 UTC. The chair is smooge. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:01:24 Useful Commands: #action #agreed #halp #info #idea #link #topic.
20:01:33 #meetingname infrastructure
20:01:33 The meeting name has been set to 'infrastructure'
20:01:42 #chairs ricky skvidal
20:01:51 #chair ricky skvidal
20:01:51 Current chairs: ricky skvidal smooge
20:02:00 #topic Robot Roll Call
20:02:19 * nirik is lurking around.
20:02:29 tom servo!
20:02:40 crowbot
20:03:14 * dgilmore is kinda here
20:03:19 skvidal is kinda here
20:03:27 * skvidal is here
20:03:28 sorry
20:03:32 I was not in the channel
20:03:39 6am is before propper wakeup time
20:03:46 dgilmore, ugh sorry.
20:03:47 * abadger1999 waves
20:03:56 smooge: ehh
20:04:05 * gomix waves hi... probably lurk...
20:04:09 #topic this weeks business
20:04:17 * fchiulli sits in the rafters
20:04:41 Ok we(skvidal) did updates to all the servers and we spent tuesday rebooting systems
20:04:53 skvidal anything you wanted to add?
20:04:57 yay for reboots
20:05:04 and oh look - we get start again
20:05:15 * nirik wonders about 5.6 plans. ;)
20:05:25 skvidal: new kernel update?
20:05:36 dgilmore: rhel 5.6 , yah
20:05:49 but the kernel update doesn't have much in the way of anything compelling or pressing afaict
20:06:22 ok
20:06:50 unless someone has put /dev/ecryptfs as 755 or something
20:06:58 do we have a list of rhel5 machines we want to move to rhel6?
20:07:31 not yet. I had been putting that off til we got a replacement mmcgrath
20:07:43 dgilmore: umm, all of them?
20:07:48 dgilmore: :)
20:07:55 skvidal: well yeah.
20:07:56 but since it may be a while... we can go to next stage of ordering them
20:08:17 #topic EL5 -> EL6
20:09:20 ok first item so I can say I read f-a-b and lwn.net. We have no plans for go from EL5 -> FX for infrastructure.
I don't think it is feasible for our kind of development
20:09:30 +1
20:09:37 I don't think that's to be contested
20:09:47 and anyone who wants to put up a fight about it is going to find themselves in trouble
20:09:52 b/c no one with root access is going to do that
20:10:00 and that's really that, afaict
20:10:19 wow you said it in a much nicer and cleaner way than I was going to.
20:10:27 
20:10:28 * nirik finds this a non starter, lets move on. ;)
20:11:03 +1
20:11:06 move on
20:11:12 ok next item. server list for moving.
20:11:31 * skvidal generates an el5 list
20:11:35 Do we want to look at doing it bottom up or top down
20:11:59 and do we want to take up the meeting with this? [I just put it in topic as it was what was started.]
20:12:10 Oh quick change while skvidal does his thing
20:12:33 last week we did an out of band meeting in #fedora-meeting-3
20:12:44 and triaged all the tickets that were listed as meeting.
20:12:59 I think we ended up with a good list of what we can do quickly and what not.
20:13:12 I just need to find those notes and post them for communal memory sake.
20:13:29 #action smooge will post logs from last weeks meeting and this weeks.
20:13:29 did anyone update the indicated tickets?
20:13:33 that might be good to do also.
20:13:38 el5 boxes
20:13:39 http://fpaste.org/yGR7/
20:13:42 according to func
20:14:13 nirik, once I find the notes I will do so. I got caught up in somehting else and forgot.
20:14:19 #topic EL5 -> EL6
20:14:45 I would say we would want to do the following:
20:15:22 1) Rebuild staging to EL6 say db/proxy servers first, and then work our way to the middle. app servers and such.
20:15:47 The reason is that app servers are probably going to take the longest
20:16:02 question -- do all our backup solutions work with el6?
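[Editor's note: the el5 box list pasted above was generated with func. A minimal sketch of the kind of invocation and filtering involved — the func glob, the release strings, and the helper function here are all illustrative, not taken from the actual paste:]

```shell
# func can run a command on every minion, e.g.:
#   func "*" call command run "cat /etc/redhat-release"
# and the collected output can then be filtered down to EL5 boxes.
# The parsing helper below is a hypothetical illustration of that
# filtering step, run against a sample release string.
major_release() {
    # Print the first number found in a redhat-release string,
    # i.e. the major version (5 for "release 5.5", 6 for "release 6.0").
    echo "$1" | grep -o '[0-9]\+' | head -n1
}

major_release "Red Hat Enterprise Linux Server release 5.5 (Tikanga)"
```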
20:16:28 as I don't know that we test that aspect in stg
20:16:30 mmcgrath, created a old bacula rpm for us to use on the servers
20:16:36 k
20:16:45 the drbackup will need to be checked
20:16:56 I need to fully grok it anyway.
20:18:00 question -- do we need to consider the hosts that are running the virtual machines when figuring out order?
20:18:02 smooge: FWIW, drbackup is pretty simple. just rsync and some filesystem acl's
20:18:06 my second idea was that we have a bunch of xen boxes going EOL early next year. I was figuring we would just go with "shutdown and replace with kvm" versus worrying too much
20:18:38 mmcgrath, I figured it was.. but there is that and then there is the "oh thats how it really works.. wow I should have realized that 2 months ago"
20:18:43 smooge: shutdown and replace seems like a reasonable move to me
20:18:57 * skvidal makes a note to bring up something in #open floor
20:19:04 I have one more box to get a quote and order next week
20:19:16 id like to move bacula to EL-6s versions
20:19:19 that should allow us to bubble sort stuff
20:19:38 dgilmore, I would too. However.. I am not sure how to do it without losing old backups.
20:19:54 smooge: we can work that out
20:19:55 dgilmore I wanted to talk to you about it when you got back or at Fudcon
20:20:04 so the new bacula is completelyt backward incompat with the old bacula
20:20:09 we should talk about it at fudcon
20:20:10 it can't even READ the old backups?
20:20:13 dgilmore, we are about ready to order another tape server.
20:20:25 skvidal: afaik it should be able to read the backups just fine
20:20:45 skvidal: the issues is the on the wire protocols changed
20:20:46 skvidal, I am not sure.
We are multiple major versions behind what is in EL6
20:20:56 so the new client cant talk to the old server
20:21:03 and the database layout changed also
20:21:08 okay
20:22:17 what I had read so far was we could move from 2->3..->5 but not tested for 2->5 (I think those are the versions looked at).
20:22:31 anyway.. fudcon
20:23:05 so anything else people want to discuss on 5->6?
20:23:17 * ricky is here
20:23:29 hey ricky
20:25:00 #topic new netapp
20:25:15 ok we are going to have a new netapp for PHX2 systems coming up RSN
20:25:19 I did a bacula 2->3 conversion at home without troubles for what it's worth - restores were fine
20:25:46 the hardware is in place and RHIT is doing a lot of testing to make sure its working
20:26:01 new system will be all Fibre Channel..
20:26:29 we will be looking at a series of outages either before Fudcon or after fudcon depending on various testing.
20:26:58 most outages should be on the order of 4 hours as data from netappA -> netappB
20:27:13 is done one final time.
20:27:31 however times/dates will be finalized early next week.
20:27:42 questions I need to know is any preference from us on when/where?
20:28:44 skvidal, did I forget anything?
20:28:51 nope
20:29:06 ok in that case.. call for topics before #open?
20:29:38 * skvidal has 3 items
20:29:40 for open floor
20:29:44 #topic Open Floor
20:29:49 floor goes to skvidal
20:30:00 okay
20:30:02 item 1
20:30:12 disabling fsck on-mount-time - anyone opposed to this?
20:30:31 What are the pros/cons? Any affect safety-wise?
20:30:37 this is just disabling the "it has been 180 days" crap
20:30:47 ricky: unless disks are damaged it won't turn up anything
20:30:55 and if disks are damaged it won't do much good
20:31:03 it's been disabled in fedora for a while now
20:31:18 if the disk is not umounted cleanly it will always run
20:31:34 we're only turning off the 'it has been X days since an fsck, check forced' part of things
20:31:39 Ah, OK.
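[Editor's note: the change being agreed to here is the standard tune2fs knob. A sketch, assuming e2fsprogs is available — demonstrated on a scratch filesystem image rather than a live device; the exact commands used on the servers are not in the log:]

```shell
# tune2fs -c 0 -i 0 disables the "maximum mount count" and
# "check interval" (the "it has been 180 days") fsck triggers.
# fsck still runs after an unclean unmount, which is the part
# the meeting agreed to keep.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=1M count=4 2>/dev/null
mkfs.ext2 -q -F "$img"
tune2fs -c 0 -i 0 "$img" >/dev/null
# Show the now-disabled triggers (-1 mounts, 0 interval):
tune2fs -l "$img" | grep -E 'Maximum mount count|Check interval'
rm -f "$img"
```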
I'm pretty satisfied if it was decided to be OK for Fedora :-)
20:32:01 anyone else have another opinion?
20:32:07 I say go ahead
20:32:26 func-command go
20:32:33 there's the time and the number of mounts...
20:32:35 hah - I dunno if I'll do it with func or not
20:32:41 both are fine to be 0 IMHO
20:32:47 nirik: I doubt we'll ever hit the number of mounts :)
20:32:52 nirik: but I agree -in either case
20:32:59 it's happened to me before. ;)
20:33:05 just need an instance that crashes a lot
20:33:17 releng!
20:33:31 or fas01
20:33:50 :)
20:33:58 sounds like no objections
20:33:59 next item
20:34:14 who is going to be at fudcon in tempe,az?
20:34:41 * ricky
20:34:41 don't everyone answer at once
20:34:46 :)
20:34:55 me
20:34:57 skvidal: im ok with disabling the auto every 180 days fsck
20:34:59 * ianweller
20:35:08 skvidal: ill be at fudcon
20:35:25 okay
20:35:25 I will be
20:35:31 so we'll have a considerable number of folks
20:35:33 that's good
20:35:37 I will be split between it and the colo
20:35:38 i will
20:35:45 smooge: Really? Aw :-(
20:36:02 it might be worth discussing this next item at fudcon
20:36:26 smooge: and at fudcon you'll be split between infra and the board...
20:36:28 but I'd like to get a decision on whether or not using machines in rackspace or amazon's cloud is acceptable for us
20:37:13 neither of the services are 'open source' - but then neither is serverbeach nor telia nor internetx
20:37:49 * nirik should be there.
20:37:50 * smooge would mention that we should only be building on pure opensource MIPS systems but someone might take him serious
20:37:51 rbergeron: By any chance, do you know if anyone from rackspace will be at fudcon?
20:37:52 I guess it depends on how we use the service - like will we end up writing a lot of code that ties into jut their API?
20:38:21 And is that something we want to put time into doing if that talks to a closed source service
20:38:32 my main issue is that we need to budget for it
20:38:34 ricky: a closed-source service?
20:38:45 abadger1999: rackerhacker, who is the guy who puts up the fedora images into their various hosting stuff, is coming.
20:38:53 He's a fedora lovah.
20:39:02 rbergeron: glad he's coming - it'll be nice to meet him
20:39:13 smooge: there is a fedora-mips we could run on it
20:39:16 As in the thing that the cloud APIs talk to (mostly guessing here, no real experience with any of this)
20:39:29 ricky: the modules they use are open source
20:39:35 ricky: and many are in fedora now - python-boto
20:39:55 abadger1999: also, my understanding had been that osmeone from the openstack side of things will also be coming, but the person who it would have been (rick clark, aka dendrobates) can't make it, but he was trying to find someone appropriate to come in his place.
20:39:59 ricky: the eucatools are another set of open source client ends
20:39:59 No objections from me of those are open source/not locked into any particular vendor
20:40:00 abadger1999: why?
20:40:22 rbergeron: to talk to them :)
20:40:30 ohhhhh. I see.
20:40:48 ricky: there not open source, and are locked to the vendor
20:40:55 I know rackerhacker is coming. :)
20:40:56 Or to IRC to them while sitting across from them :-)
20:40:57 ricky: the client tools are open source
20:40:59 ricky: the backends are not
20:41:08 though its supposed to be easy to move your data to a different provider
20:41:14 rbergeron: Very cool. Let us know if we can make influence someone coming by telling them what we want to discuss :-)
20:41:19 dgilmore: well in our case it would be 'reinstall elsewhere'
20:41:25 I'm mostly not worried about the client tools, actually
20:41:40 I'm more thinking that the hosting costs could be considerably cheaper
20:41:46 this is for the build shit fast and far?
20:41:48 and we don't have to play the "well do we have space in the ack" game
20:42:04 smooge: build shit fast, far and also to deploy publictest/dev boxes quickly and w/o all the bullshit
20:42:23 skvidal: right
20:42:42 so - let's say we treat both rackspace and ec2 just like we do serverbeach
20:42:57 but instead of having to call and talk a new site out of them
20:42:59 we just spin one up
20:43:00 boom
20:43:02 im personally against having the builders on other providers systems
20:43:13 dgilmore: okay, why?
20:43:19 but let's not just think about builders
20:43:24 let's also think about publictest boxes
20:43:39 and even additional on-demand app and proxy## servers
20:44:51 skvidal: a few reasons, but the biggest being that we cant be sure that a build has not been tampered with. some vunerabily or someone in the providers hosting could effect a build. inject something we dont want
20:45:15 dgilmore: so two things to speak to that
20:45:21 1. we don't have to be talking about official builds
20:45:24 2. we can't be sure of the above now
20:45:31 skvidal: but i do have some ideas for how we could better utilise spare capacity on the builders
20:45:46 Oh, so do I.
20:46:23 okay - so the answer to my question seems to be that if we use the cloud providers as we use the other hosting providers - that no one objects to using them
20:46:27 is that roughly true?
20:46:39 skvidal: thats roughly true
20:46:39 ehh...
20:46:55 abadger1999: ?
20:47:16 If there's a difference in the open-sourceness of the two I'd rather go with the one that's more open source.
20:47:40 abadger1999: well both are just as open source as serverbeach
20:47:42 But the cost compared to doing all of our own work is very compelling.
20:47:42 or internetx
20:47:51 when it comes to their infrastructure
20:48:09 hell and considering we use cyclades and friends inside phx - they're just as open as we are.
20:49:17 * gholms notes that eucalyptus is compatible with ec2 :)
20:49:38 gholms: and if you can find a provider using only eucalyptus I'm all ears
20:49:53 just wants to know how to pay the bill without using his credit card. after that I don't care. [I am so much a candidate for management now :)]
20:50:03 Ah, the aim is to run it on someone else's hardware. Got it.
20:50:07 smooge: you pay using max's creditcard :)
20:50:17 smooge: fedora already has an account
20:50:19 
20:50:23 we just need to get it sent to the right internal cost center
20:50:46 hw maintenance is expensive and time consuming
20:50:56 and from a 'benefit to fedora' standpoint it doesn't buy us mich
20:50:57 err much
20:51:00 I guess -- if we could use our choosing of a provider to help encourage more open sourceness then that seems like a good thing.
20:51:21 I can ask some people at eucalyptus if they know of a hosting provider that uses it.
20:51:40 gholms: I suspect their answer will be "well, not ALL of it"
20:51:42 or "yes, but"
20:52:00 one thing the reboot-cycle taught me this week is this
20:52:04 our shit is fragile in lots of place
20:52:07 Probably. I'll just walk across the hall and ask...
20:52:16 but the most noticeable place is the db's
20:52:42 we need to to spread that out and even if performance suffers - make sure things continue working w/o the db's in placew
20:53:33 gholms: have i mentioned that i love your new job?
;)
20:53:33 and I think at this point we don't need to be worrying about hw or disks falling out - we need to be focusing on services that help fedora
20:53:49 b/c I can assure you that knowing abour rsaII mgmt on ibm boxes is NOT useful or helpful to fedora
20:53:54 I think that's solvable with replication - file storage is another tough question though :-/
20:54:18 ricky: gluster might be a nice option - and I was actually considering testing it out - I just needed a few boxes to do that with
20:54:37 ricky: so my first thought was - deploy 5 boxes at ec2 - and set them up in the same region and try out gluster
20:54:53 ricky: I played with a lot of stuff last month on my own dime and it cost me a grand total of $2.57
20:54:55 I'm okay w/that
20:55:05 :-)
20:55:07 I think if we want to encourage people to play and work on projects
20:55:11 that we offer that to them
20:55:13 but on fedora's dime
20:55:50 okay - that's all the thoughts I had
20:55:57 ok next one?
20:56:00 Out of curiosity, are they donating, or is Fedora just paying?
20:56:06 ricky: fedora's just paying
20:56:07 Fedora would be paying
20:56:09 Or is that to be seen
20:56:09 OK.
20:56:14 * mmcgrath has never known amazon to donate anything on ec2
20:56:16 ricky: we pay (a lot) for hw right now
20:56:27 True
20:56:36 well actually RHIT pays for the hw, fedora's cost accounts dont
20:56:38 ricky: did I mention it's a lot
20:56:39 b/c it's a lot
20:56:52 smooge: while you're working on next years budget, you might want to see if we can get some EC2 in there.
20:56:52 so we need to work out that side of things :).
20:56:53 smooge: there is a lot of money spent on hw/hosting
20:57:11 mmcgrath: I did the math - for all the time we built things in the last year
20:57:13 for example
20:57:21 I can ask around at eucalyptus to see if they know of any hosting providers that use our stuff if you would like.
20:57:24 the cost for getting ec2 systems to do those builds comes to 12K
20:57:32 s/they know/anyone knows/
20:57:36 skvidal, correct. most of that money gets spent by RHIT whehter we put boxes in there or not.. thats where finding funds gets fun :)
20:57:38 we can buy, maybe, 2 or 3 systems for that
20:58:11 (Right now most of the people I would ask are at lunch)
20:58:12 I am not against it.. and will put it in there
20:58:14 anyway.
20:58:24 I'm mostly thinking that if someone comes to fi
20:58:26 with an rfr
20:58:29 2 minutes til rbergogre comes and kicks us off her bridge
20:58:33 it'd be nice to spin up a host instaneously
20:58:53 and not have to think about 'do I have the hw for this'
20:58:56 mmcgrath: AIUI, amazon has donated some ec2 cycles to drupal; however I don't know the exact arrangement
20:59:17 wow, this conversation is going to segue very well into the cloud sig meeting momentarily ;)
20:59:23 Jeff_S: I'm sure there's a bloodrite of some kind :)
20:59:26 Jeff_S: no kidding? interesting they turned us down cold last time I went probing about it :)
20:59:27 mmcgrath: I can put you in touch w/ drupal people if you'd want to discuss the details further (off topic here)
20:59:36 Jeff_S, cycles are cheap for them.. storage is where we would sock them..
20:59:40 likely
20:59:51 smooge: and again - that's only if we're thinking of doing building, etc
20:59:53 plus drupal is cool
20:59:55 but think about proxy or app servers
21:00:00 those are space cheap
21:00:03 and mostly network costly
21:00:19 but on release days, for example, they might help us
21:00:27 we will ahve a mirror there..
21:00:29 to be able to have 20 proxy servers and N app servers
21:00:32 which leads us to
21:00:34 and then shut them down
21:00:35 okay
21:00:38 * skvidal stops talking
21:00:42 ending this meeting for th e cloud
21:00:47 #endmeeting
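[Editor's note: the "12K of EC2 vs. 2 or 3 bought systems" estimate above is a rent-vs-buy calculation. A sketch of that kind of back-of-envelope math — every number here is a hypothetical placeholder, not the actual figures behind skvidal's estimate:]

```shell
# Rent-vs-buy back-of-envelope: total cost of a year's build hours at an
# hourly instance rate, and the break-even point against buying a server.
# Rates are kept in cents so plain shell integer arithmetic suffices.
build_hours=35000        # hypothetical instance-hours used for builds in a year
hourly_rate_cents=34     # hypothetical on-demand rate, $0.34/hr
server_price=4000        # hypothetical price of one physical build box, in dollars

total_dollars=$(( build_hours * hourly_rate_cents / 100 ))
breakeven_hours=$(( server_price * 100 / hourly_rate_cents ))

echo "yearly EC2 cost: \$${total_dollars}"          # 35000 hrs -> $11900
echo "break-even vs one server: ${breakeven_hours} hrs"
```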