21:03:44 <smooge> #startmeeting infrastructure2
21:03:44 <zodbot> Meeting started Thu Jan 6 21:03:44 2011 UTC. The chair is smooge. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:03:44 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
21:03:52 <smooge> #meetingname infrastructure
21:03:52 <zodbot> The meeting name has been set to 'infrastructure'
21:04:01 <smooge> #chair skvidal ricky
21:04:01 <zodbot> Current chairs: ricky skvidal smooge
21:04:15 <smooge> I think we are done with fas01
21:04:34 <skvidal> next ticket?
21:04:34 <smooge> .ticket 2543
21:04:35 <zodbot> smooge: #2543 (upgrade internetx01 to rhel6) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2543
21:05:00 <smooge> Ok this will basically be a reinstall with possibly remote hands.
21:05:16 <skvidal> where can I move proxy02?
21:05:19 <smooge> mmcgrath did this one himself
21:05:22 <ricky> I can look at that today if you're interested
21:05:26 <skvidal> ricky: +1
21:05:30 <ricky> I think we have console there
21:05:33 <smooge> I would just turn it off and take it out of dns
21:05:34 <ricky> And proxy02 can just be down for a while, just take it out of DNS
21:05:41 <smooge> we would just reinstall afterwards
21:05:50 <skvidal> ricky: nod
21:05:54 <skvidal> okay
21:05:55 <smooge> it is mainly for IPv6
21:06:07 <smooge> the main issue is that there are 2 different routes
21:06:19 <smooge> the main hardware is on one and the guests have a different one
21:06:29 <smooge> but that seems similar to the boxes at other colos
21:07:16 <skvidal> sounds reasonable, though
21:07:20 * nirik ventures to this cold and desolate corner of freenode.
21:07:24 <skvidal> ricky: if you want to nuke proxy02 and do it, it's cool
21:08:00 <ricky> OK, will start noting down the configs for those machines after the meeting
21:08:04 <smooge> just make sure it's documented.. I remember mmcgrath practically swearing on this or bodhost at one point but I can't remember why
21:08:40 <smooge> thanks ricky. let me know how I can help
21:08:49 <smooge> .ticket 2543
21:08:50 <zodbot> smooge: #2543 (upgrade internetx01 to rhel6) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2543
21:08:58 <ricky> Will probably bother you with a bunch of ipv6 questions :-)
21:09:09 <smooge> .ticket 2531
21:09:10 <zodbot> smooge: #2531 (DB03 update) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2531
21:09:47 <smooge> ok this one is that we have db03 on a locally compiled version of postgres83
21:09:56 <smooge> and EL5 is on 84 now.
21:09:56 * skvidal twitches
21:10:17 <smooge> so every 30 minutes puppet says "Hey I tried to update these rpms for you but you ated them."
21:10:41 <ricky> Is there an el6 upgrade planned for db03 as well? Should we bother doing one dump/load for 84 and another one for the el6 upgrade?
21:10:42 <smooge> so we need to figure out how to dump and reload like we did with db0[12?]
21:10:49 <dgilmore> we need to move it to el6
21:11:11 <skvidal> are we confident that the db in el6 is as stable/performant as it is currently in el5?
21:11:16 <smooge> dgilmore, ok cool. I didn't want to add more makework to it so was going for lowest change
21:11:17 <dgilmore> smooge: we need to take a koji outage
21:11:20 <dgilmore> dump the db
21:11:26 <dgilmore> build an el6 box
21:11:32 <dgilmore> load the backup
21:11:37 <dgilmore> and away we go
21:11:52 <dgilmore> smooge: it's the road of greatest pain
21:12:01 <smooge> well we are going to have all kinds of outages coming up :).
21:12:18 * dgilmore notes that db03 is not a virtual machine
21:12:21 <smooge> I have the feeling I won't be doing much at Fudcon but will be at the colo shooting things
21:12:37 <smooge> dgilmore, yeah.. I was wondering if you wanted to try it as a virtual machine again?
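dgilmore's outage steps for ticket 2531 (dump the db, build an EL6 box, load the backup) can be sketched as a dry run that only builds the pg_dump and pg_restore command lines. This is a sketch under stated assumptions, not the team's procedure: the database name "koji", the "postgres" role, and the dump path are all hypothetical, and the host names db03 and db03-06 come from the plan discussed in this meeting.

```python
# Hypothetical dry-run helper for the db03 dump-and-load (ticket 2531).
# Assumptions, not taken from the meeting: the database is named "koji",
# the PostgreSQL role is "postgres", and the dump path is illustrative.
# The commands are only built and printed, never executed.
import shlex
from typing import List


def migration_commands(old_host: str, new_host: str, dbname: str = "koji",
                       user: str = "postgres",
                       dump_path: str = "/tmp/koji.dump") -> List[List[str]]:
    """Return argv lists: dump from old_host, then restore onto new_host."""
    dump = [
        "pg_dump",
        "-h", old_host,
        "-U", user,
        "-Fc",              # custom-format archive that pg_restore can read
        "-f", dump_path,
        dbname,
    ]
    restore = [
        "pg_restore",
        "-h", new_host,
        "-U", user,
        "--create",         # recreate the archived database before loading it
        "-d", "postgres",   # with --create, only used for the initial connection
        dump_path,
    ]
    return [dump, restore]


if __name__ == "__main__":
    # Run only during the koji outage, so nothing writes to db03 mid-dump.
    for argv in migration_commands("db03", "db03-06"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

When moving between PostgreSQL major versions (8.3 to 8.4 here), the PostgreSQL documentation recommends using the newer release's pg_dump against the old server, so running both commands from the EL6 box is the safer choice.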
21:13:20 <smooge> The hardware for db03 is to be renewed this coming year
21:14:18 <dgilmore> smooge: we can. the reason that it got its own box is gone
21:14:54 <smooge> well if you have time to help me rebuild bvirthost01 to your needs we could put it there.. it has vast tracts of disk
21:15:21 <smooge> or you could go with what is there now if it meets them
21:16:10 <smooge> how about this for a plan of action:
21:16:49 <smooge> 1) build a db03-06 on bvirthost01 with EL6. Do a dump on db03 and do an import in db03-06 to see if it shits bricks or not.
21:17:17 <smooge> 2) rebuild db03-06 (if needed) and do the koji outage with a dump.
21:17:37 <smooge> 3) rename db03-06 to db03 and put into production... see what poops bricks then.
21:17:47 <smooge> 4) go back or continue on.
21:18:02 <smooge> skvidal, ricky? overly complicated or missing something?
21:18:13 <skvidal> doesn't seem overly complicated to me
21:18:31 <skvidal> seems like it would let us test out the basics
21:18:35 <skvidal> and shorten the outage time
21:18:36 <ricky> Sounds good
21:18:49 <skvidal> the hw that db03 is on
21:18:54 <skvidal> is it out of warranty, too?
21:19:22 <smooge> skvidal, it will be in June
21:19:30 <skvidal> okay
21:19:33 <smooge> it will be replaced in our first order list
21:20:19 <skvidal> okay
21:20:33 <skvidal> dgilmore: does the above sound okay to you?
21:21:34 <dgilmore> skvidal: seems fine
21:21:40 <skvidal> okay
21:21:48 <skvidal> I'll update the ticket
21:21:50 <skvidal> next!
21:22:03 <smooge> .ticket 2501
21:22:04 <zodbot> smooge: #2501 (What will it take to upgrade fedorahosted to RHEL6, new trac, new git?) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2501
21:22:24 <dgilmore> do we have a testing rhel6 instance?
21:22:28 <dgilmore> i thought we did
21:22:42 <skvidal> for trac, I thought jkeating did
21:22:42 <smooge> several. Oxf13 put new git and such on one
21:22:44 <smooge> and tested it
21:22:51 <ricky> pt3, I think
21:22:56 <smooge> or pt7
21:23:18 * dgilmore has been trying to find time to look at glusterfs for use on the backend storage in an architecture redesign
21:23:55 <smooge> hmm I wonder what it would take to build/test that at fudcon so we can "see" and break things in the same room
21:24:10 * nirik notes that sheepdog looks interesting. http://www.osrg.net/sheepdog/ (but would require machines be on the same net to share backend storage)
21:24:32 <smooge> I think currently hosted01/02 are on the same network
21:24:58 <dgilmore> they are
21:25:53 <skvidal> smooge: right now - my fudcon schedule is completely chock-a-block
21:26:05 <dgilmore> i believe all our serverbeach stuff is in one datacentre
21:26:08 <dgilmore> or at least it was
21:26:09 <smooge> mine is looking to be in the colo :/
21:26:14 <ricky> There are two serverbeach datacenters
21:26:20 <skvidal> how about this
21:26:20 <ricky> hosted* are in texas, I think
21:26:23 <ricky> The rest in virginia
21:26:30 <skvidal> do we have a deadline for the hosted update?
21:26:35 <skvidal> can we put this one off just a bit?
21:26:49 <dgilmore> ricky: hrrm ok. i thought they were all in the same one. even though sb has multiple
21:26:56 <skvidal> if it will involve so many infrastructural changes - do we want to wait until the new boss shows up?
21:27:00 <dgilmore> skvidal: it's a nice-to-have thing
21:27:00 <smooge> I think there was some breakage that this was to "fix" by introducing new breakage
21:27:09 <smooge> but I think we could wait til February
21:27:12 <skvidal> dgilmore: right - but not critical
21:27:34 <dgilmore> skvidal: we can do the rhel6 migration without architectural changes also
21:28:00 <ricky> Would we have to do extra work to avoid getting the new trac packages?
21:28:17 <dgilmore> ricky: new trac is only in el6
21:28:22 <dgilmore> stay on el5
21:28:25 <smooge> if it becomes critical: 1) rebuild hosted03 as EL6 and do the same items as db03 (shutdown, dump, load, lather, rinse, repeat)
21:28:26 <dgilmore> old trac
21:28:32 <ricky> OK.
21:28:41 <smooge> 2) rebuild hosted02 as EL6 and put it in sync with the renamed 03
21:28:45 <skvidal> smooge: I think that is always going to be the preferred path
21:29:02 <smooge> then deal with gluster and multiple sites
21:29:04 <skvidal> the only boxes we should take down to update to rhel6 are those with multiple boxes supporting the service
21:29:23 <smooge> now here is the thing
21:29:40 <smooge> hosted02 is as far as I can tell an oh shit backup versus any sort of failover
21:29:47 <skvidal> yes
21:29:56 <skvidal> it is, at best, a warm copy
21:30:46 <smooge> ok so something for February then
21:31:20 <ricky> So next? :-)
21:32:03 <smooge> .ticket 2517
21:32:04 <zodbot> smooge: #2517 (Need mod_evasive for EL6) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2517
21:33:07 <smooge> ok have we seen any issues with git that require us to have mod_evasive on it?
21:33:09 <ricky> So has pkg01 been falling over without it? :-)
21:33:18 <smooge> I don't think it has
21:33:24 <ricky> Won't deny that gitweb is pretty heavy :-(
21:33:35 <smooge> the only fall-overs I have seen have been weirder stuff
21:33:37 <ricky> But -caching seems to be doing the job
21:35:17 <skvidal> when mod_evasive was installed
21:35:25 <skvidal> were we dealing with a problem?
21:35:33 <ricky> It was viewvc
21:35:33 <skvidal> or was it entirely "this might be bad"?
21:35:39 <skvidal> on cvs01?
21:35:43 <skvidal> but nothing else?
21:35:47 <ricky> (it's always the evil VCS web frontend, isn't it?)
21:36:16 <ricky> Pretty sure it was just viewvc shelling out to CVS/RCS
21:36:28 <ricky> And getting hit by robots
21:36:38 <nirik> we do have the snapshot stuff turned off in gitweb still.
21:36:38 <skvidal> okay
21:36:43 <nirik> there's a request to open that up again.
21:37:04 <skvidal> so.. - maybe do this
21:37:14 <skvidal> move the mod_evasive issue over to the snapshot enablement ticket
21:37:24 <nirik> .ticket 2123
21:37:25 <zodbot> nirik: #2123 (Please enable snapshot link in gitweb) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2123
21:38:29 <skvidal> but let's move on
21:38:46 <skvidal> but I think tying mod_evasive to snapshots - only if needed seems like a good plan
21:38:55 <smooge> .ticket 2539
21:38:56 <zodbot> smooge: #2539 (decom xb-01 and reallocate bxen01) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2539
21:39:13 <smooge> ok I will work on getting mod_evasive into epel
21:39:14 <skvidal> so bxen01 reallocating is not gonna happen if it is out of warranty :(
21:39:37 <smooge> well for a bubble sort box to be used just to move crap onto and off of?
21:39:46 <skvidal> <shrug> fine
21:40:50 <skvidal> next?
21:40:55 <smooge> but we could do it with xen07 also. it is the next one to go out of warranty.
21:41:12 <smooge> .ticket 2544
21:41:13 <zodbot> smooge: #2544 (migrate autoqa01 elsewhere) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2544
21:41:24 <skvidal> autoqa01 is living on cnode01
21:41:29 <skvidal> cnode01 belongs to the cloud group now
21:41:45 <smooge> ok this one is a redesign of networks and such that I was putting together for RHIT before break
21:42:20 <smooge> basically I would like to build a 4th network in PHX2 where the secondary architectures (s390/ppc/arm) and also QA boxes can live
21:43:05 <smooge> this network would have limited access to the product/devel networks to cut down "oh shit" moments.
21:43:39 <smooge> I will ping mgalgoci/ebrown after the meeting to figure out where this is and if it's not going anywhere we go to plan b
21:43:45 <smooge> which we need to figure out.
21:44:03 <skvidal> this feels like a ways off, then
21:44:05 <skvidal> next?
21:44:13 <smooge> as it turns out it's more than just autoqa01 that moves from cnode. there are 4-8 qa boxes that need to move too
21:44:33 <smooge> .ticket 2545
21:44:34 <zodbot> smooge: #2545 (SOP and best practices for publictest## boxes) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2545
21:44:45 <skvidal> abadger1999 and I were talking about this
21:44:57 <skvidal> specifically we have a number of publictest boxes which seem to be idle but still running
21:45:07 <skvidal> should we make the rule that if they're not in use we shut them down
21:45:36 <smooge> and we need to make a way for people to request them to be turned back on
21:45:58 <smooge> otherwise we end up with "oh I just saw pt03 running so I installed there.. sorry it fucked up your project"
21:46:18 <skvidal> yah
21:46:25 <smooge> but I have no problem with dropping boxes that aren't in use.
21:46:30 <skvidal> abadger1999: ?
21:46:32 <skvidal> you around?
21:46:44 * ricky feels like we can just solve these conflicts when they come up
21:47:08 <abadger1999> yeah. That would seem like better practice than what we do now.
21:47:09 <ricky> Can't remember one happening yet.
21:47:21 <smooge> ricky, it happened twice last summer
21:47:39 <abadger1999> (Removing boxes that aren't in use; creating them as they're used again)
21:47:41 <ricky> Oh, ignore me then
21:48:15 <smooge> but it was something where "put in an RFR and get someone in sysadmin-main to spin you up a fresh box" would have covered it.
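skvidal's proposed rule for ticket 2545 (if a publictest box is not in use, shut it down) is the kind of thing he suggests might end up in a script. A minimal sketch, assuming an owners mapping that records who has claimed each guest, could look like the following; the function name, the mapping, and the sample guest names are hypothetical, and the function only returns virsh commands rather than running them.

```python
# Hypothetical sketch of the proposed publictest rule (ticket 2545):
# any publictest guest with no recorded owner gets shut down.
# Assumptions: the running-guest list would come from something like
# `virsh list --name`, and "labelled" means an entry in the owners map.
# Nothing is executed; the function only returns the commands to run.
from typing import Dict, List, Optional


def shutdown_plan(running: List[str],
                  owners: Dict[str, Optional[str]],
                  prefix: str = "publictest") -> List[List[str]]:
    plan = []
    for name in sorted(running):
        if not name.startswith(prefix):
            continue  # the rule only covers publictest## guests
        if owners.get(name):
            continue  # someone has claimed this box; leave it running
        # "virsh shutdown" asks the guest OS to power off cleanly,
        # whereas "virsh destroy" stops the guest immediately.
        plan.append(["virsh", "shutdown", name])
    return plan


if __name__ == "__main__":
    running = ["proxy02", "publictest01", "publictest03", "publictest07"]
    owners = {"publictest03": "ricky"}
    for argv in shutdown_plan(running, owners):
        print(" ".join(argv))
```

Returning commands instead of executing them keeps the script safe to run as a report, which fits smooge's point that people need a way to ask for a box back before anything is erased.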
21:48:19 <ricky> Easiest way is: any unlabelled machines get xm shutdown
21:48:37 <skvidal> ricky: virsh destroy - the new xm shutdown :)
21:48:43 <ricky> And then they get erased as soon as we need to build a new one and overwrite it
21:49:12 <skvidal> so if you have notes/thoughts
21:49:15 <skvidal> add them to the ticket
21:49:25 <skvidal> we'll compile that into an SOP and maybe into a script
21:49:30 <smooge> ok will do so.
21:50:12 <skvidal> next?
21:51:22 <smooge> .ticket 2546
21:51:23 <zodbot> smooge: #2546 (bnfs01) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2546
21:52:04 <smooge> I think a side point on this is "Should hyperthreading be enabled on our systems?"
21:52:40 * skvidal has no opinion - I always turned it off before
21:52:56 <skvidal> but even turning it off this is an 8-core box with 16GB of ram
21:53:00 <skvidal> that's been 99% idle
21:53:08 * nirik always leaves it on, but not sure why.
21:53:15 <smooge> Many of them have it off, but a couple have it on (mostly new ones).
21:53:34 <skvidal> nirik: right - I'm w/you on 'not sure why'
21:54:06 <smooge> Well depending on the system hyperthreading can be faster (databases like oracle) or slower (VMs)
21:54:27 <smooge> so I usually turn it off because I don't do oracle
21:54:33 <smooge> or similar tools.
21:54:38 <skvidal> okay
21:54:40 <smooge> but back to the main question.
21:54:51 <smooge> this is the belt-and-suspenders box for nfs01
21:55:11 <smooge> if nfs01 goes kablooey this is meant to be its replacement.
21:55:17 <skvidal> do we have a doc on what the 'cold failover' procedure looks like?
21:55:49 <ricky> Just a guess - probably something like: Check mount, change IP
21:56:06 <ricky> Oh, it's cold. Never mind.
21:56:33 <smooge> not that I know of. dgilmore I think purchased/set it up
21:57:22 <skvidal> the history I got from dgilmore and mmcgrath was this
21:57:32 <skvidal> 1. it was intended to be snapshotted regularly
21:57:36 <skvidal> 2. it was set up that way
21:57:39 <dgilmore> smooge: mmcgrath did the purchase/setup
21:57:51 <skvidal> 3. bad things happened in the db when that happened - where you had to manually intervene
21:58:04 <smooge> ouch
21:58:07 <smooge> ok
21:58:08 <skvidal> 4. so it was shelved until someone got back to it? - that last bit is a bit fuzzy
21:58:27 <skvidal> dgilmore: does the above sound right? or am I misremembering?
21:58:41 <dgilmore> skvidal: that's pretty spot on
21:59:04 <dgilmore> we went that route because backing up /mnt/koji to tape took days
21:59:14 <skvidal> gotcha
21:59:18 <dgilmore> and blocked all other jobs
21:59:39 <dgilmore> this way we would have something that we could back up to
21:59:49 <dgilmore> but also do a cold failover if it came to it
22:00:23 <dgilmore> but like with hosted i've been thinking of ways to redo it
22:00:50 <dgilmore> and i think that we could use gluster to keep the data replicated to it in real time
22:01:17 <smooge> so that could be our gluster test case?
22:01:21 <dgilmore> we could honestly make bnfs01 be a vm on the host
22:01:35 <dgilmore> smooge: well i want to test it at home first
22:02:54 <smooge> so rebuild the box to be a "virthost" and then create a vm on it. ok
22:03:05 <dgilmore> right
22:03:26 <dgilmore> make the disk available to it via some method
22:04:29 <skvidal> sounds like a plan - dgilmore, smooge: would one of y'all be willing to update the ticket with this?
22:04:54 <smooge> doing so
22:05:12 <smooge> .ticket 2540
22:05:13 <zodbot> smooge: #2540 (find all no longer running xen/kvm instances with disk space still allocated) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2540
22:05:48 <skvidal> smooge: sounds like you did this one - wanna close it?
22:06:13 <smooge> I am still working on this one
22:06:15 <skvidal> okay
22:06:17 <skvidal> great
22:06:36 <smooge> we have http://fpaste.org/oUkw/
22:06:41 <smooge> a lot of dirty partitions
22:07:05 <smooge> not counting what is unused on the iscsi box
22:07:21 <smooge> which I haven't finished with yet.
22:07:48 <skvidal> ok
22:07:59 <smooge> how I determined it: did a pvs and looked for partitions that were: -wi-a-
22:08:13 <smooge> lvs sorry
22:08:32 <smooge> I think all but the bxen boxes are pretty safe to remove
22:09:12 * skvidal didn't know about looking for 'o' in lvs
22:09:13 <skvidal> good move
22:09:14 <smooge> bxen02 had a lot of stuff on it that seemed special
22:09:47 <smooge> yeah I figured it out when playing with kpartx to look at the age of old images
22:10:40 <smooge> dgilmore, the partitions on bxen02 like mpmtest, koji2
22:11:49 <smooge> anyway I think I can leave bxenXX til later and clean up the rest
22:12:13 <smooge> last ticket
22:12:17 <smooge> .ticket 2530
22:12:18 <zodbot> smooge: #2530 (Selinux issues on PPC servers) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2530
22:12:20 <dgilmore> smooge: mpmtest is mmcgrath's
22:12:31 <smooge> ok will ask him
22:13:00 <smooge> the other ones we can go over when you have more bandwidth and rest from this everlong meeting :)
22:13:05 <smooge> dgilmore, thanks
22:13:58 <smooge> the last issue looks to have been fixed over break. I will close that one.
22:14:33 <smooge> skvidal, ricky we are done.
22:14:38 <skvidal> kewl
22:14:40 <skvidal> thank you
22:14:53 <smooge> now to move another 30 of our open tickets to meeting :)
22:14:54 <dgilmore> smooge: the koji2 on xenGuests i think was put there to migrate it from one host to another
22:14:56 <abadger1999> Yay
22:15:07 <ricky> #endmeeting
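The check smooge used for ticket 2540 (run lvs and look for logical volumes showing -wi-a-, i.e. active but without the 'o' that marks an open device) can be sketched as a small parser over lvs output. This is a sketch, not his actual script: it assumes the input comes from `lvs --noheadings -o vg_name,lv_name,lv_attr`, and the function name and sample volume names are hypothetical.

```python
# Sketch of the check smooge describes for ticket 2540: an LV whose
# lv_attr shows it active ('a' in the 5th position) but not open
# (no 'o' in the 6th position) is not in use by a running guest.
# Assumption: input is the text printed by
#     lvs --noheadings -o vg_name,lv_name,lv_attr
# A hit is only a candidate; a shut-down guest that still matters
# looks the same, so check each one before removing it.
from typing import List, Tuple


def active_but_unopened(lvs_output: str) -> List[Tuple[str, str]]:
    hits = []
    for line in lvs_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        vg, lv, attr = fields
        if len(attr) < 6:
            continue  # too short to hold the state and open flags
        active = attr[4] == "a"
        opened = attr[5] == "o"
        if active and not opened:
            hits.append((vg, lv))
    return hits


if __name__ == "__main__":
    # Hypothetical sample output; the VG and LV names are made up.
    sample = """\
  vg_guests guest-a -wi-ao
  vg_guests guest-b -wi-a-
  vg_guests guest-c -wi-a-
"""
    for vg, lv in active_but_unopened(sample):
        print(f"{vg}/{lv}")
```

Newer LVM releases print a longer lv_attr string (for example -wi-ao----), but the state and open flags stay in the 5th and 6th positions, so the same index checks apply. As the bxen02 discussion shows (mpmtest, koji2), an unopened volume can still be something someone wants kept.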