20:04:55 #startmeeting Infrastructure
20:04:55 Meeting started Thu Jan 7 20:04:55 2010 UTC. The chair is mmcgrath. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:04:55 Useful Commands: #action #agreed #halp #info #idea #link #topic.
20:04:58 #topic Who's here?
20:05:09 * wzzrd is here
20:05:11 * sijis is here
20:05:26 * a-k with snow, snow, snow
20:05:42 * mmcgrath is happy to see a-k
20:05:48 * mmcgrath is less happy to see snow snow snow.
20:06:04 * jpwdsm lurks
20:06:30 * biertie will look at this meeting, maybe I join infrastructure team later, but first I want to know some more things :)
20:06:38 biertie: well welcome
20:06:41 smooge: FYI :)
20:06:46 ok, lets get started.
20:07:23 #topic Meeting Tickets
20:07:43 and we have no meeting items, that makes that easy.
20:07:44 here
20:07:49 sorry
20:07:55 thought it was wednesday
20:08:01 smooge: no worries :)
20:08:03 #topic PHX2
20:08:10 So there's some outstanding issues in PHX2 at the moment.
20:08:14 @$@#%$$ @$@## @$$#
20:08:15 I'll take them one by one.
20:08:19 1) proxy[1-2]
20:09:06 So.. this one is kind of multi-layered
20:09:12 in PHX1 we had two proxy servers
20:09:23 mostly because at one time we had 4, but moved back to 2 as we moved proxy servers to other locations.
20:09:32 even in PHX1 though we had a single point of failure for proxy.
20:09:40 this is because we can't hairpin (I recently learned what that means)
20:09:51 from our 10. network if we try to ping bastion.fedoraproject.org, we get a 209 address
20:09:59 and there's no way to get from 10. to 209.
20:10:04 so the network connection fails.
20:10:43 everyone aware of that problem?
20:11:21 Ooook.
20:11:29 So here's the options.
20:11:39 build both proxy1 and 2 out (right now only one of them exists)
20:11:44 and figure out how to make them redundant.
20:11:49 probably via a balancer or heartbeat
20:12:08 *or*
20:12:11 wait until hairpinning is fixed
20:12:18 which, as I understand it, will be in February
20:12:28 then it might not matter because we can just use the normal external address for everything.
20:12:40 stupid question: what is hairpinning?
20:12:44 thoughts?
20:12:55 wzzrd: it's a cisco feature, we're behind a NATed firewall.
20:13:08 and cisco doesn't let you route back through an originating interface.
20:13:11 it just drops the traffic.
20:13:22 ok clear
20:13:22 stupid question #2: what is a heartbeat?
20:13:23 so, for example, cvs.fedoraproject.org has two addresses, one internal and one external.
20:13:38 and if you ping from the internal network to the external one, it just drops.
20:13:57 biertie: heartbeat is part of the linux-ha suite - http://www.linux-ha.org/
20:14:08 thx
20:14:20 any reason the rh cluster suite is not used for this?
20:14:26 biertie: basically you bring up server1 and server2, they monitor each other and decide which one should have the listening IP, and when one goes down, the other one brings it up.
20:14:30 wzzrd: way way overkill.
20:14:34 ok
20:14:38 for just passing a single IP back and forth, which is what we want.
20:14:42 we have examined it though.
20:14:51 ok, anything else on that?
20:15:15 Ok, that transitions perfectly into our next outstanding problem
20:15:24 2) bastion and koji are not redundant at this time
20:15:31 dgilmore: koji is still not redundant right?
20:15:52 * mmcgrath assumes not.
20:15:57 in PHX1 we had them both setup with heartbeat.
20:15:59 and it mostly worked.
20:16:14 we had some troubles setting it up in the given outage window we had last time.
20:16:24 so we need to get a better plan this time around and re-do it.
20:16:43 this was caused by not having enough time during the transition from phx1 to phx2 window.
20:16:52 which was only 36 hours, we just didn't have enough time to do the pre-migration steps.
20:17:08 So now production traffic is flowing over all this stuff and it becomes more difficult to setup without outage windows, you get the idea.
20:17:14 Any questions there?
20:17:30 k, next
20:17:31 3) ntp
20:17:34 smooge is working on that now actually
20:17:39 smooge: care to talk about the latest?
20:17:47 * mmcgrath knows we just talked about it but not everyone knows :)
20:18:07 ok we are limited in PHX2 to what UDP traffic is allowed
20:18:47 smooge: did we get approval for outbound to specific hosts?
20:18:47 so I had to find some big name NTP servers to allow them through the firewall. This will then be pushed out to the clients so they can keep their time in line
20:18:53 yes. and it works
20:19:00 smooge: one thing did dawn on me.
20:19:15 we're going to have to have at least some routed traffic on the storage network because otherwise we wouldn't have dns there.
20:19:17 and clock1.redhat.com is not a hairpin it turns out
20:19:26 ah, good.
20:19:56 if we have the 127 network use ns03/ns04 to get DNS then no it won't be needed.
20:20:13 I think it is there but only routes to RHIT. When that goes away it would no longer work
20:20:24 of course my logic comes up with 1+1=3 so not sure
20:20:25 I'm also thinking about things like yum updates, email, etc.
20:20:44 oh yeah
20:21:12 that complicates architectural decisions like this :-/
20:21:15 unless it simplifies them :)
20:21:59 we can talk about that after the meeting unless you don't think it's a problem :)
20:22:16 ok, anything else on ntp then? If not we can move on
20:22:23 nothing else
20:22:57 k
20:23:10 I'm sure there's other things but nothing majorly pressing at the moment.
20:23:13 sooo
20:23:14 #topic Search Engines
20:23:18 a-k: want to take it?
20:23:27 Sure
20:23:32 what's the latest?
20:23:36 I still haven't preliminarily evaluated all of the candidates like I wanted by now
20:23:43 There are still one or two that would be worth putting in publictest, I think
20:23:47 that's ok, there was a big break in the middle there :)
20:23:58 We can chat outside the meeting what I need to do
20:24:00 * dgilmore is here
20:24:06 I haven't done the pt thing before
20:24:08 sounds good.
20:24:17 I want to keep going through the candidates, too, for preliminary evaluation on my workstation
20:24:22 a-k: sure, it's easy we just need to get you in a group, and that's pretty much it.
20:24:48 sounds good to me, whatever you need just ask, we'll get you setup.
20:24:55 OK. By putting stuff in pt, I can get feedback on what folks like, too
20:25:10 no doubt.
20:25:14 I think that's about it for now, unless there are questions
20:25:15 anyone have anything else on that?
20:25:57 allrighty
20:26:02 #topic Open Floor
20:26:07 anyone have anything they'd like to discuss?
20:26:12 any questions, comments, anything goes really
20:26:16 any thoughts on review board?
20:26:20 kanarip: you around?
20:26:24 I could use some help testing OpenID if anyone's interested
20:26:38 how can I help you with OpenID?
20:26:41 jpwdsm: with fas or something else?
20:26:45 mmcgrath: yep
20:26:49 new guy, but willing to lend a hand
20:27:00 jpwdsm: sure, get with wzzrd
20:27:04 I have a Pylons app I've used on a personal server to log into LiveJournal, StackOverflow, etc successfully
20:27:08 I'm sure you two can work something out :)
20:27:21 but I need to work on authenticating against the FAS db
20:27:27 ahh
20:27:30 don't know how to go about that
20:27:31 so you need back end support testing?
20:27:42 yea
20:27:46 abadger1999: you around?
20:27:53 I have one thing to ask?
20:27:59 smooge: sure
20:28:00 mmcgrath: yep
20:28:20 abadger1999: jpwdsm has questions about authenticating with openid and fas, mind helping with that after the meeting?
20:28:33 wzzrd was also interested but he's new and I don't think he has much fas experience yet :)
20:28:39 dgilmore the new disk space for koji that you are looking at? Any help needed?
20:28:43 correct :P
20:28:45 wzzrd: if you're interested you could setup FAS.
20:28:49 wzzrd: https://fedorahosted.org/poseidon/
20:28:56 git://git.fedorahosted.org/git/fas.git/
20:28:58 jpwdsm: Sure... but -- I only know the fas end of things. I don't know anything about openid support.
20:29:09 abadger1999: yeah I think that's the part he needs help with :)
20:29:11 jpwdsm: Sound good?
20:29:14 abadger1999: I just need some help setting up a reflected table to authenticate users against FAS
20:29:17 abadger1999: yep
20:29:20 Cool
20:29:29 mmcgrath: can i ask you some q's about that after the meeting?
20:29:32 sure
20:29:39 smooge: so wrt the new storage.
20:29:39 k
20:29:49 dgilmore has been in contact with Dell and has budget for stuff
20:29:53 we're thinking about an EqualLogic.
20:30:02 though we can't do the raid10 we were hoping for, just not enough dough.
20:30:17 *but* we're hoping to get one shipped to PHX2 for evaluation
20:30:35 ah yeah.. you need a lot of disks to do RAID10
20:30:56 smooge: well i was hoping to do 32 1tb sats drives in raid 10
20:31:01 sata
20:31:22 the good news is we get to actually test the solution before we use it.
20:31:26 I suspect that will be most helpful
20:31:31 getting the evaluation unit in will let us make sure we can get better performance
20:32:02 testing good.
20:32:23 I was going to see if some of my contacts had what we were looking at so I could test here if we couldn't evaluate there
20:32:29 dgilmore: wouldn't raid 50 be better then?
20:32:38 biertie: no
20:32:50 biertie: and I don't think we have the disks for it.
20:32:57 raid-61 FTW
20:33:02 the two problems are more of the same.
20:33:05 1) lots of storage
20:33:08 2) it needs to be fast.
20:33:25 dgilmore: have you heard back from our Dell rep today?
20:33:28 the more spindles we can balance load over the better the performance will be
20:33:33 mmcgrath: not today
20:33:37 I forgot when he said he'd be back on the job
20:33:43 I remember he was on vacation or something
20:33:53 mmcgrath: he was in training
20:33:55 ahhh
20:33:58 was back in the office today
20:34:02 like vacation, but not as fun
20:34:08 unless it was skydiving training or something
20:34:10 right
20:34:18 ok, so when we do get this all figured out
20:34:23 we're going to throw it in the build rack.
20:34:27 dgilmore: do you know the power requirements?
20:34:40 dgilmore, I was reading that the 1TB disks don't have as many spindles as the 500GB ones because of data format changes.. which makes them slower to seek
20:34:46 mmcgrath: I don't, I'll see if it's listed in the specs
20:34:51 k
20:34:57 but that was a 2008 article.. so not sure if it's still true
20:35:12 smooge: yeah, that's the problem though, we can't afford the 500G disks :(
20:35:16 oh *BUT*
20:35:16 smooge: perpendicular reads improved things
20:35:20 we can chain these things together.
20:35:25 so, in theory, we can add more over time.
20:35:44 mmcgrath: and with its load balancing more units == win
20:35:54 yeah
20:35:58 it's morning again, on /mnt/koji :)
20:36:05 ok, so anyone have any other questions on that?
20:36:07 dgilmore, does that mean mounting them on their side was better?
20:36:20 smooge: mounting them?
20:36:28 perpendicular reads
20:36:51 sorry I was thinking you were saying how they were put in the system made them faster or something.
20:37:01 smooge: I don't know a lot about it
20:37:10 just that it was supposed to help things
20:37:24 np. let me know if I can help
20:37:27 smooge: the good (well probably good) thing about the EqualLogics is they're self-contained. We won't need to rely on something like xen2 for them.
20:37:43 pros and cons, but it does eliminate a SPOF
20:37:48 even though it is a SPOF in itself
20:37:57 oh those are the ones that do iscsi straight from the box?
20:38:03 one maybe bad thing is that it's iscsi
20:38:17 dgilmore: it can do raw NFS though as well right?
20:38:29 mmcgrath: um still not sure on that
20:38:31 mmcgrath, the apple RAIDX boxes had dual controllers in them to deal with that.. I think the Dell ones had similar things (but cost more)
20:38:43 even if not we can still make that redundant.
20:38:50 smooge: it has dual controllers
20:39:00 our MD1000 probably has the option to do it
20:39:03 but performance ick :)
20:39:12 mmcgrath, I don't think they do raw NFS... the ones I dealt with did ISCSI and SAN if hooked up
20:39:54 they were getting them in at Sandia just as I was leaving.. and got some new ones recently
20:40:10 Ok, anyone have anything else on this or anything else?
20:40:50 if not we'll close the meeting in 30
20:41:26 allrighty
20:41:28 #meetingend
20:41:30 #endmeeting
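
Editor's note on the storage discussion: the RAID 10 versus RAID 50 exchange comes down to how much of the raw disk space stays usable. The sketch below works through that arithmetic for the 32 x 1 TB configuration mentioned in the meeting. The function names and the choice of four RAID 5 groups are assumptions made for illustration; nothing here reflects what the team actually configured, and it says nothing about performance.

```python
def raid10_usable_tb(disks: int, disk_tb: float) -> float:
    """Usable capacity of RAID 10: disks are mirrored in pairs, so half is usable."""
    if disks < 4 or disks % 2 != 0:
        raise ValueError("RAID 10 needs an even number of disks, at least 4")
    return disks / 2 * disk_tb


def raid50_usable_tb(disks: int, disk_tb: float, groups: int) -> float:
    """Usable capacity of RAID 50: a stripe over `groups` RAID 5 sets.

    Each RAID 5 set gives up one disk's worth of space to parity.
    """
    if groups < 2 or disks % groups != 0 or disks // groups < 3:
        raise ValueError("need at least 2 equal groups of 3 or more disks each")
    return (disks - groups) * disk_tb


if __name__ == "__main__":
    # 32 x 1 TB drives, as floated in the meeting; 4 groups is a hypothetical layout.
    print("RAID 10:", raid10_usable_tb(32, 1.0), "TB usable")
    print("RAID 50:", raid50_usable_tb(32, 1.0, groups=4), "TB usable")
```

RAID 10 trades capacity for mirroring on every disk, which is why the same drive count yields less usable space than a parity-based layout.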