20:04:55 <mmcgrath> #startmeeting Infrastructure
20:04:55 <zodbot> Meeting started Thu Jan  7 20:04:55 2010 UTC.  The chair is mmcgrath. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:04:55 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
20:04:58 <mmcgrath> #topic Who's here?
20:05:09 * wzzrd is here
20:05:11 * sijis is here
20:05:26 * a-k with snow, snow, snow
20:05:42 * mmcgrath is happy to see a-k
20:05:48 * mmcgrath is less happy to see snow snow snow.
20:06:04 * jpwdsm lurks
20:06:30 * biertie will look at this meeting, maybe I join infrastructure team later, but first I want to know some more things :)
20:06:38 <mmcgrath> biertie: well welcome
20:06:41 <mmcgrath> smooge: FYI :)
20:06:46 <mmcgrath> ok, lets get started.
20:07:23 <mmcgrath> #topic Meeting Tickets
20:07:43 <mmcgrath> and we have no meeting items, that makes that easy.
20:07:44 <smooge> here
20:07:49 <smooge> sorry
20:07:55 <smooge> thought it was wednesday
20:08:01 <mmcgrath> smooge: no worries :)
20:08:03 <mmcgrath> #topic PHX2
20:08:10 <mmcgrath> So there's some outstanding issues in PHX2 at the moment.
20:08:14 <smooge> @$@#%$$ @$@## @$$#
20:08:15 <mmcgrath> I'll take them one by one.
20:08:19 <mmcgrath> 1) proxy[1-2]
20:09:06 <mmcgrath> So.. this one is kind of multi-layered
20:09:12 <mmcgrath> in PHX1 we had two proxy servers
20:09:23 <mmcgrath> mostly because at one time we had 4, but moved back to 2 as we moved proxy servers to other locations.
20:09:32 <mmcgrath> even in PHX1 though we had a single point of failure for proxy.
20:09:40 <mmcgrath> this is because we can't hairpin (I recently learned what that means)
20:09:51 <mmcgrath> from our 10. network if we try to ping bastion.fedoraproject.org, we get a 209 address
20:09:59 <mmcgrath> and there's no way to get from 10. to 209.
20:10:04 <mmcgrath> so the network connection fails.
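[Editor's note: the failure mmcgrath describes is commonly called hairpin NAT (or NAT loopback). On a Linux gateway it can sometimes be worked around with an extra NAT rule; the Cisco firewall here does not support that at the time of the meeting, so the following is only an illustrative sketch with invented addresses, not the site's actual config:]

```shell
# Illustrative hairpin-NAT (NAT loopback) rules for a Linux gateway.
# All addresses are hypothetical: 209.0.113.10 stands in for the public
# VIP, 10.0.0.10 for the internal server, 10.0.0.0/24 for the LAN.

# Forward traffic that hits the public address to the internal server...
iptables -t nat -A PREROUTING -d 209.0.113.10 -p tcp --dport 80 \
    -j DNAT --to-destination 10.0.0.10

# ...and masquerade LAN-originated connections to it, so replies route
# back through the gateway instead of going direct (the "hairpin").
iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -d 10.0.0.10 -p tcp --dport 80 \
    -j MASQUERADE
```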
20:10:43 <mmcgrath> everyone aware of that problem?
20:11:21 <mmcgrath> Ooook.
20:11:29 <mmcgrath> So here's the options.
20:11:39 <mmcgrath> build both proxy1 and 2 out (right now only one of them exists)
20:11:44 <mmcgrath> and figure out how to make them redundant.
20:11:49 <mmcgrath> probably via a balancer or heartbeat
20:12:08 <mmcgrath> *or*
20:12:11 <mmcgrath> wait until hairpinning is fixed
20:12:18 <mmcgrath> which, as I understand it, will be in February
20:12:28 <mmcgrath> then it might not matter because we can just use the normal external address for everything.
20:12:40 <wzzrd> stupid question: what is hairpinning?
20:12:44 <mmcgrath> thoughts?
20:12:55 <mmcgrath> wzzrd: it's a cisco feature, we're behind a nated firewall.
20:13:08 <mmcgrath> and cisco doesn't let you route back through an originating interface.
20:13:11 <mmcgrath> it just drops the traffic.
20:13:22 <wzzrd> ok clear
20:13:22 <biertie> stupid question #2: what is a heartbeat?
20:13:23 <mmcgrath> so, for example, cvs.fedoraproject.org has two addresses one internal and one external.
20:13:38 <mmcgrath> and if you ping from the internal network to the external one, it just drops.
20:13:57 <mmcgrath> biertie: heartbeat is part of the linux-ha suite - http://www.linux-ha.org/
20:14:08 <biertie> thx
20:14:20 <wzzrd> any reason the rh cluster suite is not used for this?
20:14:26 <mmcgrath> biertie: basically you bring up server1 and server2, they monitor each other and decide which one should have the listening IP, and when one goes down, the other one brings it up.
20:14:30 <mmcgrath> wzzrd: way way overkill.
20:14:34 <wzzrd> ok
20:14:38 <mmcgrath> for just passing a single IP back and forth, which is what we want.
20:14:42 <mmcgrath> we have examined it though.
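[Editor's note: the "passing a single IP back and forth" setup mmcgrath describes is heartbeat's classic v1-style configuration. A minimal sketch follows; node names, interface, and the floating address are all invented for illustration:]

```shell
# Hypothetical minimal Linux-HA (heartbeat v1-style) config for floating
# one service IP between two proxies.  Names and addresses are made up.

cat > /etc/ha.d/ha.cf <<'EOF'
# the two peer nodes
node proxy1 proxy2
# exchange heartbeats over broadcast on eth0
bcast eth0
# seconds between heartbeats
keepalive 2
# declare a peer dead after 10s of silence
deadtime 10
# hand the IP back when the preferred node returns
auto_failback on
EOF

cat > /etc/ha.d/haresources <<'EOF'
# proxy1 is the preferred owner of the floating address
proxy1 IPaddr::10.0.0.100/24/eth0
EOF
```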
20:14:51 <mmcgrath> ok, anything else on that?
20:15:15 <mmcgrath> Ok, that transitions perfectly into our next outstanding problem
20:15:24 <mmcgrath> 2) bastion and koji are not redundant at this time
20:15:31 <mmcgrath> dgilmore: koji is still not redundant right?
20:15:52 * mmcgrath assumes not.
20:15:57 <mmcgrath> in PHX1 we had them both setup with heartbeat.
20:15:59 <mmcgrath> and it mostly worked.
20:16:14 <mmcgrath> we had some troubles setting it up in the given outage window we had last time.
20:16:24 <mmcgrath> so we need to get a better plan this time around and re-do it.
20:16:43 <mmcgrath> this was caused by not having enough time during the transition from phx1 to phx2 window.
20:16:52 <mmcgrath> which was only 36 hours, we just didn't have enough time to do the pre-migration steps.
20:17:08 <mmcgrath> So now production traffic is flowing over all this stuff and it becomes more difficult to setup without outage windows, you get the idea.
20:17:14 <mmcgrath> Any questions there?
20:17:30 <mmcgrath> k, next
20:17:31 <mmcgrath> 3) ntp
20:17:34 <mmcgrath> smooge is working on that now actually
20:17:39 <mmcgrath> smooge: care to talk about the latest?
20:17:47 * mmcgrath knows we just talked about it but not everyone knows :)
20:18:07 <smooge> ok we are limited in PHX2 to what UDP traffic is allowed
20:18:47 <mmcgrath> smooge: did we get approval for outbound to specific hosts?
20:18:47 <smooge> so I had to find some big name NTP servers to allow them through the firewall. This will then be pushed out to the clients so they can keep their time in line
20:18:53 <smooge> yes. and it works
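[Editor's note: "allow specific hosts outbound" plus a client config pinned to those hosts might look like the sketch below. clock1.redhat.com is named later in this discussion; the other upstream servers are placeholders, not the actual approved list:]

```shell
# Sketch of an /etc/ntp.conf pinned to a few approved upstream servers,
# to match a firewall rule that only allows UDP/123 to those hosts.
# clock1.redhat.com comes from the meeting; the rest are placeholders.
cat > /etc/ntp.conf <<'EOF'
server clock1.redhat.com iburst
server 0.pool.example.org iburst   # placeholder upstream
server 1.pool.example.org iburst   # placeholder upstream

# default policy: no remote modification or query access
restrict default kod nomodify notrap nopeer noquery
restrict 127.0.0.1
EOF
```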
20:19:00 <mmcgrath> smooge: one thing did dawn on me.
20:19:15 <mmcgrath> we're going to have to have at least some routed traffic on the storage network because otherwise we wouldn't have dns there.
20:19:17 <smooge> and clock1.redhat.com is not a hairpin it turns out
20:19:26 <mmcgrath> ah, good.
20:19:56 <smooge> if we have the 127 network use ns03/ns04 to get DNS then no it wont be needed.
20:20:13 <smooge> I think it is there but only routes to RHIT. When that goes away it would no longer work
20:20:24 <smooge> of course my logic comes up with 1+1=3 so not sure
20:20:25 <mmcgrath> I'm also thinking about things like yum updates, email, etc.
20:20:44 <smooge> oh yeah
20:21:12 <mmcgrath> that complicates architectural decisions like this :-/
20:21:15 <mmcgrath> unless it simplifies them :)
20:21:59 <mmcgrath> we can talk about that after the meeting unless you don't think it's a problem :)
20:22:16 <mmcgrath> ok, anything else on ntp then?  If not we can move on
20:22:23 <smooge> nothing else
20:22:57 <mmcgrath> k
20:23:10 <mmcgrath> I'm sure there's other things but nothing majorly pressing at the moment.
20:23:13 <mmcgrath> sooo
20:23:14 <mmcgrath> #topic Search Engines
20:23:18 <mmcgrath> a-k: want to take it?
20:23:27 <a-k> Sure
20:23:32 <mmcgrath> what's the latest?
20:23:36 <a-k> I still haven't preliminarily evaluated all of the candidates like I wanted by now
20:23:43 <a-k> There are still one or two that would be worth putting in publictest, I think
20:23:47 <mmcgrath> that's ok, there was a big break in the middle there :)
20:23:58 <a-k> We can chat outside the meeting what I need to do
20:24:00 * dgilmore is here
20:24:06 <a-k> I haven't done the pt thing before
20:24:08 <mmcgrath> sounds good.
20:24:17 <a-k> I want to keep going through the candidates, too, for preliminary evaluation on my workstation
20:24:22 <mmcgrath> a-k: sure, it's easy we just need to get you in a group, and that's pretty much it.
20:24:48 <mmcgrath> sounds good to me, whatever you need just ask, we'll get you setup.
20:24:55 <a-k> OK.  By putting stuff in pt, I can get feedback on what folks like, too
20:25:10 <mmcgrath> no doubt.
20:25:14 <a-k> I think that's about it for now, unless there are questions
20:25:15 <mmcgrath> anyone have anything else on that?
20:25:57 <mmcgrath> allrighty
20:26:02 <mmcgrath> #topic Open Floor
20:26:07 <mmcgrath> anyone have anything they'd like to discuss?
20:26:12 <mmcgrath> any questions, comments, anything goes really
20:26:16 <mmcgrath> any thoughts on review board?
20:26:20 <mmcgrath> kanarip: you around?
20:26:24 <jpwdsm> I could use some help testing OpenID if anyone's interested
20:26:38 <wzzrd> how can I help you with OpenID?
20:26:41 <mmcgrath> jpwdsm: with fas or something else?
20:26:45 <jpwdsm> mmcgrath: yep
20:26:49 <wzzrd> new guy, but willing to lend a hand
20:27:00 <mmcgrath> jpwdsm: sure, get with wzzrd
20:27:04 <jpwdsm> I have a Pylons app I've used on a personal server to log into LiveJournal, StackOverflow, etc successfully
20:27:08 <mmcgrath> I'm sure you two can work something out :)
20:27:21 <jpwdsm> but I need to work on authenticating against the FAS db
20:27:27 <mmcgrath> ahh
20:27:30 <jpwdsm> don't know how to go about that
20:27:31 <mmcgrath> so you need back end support testing?
20:27:42 <jpwdsm> yea
20:27:46 <mmcgrath> abadger1999: you around?
20:27:53 <smooge> I have one thing to ask?
20:27:59 <mmcgrath> smooge: sure
20:28:00 <abadger1999> mmcgrath: yep
20:28:20 <mmcgrath> abadger1999: jpwdsm has questions about authenticating with openid and fas, mind helping with that after the meeting?
20:28:33 <mmcgrath> wzzrd was also interested but he's new and I don't think he has much fas experience yet :)
20:28:39 <smooge> dgilmore the new disk space for koji that you are looking at? Any help needed?
20:28:43 <wzzrd> correct :P
20:28:45 <mmcgrath> wzzrd: if you're interested you could setup FAS.
20:28:49 <jpwdsm> wzzrd: https://fedorahosted.org/poseidon/
20:28:56 <mmcgrath> git://git.fedorahosted.org/git/fas.git/
20:28:58 <abadger1999> jpwdsm: Sure... but -- I only know the fas end of things.  I don't know anything about openid support.
20:29:09 <mmcgrath> abadger1999: yeah I think that's the part he needs help with :)
20:29:11 <abadger1999> jpwdsm: Sound good?
20:29:14 <jpwdsm> abadger1999: I just need some help setting up a reflected table to authenticate users against FAS
20:29:17 <jpwdsm> abadger1999: yep
20:29:20 <abadger1999> Cool
20:29:29 <wzzrd> mmcgrath: can i ask you some q's about that after the meeting?
20:29:32 <mmcgrath> sure
20:29:39 <mmcgrath> smooge: so wrt the new storage.
20:29:39 <wzzrd> k
20:29:49 <mmcgrath> dgilmore has been in contact with Dell and has budget for stuff
20:29:53 <mmcgrath> we're thinking about an EqualLogic.
20:30:02 <mmcgrath> though we can't do the raid10 we were hoping for, just not enough dough.
20:30:17 <mmcgrath> *but* we're hoping to get one shipped to PHX2 for evaluation
20:30:35 <smooge> ah yeah.. you need a lot of disks to do RAID10
20:30:56 <dgilmore> smooge: well i was hoping to do 32 1tb sats drives in raid 10
20:31:01 <dgilmore> sata
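[Editor's note: the capacity cost of the RAID 10 layout dgilmore mentions is easy to see with a little arithmetic. A quick sketch for 32 x 1 TB drives in striped mirrors, where every disk is mirrored so usable space is half of raw:]

```shell
# Rough usable-capacity arithmetic for 32 x 1 TB SATA in RAID 10.
disks=32
size_tb=1
raw=$(( disks * size_tb ))
raid10_usable=$(( raw / 2 ))   # striped mirrors: half the raw space
echo "raw: ${raw} TB, RAID10 usable: ${raid10_usable} TB"
```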
20:31:22 <mmcgrath> the good news is we get to actually test the solution before we use it.
20:31:26 <mmcgrath> I suspect that will be most helpful
20:31:31 <dgilmore> getting the evaluation unit in will let us make sure we can get better performance
20:32:02 <smooge> testing good.
20:32:23 <smooge> I was going to see if some of my contacts had what we were looking at so I could test here if we couldn't evaluate there
20:32:29 <biertie> dgilmore: wouldn't raid 50 be better then?
20:32:38 <dgilmore> biertie: no
20:32:50 <mmcgrath> biertie: and I don't think we have the disks for it.
20:32:57 <smooge> raid-61 FTW
20:33:02 <mmcgrath> the two problems are more of the same.
20:33:05 <mmcgrath> 1) lots of storage
20:33:08 <mmcgrath> 2) it needs to be fast.
20:33:25 <mmcgrath> dgilmore: have you heard back from our Dell rep today?
20:33:28 <dgilmore> the more spindles we can balance load over the better the performance will be
20:33:33 <dgilmore> mmcgrath:  not today
20:33:37 <mmcgrath> I forgot when he said he'd be back on the job
20:33:43 <mmcgrath> I remember he was on vacation or something
20:33:53 <dgilmore> mmcgrath: he was in training
20:33:55 <mmcgrath> ahhh
20:33:58 <dgilmore> was back in the office today
20:34:02 <mmcgrath> like vacation, but not as fun
20:34:08 <mmcgrath> unless it was skydiving training or something
20:34:10 <dgilmore> right
20:34:18 <mmcgrath> ok, so when we do get this all figured out
20:34:23 <mmcgrath> we're going to throw it in the build rack.
20:34:27 <mmcgrath> dgilmore: do you know the power requirements?
20:34:40 <smooge> dgilmore, I was reading that the 1TB disks don't have as many spindles as the 500GB ones because of data format changes.. which makes them slower to seek
20:34:46 <dgilmore> mmcgrath: i dont ill see if its listed in the specs
20:34:51 <mmcgrath> k
20:34:57 <smooge> but that was a 2008 article.. so not sure if it's still true
20:35:12 <mmcgrath> smooge: yeah, that's the problem though, we can't afford the 500G disks :(
20:35:16 <mmcgrath> oh *BUT*
20:35:16 <dgilmore> smooge: perpendicular reads improved things
20:35:20 <mmcgrath> we can chain these things together.
20:35:25 <mmcgrath> so, in theory, we can add more over time.
20:35:44 <dgilmore> mmcgrath: and with its load balancing more units == win
20:35:54 <mmcgrath> yeah
20:35:58 <mmcgrath> it's morning again, on /mnt/koji :)
20:36:05 <mmcgrath> ok, so anyone have any other questions on that?
20:36:07 <smooge> dgilmore, does that mean mounting them on their side was better?
20:36:20 <dgilmore> smooge: mounting them?
20:36:28 <smooge> perpendicular reads
20:36:51 <smooge> sorry I was thinking you were saying how they were put in the system made them faster or something.
20:37:01 <dgilmore> smooge: i dont know alot about it
20:37:10 <dgilmore> just that it was supposed to help things
20:37:24 <smooge> np. let me know if I can help
20:37:27 <mmcgrath> smooge: the good (well probably good) thing about the EqualLogics is they're self-contained.  We won't need to rely on something like xen2 for them.
20:37:43 <mmcgrath> pros and cons, but it does eliminate a SPOF
20:37:48 <mmcgrath> even though it is a SPOF in itself
20:37:57 <smooge> oh those are the ones that do iscsi straight from the box?
20:38:03 <dgilmore> one maybe bad thing is that its iscsi
20:38:17 <mmcgrath> dgilmore: it can do raw NFS though as well right?
20:38:29 <dgilmore> mmcgrath: um still not sure on that
20:38:31 <smooge> mmcgrath, the apple RAIDX boxes had dual controllers in them to deal with that.. I think the Dell ones had similar things (but cost more)
20:38:43 <mmcgrath> even if not we can still make that redundant.
20:38:50 <dgilmore> smooge: it has dual controllers
20:39:00 <mmcgrath> our MD1000 probably has the option to do it
20:39:03 <mmcgrath> but performance ick :)
20:39:12 <smooge> mmcgrath, I don't think they do raw NFS... the ones I dealt with did ISCSI and SAN if hooked up
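[Editor's note: attaching an iSCSI volume like the ones smooge describes is typically done with open-iscsi. An illustrative sketch follows; the group IP and target IQN are invented, and these commands obviously need a real array to run against:]

```shell
# Hypothetical open-iscsi commands for attaching an EqualLogic volume.
# 10.0.0.50 (the group IP) and the target IQN are invented examples.

# Discover targets exported by the array's group address
iscsiadm -m discovery -t sendtargets -p 10.0.0.50:3260

# Log in to the discovered target to get a local block device
iscsiadm -m node -T iqn.2001-05.com.equallogic:example-koji \
    -p 10.0.0.50:3260 --login
```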
20:39:54 <smooge> they were getting them in at Sandia just as I was leaving.. and got some new ones recently
20:40:00 <mmcgrath> <nod>
20:40:10 <mmcgrath> Ok, anyone have anything else on this or anything else?
20:40:50 <mmcgrath> if not we'll close the meeting in 30
20:41:26 <mmcgrath> allrighty
20:41:28 <mmcgrath> #meetingend
20:41:30 <mmcgrath> #endmeeting