20:00:28 <mmcgrath> #startmeeting Infrastructure
20:00:29 <zodbot> Meeting started Thu Apr 22 20:00:28 2010 UTC. The chair is mmcgrath. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:31 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
20:00:35 <mmcgrath> zodbot: do as I mean, not as I say
20:01:01 <gholms|work> Hehe
20:01:15 * nirik is hanging around in the cheap seats.
20:01:19 <mmcgrath> #topic who's here?
20:01:22 * ricky
20:01:24 * a-k is
20:01:39 <smooge> here
20:01:41 * Infern4us
20:01:48 <smooge> needs coffee
20:02:11 * mdomsch
20:02:14 <mmcgrath> Ok, let's get started
20:02:17 <mmcgrath> #topic Final Release
20:02:25 <mmcgrath> The final F13 release is on the way here pretty quick.
20:02:34 <mmcgrath> Our final freeze goes into place on the 4th, IIRC.
20:03:23 <mmcgrath> Anyone have any questions or concerns about that?
20:03:30 <mmcgrath> any major projects to get deployed before then?
20:03:33 <mmcgrath> I have only 2 major changes
20:04:24 <mmcgrath> Alrighty, well we can move on.
20:04:38 <ricky> What are the changes?
20:04:39 <smooge> hmm, isn't that the same day as U-10.04?
20:04:52 <mmcgrath> ricky: going into those right now
20:04:56 <mmcgrath> #topic Insight
20:04:58 <mmcgrath> stickster: ping
20:05:24 <mmcgrath> I'm wondering if there's anything we can get in place now so there's less to do later.
20:05:24 <smooge> mmcgrath, I have no projects for that time. I was going to deploy rsyslog next week
20:05:31 <stickster> mmcgrath: pong
20:05:48 <mmcgrath> stickster: hey, so are there any Insight bits that can be done now?
20:05:49 <mdomsch> not much time before the freeze then
20:05:59 <mmcgrath> anything where, even though the whole project isn't ready, parts could be deployed now?
20:06:49 <mmcgrath> stickster: I'm thinking even if the base stuff is in place and not advertised, it'd help increase the chances of success.
20:06:50 <stickster> mmcgrath: There are still both styling and technical bits that have critical or blocker bugs attached
20:07:15 <mmcgrath> what is the nature of the changes that are still to be made? packaging? upstream stuff?
20:07:15 * hydh is here too
20:07:42 <stickster> mmcgrath: There are problems with the authentication that still need to be solved, then upstreamed to the fedora-zikula module and released
20:07:57 <stickster> The styling bugs are not as pernicious but will take some time to resolve
20:07:59 <mmcgrath> so in your estimation, are we still on track for deployment later in the month?
20:08:15 <stickster> mmcgrath: http://lists.fedoraproject.org/pipermail/logistics/2010-April/000510.html
20:08:17 <mmcgrath> stickster: also, how much of this code is stuff we'll have to maintain?
20:08:29 <stickster> No, we agreed to push off to post-GA
20:08:35 <mmcgrath> ah, k.
20:08:37 <stickster> There's not much code we have to maintain
20:08:39 <mmcgrath> I missed that, sorry.
20:08:41 <stickster> AuthFAS module is about it.
20:08:46 <mmcgrath> excellent.
20:08:48 <stickster> And that's fairly understandable
20:08:58 <mmcgrath> stickster: ok, thanks for the latest. Anything else?
20:09:10 <stickster> It's the other issues we still have to solve that weren't ready for our go/no-go that caused us to wave off.
20:09:28 <stickster> The logistics@ list is where discussion is taking place about what we're going to do next.
20:09:32 <stickster> eof
20:09:44 <mmcgrath> stickster: thanks
20:09:48 <mmcgrath> ok, next topic
20:09:52 <mmcgrath> #topic netapp migration
20:10:02 <mmcgrath> This is something I wanted to have done before the beta but failed to do so
20:10:10 <smooge> ok, what is this?
20:10:26 <mmcgrath> basically I need to move alt and whatever is left on the secondary1 drives to the netapp.
20:10:53 <mmcgrath> smooge: so they'll show up on download.fedora.redhat.com
20:11:41 <mmcgrath> any questions or concerns about that?
20:11:53 <mmcgrath> For me the big one is trying to figure out exactly how to let everyone continue to upload their content.
20:11:53 <mdomsch> nah, they're small
20:11:55 <smooge> not really
20:12:04 <mmcgrath> AFAIK it'll all work the same way.
20:12:12 <mmcgrath> Ok, moving on :)
20:12:14 <smooge> oh.. there is that
20:12:20 <mdomsch> log into a server that has it mounted r/w
20:12:27 <mdomsch> right now that's secondary1 for alt
20:12:36 <smooge> who is allowed to do this?
20:12:41 <mmcgrath> mdomsch: well, I'm thinking they'd still be allowed to do that
20:12:48 <mmcgrath> but then I'm not sure what to do with secondary1's actual drives :)
20:12:56 <mdomsch> the altvideo group can, for /pub/alt/video/
20:12:58 <mmcgrath> maybe just have them sync from the netapp and continue to expose it.
20:13:02 <mmcgrath> smooge: there's an SOP
20:13:04 * mmcgrath gets it
20:13:05 <mdomsch> yeah
20:13:16 <mmcgrath> smooge: http://fedoraproject.org/wiki/Content_Hosting_Infrastructure_SOP
20:13:25 <mmcgrath> giving users direct access to the netapp concerns me a bit
20:13:37 <mmcgrath> but really it's a completely different share than the /pub/fedora and /pub/epel stuff
20:13:49 <mmcgrath> and the only thing they could do is fill the disk up, which A) we monitor and B) is easy to fix
20:13:58 <tremble> Which netapp modules are you using?
20:14:13 <smooge> ok, so it will need to be a separate partition/log-volume on the netapp
20:14:18 <mmcgrath> tremble: I always forget.
20:14:26 <mmcgrath> smooge: <nod> it already is.
20:14:40 <mmcgrath> smooge: oh wait, not a separate 'partition' in that way
20:14:53 <mmcgrath> since we don't really know what future expansion will be
20:15:05 <mmcgrath> this will allow either side of the house to grow without us having to guess.
20:15:11 <mdomsch> so alt.fp.o becomes a new VM too?
20:15:16 <skvidal> oh crap, the meeting
20:15:20 <tremble> FWIW $POE uses a netapp.
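[Editor's note: the dedicated-volume setup discussed here, with its own quota and snapshot schedule, might look something like the following on a 7-mode NetApp filer. Every name, size, and schedule below is invented for illustration; none of it is taken from the actual Fedora configuration.]

```
# Hypothetical Data ONTAP 7-mode sketch -- names and sizes invented.
# A dedicated volume for /pub/alt on an existing aggregate,
# thin-provisioned so either side of the house can grow:
vol create alt aggr1 500g
vol options alt guarantee none   # thin provisioning; monitor the aggregate
# Its own snapshot schedule (0 weekly, 2 nightly, 6 hourly copies):
snap sched alt 0 2 6
# Enforce usage quotas (rules live in /etc/quotas on the filer):
quota on alt
```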
20:15:20 <skvidal> sorry about being late
20:15:22 <smooge> ah well, I was wondering about setting up a netapp quota and not having to worry about filling
20:15:37 <mmcgrath> mdomsch: I haven't figured that part out yet, I might just see if download.fedora.redhat.com will start accepting alt.fedoraproject.org
20:15:41 <mmcgrath> smooge: that could work too.
20:16:02 <smooge> it also allows us to separate out differing snapshot schedules and such
20:16:43 <mmcgrath> <nod>
20:16:49 <mmcgrath> so anyone have any other questions or comments on that?
20:17:48 <smooge> no, we can talk offline
20:17:56 <mmcgrath> k
20:17:59 <mmcgrath> next topic!
20:18:06 <tremble> At $POE we've found that having 1 or 2 aggregates and multiple thin-provisioned volumes works well as long as you monitor the aggregates
20:18:11 <mmcgrath> #topic collectd
20:18:19 <mmcgrath> So I've added some more collectd modules
20:18:25 <mmcgrath> Of particular interest are these 3
20:18:49 <mmcgrath> ping test:
20:18:51 <mmcgrath> .tiny https://admin.fedoraproject.org/collectd/bin/index.cgi?hostname=log01&plugin=ping&timespan=3600&action=show_selection&ok_button=OK
20:18:53 <zodbot> mmcgrath: http://tinyurl.com/zddwih
20:19:16 <mmcgrath> postgres connections:
20:19:20 <mmcgrath> .tiny https://admin.fedoraproject.org/collectd/bin/index.cgi?hostname=db02&plugin=pg_conns&timespan=3600&action=show_selection&ok_button=OK
20:19:23 <zodbot> mmcgrath: http://tinyurl.com/zddv5d
20:19:36 <mmcgrath> mdomsch: you might be interested in what happened to mirrormanager there in the last hour
20:19:44 * gholms|work hopes that doesn't cause extra URLs to show up in the minutes
20:19:56 <mmcgrath> .tiny https://admin.fedoraproject.org/collectd/bin/index.cgi?hostname=proxy3&plugin=haproxy&timespan=3600&action=show_selection&ok_button=OK
20:19:58 <zodbot> mmcgrath: http://tinyurl.com/zddtjd
20:20:02 <mmcgrath> and that's the last one, haproxy by site
20:20:21 <mdomsch> looking
20:20:24 <ricky> Is that response time I see? Veeery nice
20:20:55 <mmcgrath> ricky: which one? the haproxy one?
20:21:04 <mdomsch> ricky, what unites?
20:21:04 <mmcgrath> nope, that's actually...
20:21:05 <mdomsch> units?
20:21:12 <mmcgrath> stot: requests/s
20:21:19 <mmcgrath> econ: errors/s
20:21:28 <mmcgrath> eresp: err responses/s
20:21:29 <mdomsch> so, every 10 minutes on the dot, we spike in mirrorlist requests
20:21:31 <ricky> Ah, OK
20:21:38 <mmcgrath> econ: is error connections/s
20:21:46 <mmcgrath> ricky: there's LOTS we can get out of haproxy if you want to add something
20:21:49 <mdomsch> for about a minute, then it drops back down
20:21:50 <mmcgrath> response time is on my list.
20:21:58 <mmcgrath> mdomsch: what did you think about MM db connections there?
20:22:29 <smooge> hmm, the tiny urls don't seem to work
20:22:43 <mmcgrath> smooge: pooh, interesting
20:22:49 <mmcgrath> use the longer ones then :)
20:23:06 <mdomsch> mmcgrath, blow that out over a larger time scale...
20:23:14 <mmcgrath> yeah, it's pretty common
20:23:17 * mdomsch bets that's the crawler with 80 threads
20:23:25 <mmcgrath> mdomsch: that could very well be.
20:23:26 <mdomsch> tailing off at the end of the run
20:23:30 <mmcgrath> yeah
20:23:44 <mdomsch> it tries to keep 80 threads running at once, starting a new one as one completes
20:24:04 <mdomsch> so it'll flatline around 80, then tail off, then jump back to 80 for a while
20:24:04 <smooge> be back in a sec
20:24:09 <mmcgrath> <nod>
20:24:18 <mmcgrath> but yeah, we now have more visibility into our applications than ever before.
20:24:31 <mmcgrath> we've learned a great deal about our environments just in the last couple of weeks from collectd.
20:24:32 <mdomsch> yep, that's what it's doing. Nice graphs. :-)
20:24:35 <mmcgrath> in particular it's the 10s resolution.
20:24:43 <mmcgrath> it is just so much detail that we were missing before.
20:24:45 <smooge> ok, dog thrown outside
20:25:27 <mmcgrath> anyone have any questions / requests?
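[Editor's note: the crawler scheduling mdomsch describes, keeping up to 80 checks in flight and starting a new one as each finishes, is the standard bounded worker-pool pattern. A minimal Python sketch, with invented host names and a placeholder in place of the real per-mirror crawl:]

```python
from concurrent.futures import ThreadPoolExecutor

def check_mirror(host):
    # Placeholder for the real per-mirror crawl logic.
    return (host, "ok")

def crawl(hosts, max_threads=80):
    # The executor keeps up to max_threads checks running at once,
    # pulling the next host as each worker frees up -- so concurrency
    # flatlines at the cap and only tails off as the queue drains,
    # matching the sawtooth pattern seen in the db-connection graphs.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return list(pool.map(check_mirror, hosts))

if __name__ == "__main__":
    print(crawl(["mirror%d.example.org" % i for i in range(5)]))
```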
20:25:44 <smooge> no, thanks for this
20:25:59 <smooge> oh, one question
20:26:06 <smooge> what does the ping test run against?
20:26:28 <mmcgrath> We can have it run from everywhere, but right now I've got it running on log1 (which is the central ping server)
20:26:31 <mmcgrath> maybe I should have used noc1.
20:26:32 <mmcgrath> anywho.
20:26:37 <mmcgrath> it then pings out to the hosts from there
20:26:39 <mmcgrath> just an ICMP ping
20:26:47 <mmcgrath> then tracks latency, std dev, and drop rate.
20:27:02 <mmcgrath> How do you add more hosts?
20:27:06 <mmcgrath> I'm glad you asked, mmcgrath :)
20:27:10 <mmcgrath> collectd::ping { 'ping':
20:27:10 <mmcgrath>   hosts => ['tummy1.fedoraproject.org', 'telia1.fedoraproject.org', 'serverbeach4.fedoraproject.org', 'serverbeach1.fedoraproject.org', 'osuosl1.fedoraproject.org']
20:27:13 <mmcgrath> }
20:27:11 <ricky> Heheh
20:27:18 <mmcgrath> add that to the node or server group you want
20:27:19 <smooge> ah, cool
20:27:22 <mmcgrath> and collectd will do the rest.
20:27:36 <smooge> I think we want noc01/noc02 as the ping testers.
20:27:41 <mdomsch> mmcgrath, how does haproxy determine if mirror-lists is down?
20:27:42 <smooge> but log01 works too
20:28:24 <mmcgrath> mdomsch: it hits /mirrorlist every 5 seconds, 3 failures in a row takes that node out.
20:28:37 <ricky> It should go by the timeouts we have set or http status codes
20:28:38 <mmcgrath> smooge: actually, that is a good transition into the next topic I wanted to bring up (also monitoring oriented)
20:28:44 <mdomsch> ah. then I bet that's the hourly cache-refresh non-responsiveness doing it
20:28:53 <mmcgrath> anyone have any questions or comments on this?
20:28:56 <mdomsch> that's kind of a short timeout...
20:29:39 <tremble> Suppose it depends what you consider acceptable downtime.
20:29:56 <ricky> Oh, that actually makes sense. I wonder how long one cache refresh takes an app server out for
20:30:12 <mmcgrath> mdomsch: does the refresh stagger at all?
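[Editor's note: the health check mmcgrath describes, an HTTP probe of /mirrorlist every 5 seconds with a node removed after 3 consecutive failures, maps onto haproxy's standard check options. A hedged sketch; the backend name, server names, addresses, and the rise value are invented, not taken from the actual Fedora config:]

```
# Hypothetical haproxy backend matching the check described above.
backend mirror-lists
    option httpchk GET /mirrorlist
    # inter 5000: probe every 5s; fall 3: mark down after 3 straight
    # failures; rise 2 (an assumption): mark up after 2 successes.
    server app01 10.0.0.11:80 check inter 5000 fall 3 rise 2
    server app02 10.0.0.12:80 check inter 5000 fall 3 rise 2
```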
20:30:36 <mmcgrath> mdomsch: actually, that doesn't match up
20:30:39 <mmcgrath> it's an hourly refresh.
20:30:45 <mmcgrath> but we see stuff going down more often than that
20:30:48 <mmcgrath> mdomsch: take a look at proxy3 -
20:30:50 <mmcgrath> grep mirror /var/log/messages
20:30:56 <mmcgrath> anywho, we can discuss that more in a bit.
20:30:58 <mdomsch> ok, we don't have to solve it here
20:31:01 <mmcgrath> <nod>
20:31:03 <mmcgrath> #topic Nagios
20:31:04 <mmcgrath> so
20:31:09 <mmcgrath> right now we have noc1 and noc2.
20:31:20 <mmcgrath> if we move to nagios3, it becomes easier to merge the two.
20:31:43 <mmcgrath> but I'm still wary about having monitoring only in PHX2
20:32:12 <smooge> how does it merge them?
20:32:19 * ricky wouldn't mind moving noc2 out of germany though :-/ As nice as it is to get a perspective from there, it often gives alerts on network issues we can't do anything about
20:32:27 <mmcgrath> smooge: well, nagios3 has a better ability to handle multiple IPs for a given host.
20:32:40 <mmcgrath> ricky: agreed.
20:32:46 <smooge> ah, but we would still have hairpin problems
20:32:50 <mmcgrath> smooge: so we can do the whole 'internal' and 'external' test without problems.
20:32:55 <mmcgrath> yeah
20:33:05 <mmcgrath> yeah, that's it then, there's a blocker there.
20:33:15 <mmcgrath> because external to PHX2 we can't monitor everything in phx2.
20:33:21 <mmcgrath> inside phx2 we can't monitor everything in phx2 :)
20:33:29 <mmcgrath> so we'll probably have to keep that dynamic at least somewhere.
20:33:41 <mmcgrath> let's think on it a bit
20:33:53 <smooge> I would make a noc03 in ibiblio and go from there
20:34:19 <mmcgrath> I'm hoping to work with pvangundy on that; he's a volunteer that's been gone for a while.
20:34:22 <mmcgrath> He's back but has been busy
20:34:25 * mmcgrath hopes he gets less busy
20:34:30 <mmcgrath> anywho, anything else on that for the meeting?
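[Editor's note: one common way to express the internal/external split discussed above in Nagios 3 is through custom object variables, a Nagios 3 feature. This is a hedged sketch of that technique, not the actual Fedora config; the host names, addresses, and variable names are all invented:]

```
# Hypothetical Nagios 3 host with two addresses via a custom variable.
define host {
    host_name       app01
    address         10.5.126.11        ; internal (PHX2) address
    _EXTERNAL_ADDR  203.0.113.11       ; custom variable, external address
}

# A check command that probes the external address instead of $HOSTADDRESS$:
define command {
    command_name    check_http_external
    command_line    $USER1$/check_http -I $_HOSTEXTERNAL_ADDR$
}
```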
20:34:48 <mmcgrath> alrighty
20:34:53 <mmcgrath> #topic search engine
20:34:57 <mmcgrath> a-k: what's the latest?
20:35:07 <a-k> I've got DataparkSearch on publictest3
20:35:15 <mmcgrath> url?
20:35:17 <a-k> #link http://publictest3.fedoraproject.org/cgi-bin/dpsearch
20:35:28 <a-k> DataparkSearch forked from mnoGoSearch in 2003
20:35:35 <a-k> Mostly so far it seems like a broken version of mnoGoSearch
20:35:54 <a-k> I've indexed only a tiny number of documents from the wiki
20:36:03 <mmcgrath> :( that's no fun
20:36:06 <a-k> I'll poke it a little more to see how bad it is
20:36:17 <a-k> More docs, etc.
20:36:29 <ricky> Search is hard :-/
20:36:29 <hydh> hehe
20:36:41 <mmcgrath> a-k: thanks. anything else for now?
20:36:49 <a-k> I don't think so
20:36:54 <mmcgrath> alrighty
20:37:01 <mmcgrath> Well, with that I'll open the floor
20:37:04 <mmcgrath> #topic Open Floor
20:37:11 <mmcgrath> anyone have anything they'd like to discuss?
20:37:50 <mmcgrath> if not, we'll close in 30
20:39:08 <mmcgrath> sweet, silence is golden
20:39:11 <mmcgrath> ok
20:39:12 <mmcgrath> #endmeeting