20:00:28 #startmeeting Infrastructure
20:00:29 Meeting started Thu Apr 22 20:00:28 2010 UTC. The chair is mmcgrath. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:31 Useful Commands: #action #agreed #halp #info #idea #link #topic.
20:00:35 zodbot: do as I mean not as I say
20:01:01 Hehe
20:01:15 * nirik is hanging around in the cheap seats.
20:01:19 #topic who's here?
20:01:22 * ricky
20:01:24 * a-k is
20:01:39 here
20:01:41 * Infern4us
20:01:48 needs coffee
20:02:11 * mdomsch
20:02:14 Ok, lets get started
20:02:17 #topic Final Release
20:02:25 The final F13 release is on the way here pretty quick.
20:02:34 Our final freeze goes into place on the 4th IIRC.
20:03:23 Anyone have any questions or concerns about that?
20:03:30 any major projects to get deployed before then?
20:03:33 I have only 2 major change
20:03:34 s
20:04:24 Alrighty, well we can move on.
20:04:38 What are the changes?
20:04:39 hmm isn't that the same day as U-10.04?
20:04:52 ricky: going into those right now
20:04:56 #topic Insight
20:04:58 stickster: ping
20:05:24 I'm wondering if there's anything we can get in place today now so there's less to do later.
20:05:24 mmcgrath, I have no projects for that time. I was going to deploy rsyslog next week
20:05:31 mmcgrath: pong
20:05:48 stickster: hey, so is there any insight bits that can be done now?
20:05:49 not much time before the freeze then
20:05:59 anything that, even though the whole project isn't ready, parts could be deployed now?
20:06:49 stickster: I'm thinking even if the base stuff is in place and not advertised it'd help increase the chances of success.
20:06:50 mmcgrath: There are still both styling and technical bits that have critical or blocker bugs attached
20:07:15 what are the nature of the changes that are still to be made? packaging? upstream stuff?
20:07:15 * hydh is here too
20:07:42 mmcgrath: There are problems with the authentication that still need to be solved, then upstreamed to the fedora-zikula module and released
20:07:57 The styling bugs are not as pernicious but will take some time to resolve
20:07:59 so in your estimation, we still on track for deployment later in the month?
20:08:15 mmcgrath: http://lists.fedoraproject.org/pipermail/logistics/2010-April/000510.html
20:08:17 stickster: also, how much of this code is stuff we'll have to maintain?
20:08:29 No, we agreed to push off to post-GA
20:08:35 ah, k.
20:08:37 There's not much code we have to maintain
20:08:39 I missed that, sorry.
20:08:41 AuthFAS module is about it.
20:08:46 excellent.
20:08:48 And that's fairly understandable
20:08:58 stickster: ok, thanks for the latest. Anything else?
20:09:10 It's the other issues we still have to solve that weren't ready for our go/no-go that caused us to wave off.
20:09:28 logistics@ list is where discussion is taking place about what we're going to do next.
20:09:32 eof
20:09:44 stickster: thanks
20:09:48 ok, next topic
20:09:52 #topic netapp migration
20:10:02 This is something I wanted to have done before the beta but failed to do so
20:10:10 ok what is this?
20:10:26 basically I need to move alt and whatever is left on the secondary1 drives, to the netapp.
20:10:53 smooge: so they'll show up on download.fedora.redhat.com
20:11:41 any questions or concerns about that?
20:11:53 For me the big one is trying to figure out exactly how to let everyone continue to upload their content.
20:11:53 nah, they're small
20:11:55 not really
20:12:04 AFAIK it'll all be the same way.
20:12:12 Ok, moving on :)
20:12:14 oh..
there is that
20:12:20 log into a server that has it mounted r/w
20:12:27 right now that's secondary1 for alt
20:12:36 who is allowed to do this?
20:12:41 mdomsch: well I'm thinking they'd still be allowed to do that
20:12:48 but then I'm not sure what to do with secondary1's actual drives :)
20:12:56 altvideo group can for /pub/alt/video/
20:12:58 maybe just have them sync from the netapp and continue to expose.
20:13:02 smooge: there's an SOP
20:13:04 * mmcgrath gets it
20:13:05 yeah
20:13:16 smooge: http://fedoraproject.org/wiki/Content_Hosting_Infrastructure_SOP
20:13:25 giving users direct access to the netapp concerns me a bit
20:13:37 but really it's a completely different share than the /pub/fedora and /pub/epel stuff
20:13:49 and the only thing they could do is fill the disk up which A) we monitor and B) is easy to fix
20:13:58 Which netapp modules are you using?
20:14:13 ok so it will need to be a separate partition/log-volume on the netapp
20:14:18 tremble: I always forget.
20:14:26 smooge: it already is.
20:14:40 smooge: oh wait, not a separate 'partition' in that way
20:14:53 since we don't really know what future expansion will be
20:15:05 this will allow either side of the house to grow without us having to guess.
20:15:11 so alt.fp.o becomes a new VM too?
20:15:16 oh crap, the meeting
20:15:20 FWIW $POE uses a netapp.
20:15:20 sorry about being late
20:15:22 ah well I was wondering about setting up a netapp quota and not having to worry about filling
20:15:37 mdomsch: I haven't figured that part out yet, I might just see if download.fedora.redhat.com will start accepting alt.fedoraproject.org
20:15:41 smooge: that could work too.
20:16:02 it allows for us to also separate out differing snapshot schedules and such
20:16:43
20:16:49 so anyone have any other questions or comments on that/
20:16:50 ?
20:17:48 no we can talk offline
20:17:56 k
20:17:59 next topic!
20:18:06 At $POE we've found that having 1 or 2 aggregates and multiple thin-provisioned volumes works well as long as you monitor the aggregates
20:18:11 #topic collectd
20:18:19 So I've added some more collectd modules
20:18:25 Of particular interest are these 3
20:18:49 ping test:
20:18:51 .tiny https://admin.fedoraproject.org/collectd/bin/index.cgi?hostname=log01&plugin=ping&timespan=3600&action=show_selection&ok_button=OK
20:18:53 mmcgrath: http://tinyurl.com/zddwih
20:19:16 postgres connections:
20:19:20 .tiny https://admin.fedoraproject.org/collectd/bin/index.cgi?hostname=db02&plugin=pg_conns&timespan=3600&action=show_selection&ok_button=OK
20:19:23 mmcgrath: http://tinyurl.com/zddv5d
20:19:36 mdomsch: you might be interested in what happened to mirrormanager there in the last hour
20:19:44 * gholms|work hopes that doesn't cause extra URLs to show up in the minutes
20:19:56 .tiny https://admin.fedoraproject.org/collectd/bin/index.cgi?hostname=proxy3&plugin=haproxy&timespan=3600&action=show_selection&ok_button=OK
20:19:58 mmcgrath: http://tinyurl.com/zddtjd
20:20:02 and that's the last one, haproxy by site
20:20:21 looking
20:20:24 Is that response time I see? Veeery nice
20:20:55 ricky: which one? the haproxy one?
20:21:04 ricky, what unites?
20:21:04 nope that's actually...
20:21:05 units?
20:21:12 stot: requests/s
20:21:19 econ: errors/s
20:21:28 eresp: err responses/s
20:21:29 so, every 10 minutes on the dot, we spike in mirrorlist requests
20:21:31 Ah, OK
20:21:38 econ: is error connections /s
20:21:46 ricky: there's LOTS we can get out of haproxy if you want to add something
20:21:49 for about a minute then it drops back down
20:21:50 response time is on my list.
20:21:58 mdomsch: what did you think about MM db connections there?
20:22:29 hmm the tiny urls dont seem to work
20:22:43 smooge: pooh, interesting
20:22:49 use the longer ones then :)
20:23:06 mmcgrath, blow that out over a larger time scale...
20:23:14 yeah it's pretty common
20:23:17 * mdomsch bets that's the crawler with 80 threads
20:23:25 mdomsch: that could very well be.
20:23:26 tailing off at the end of the run
20:23:30 yeah
20:23:44 it tries to keep 80 threads running at once, starting a new one as one completes
20:24:04 so it'll flatline around 80, then tail off, then jump back to 80 for a while
20:24:04 be back in a sec
20:24:09
20:24:18 but yeah, we now have more visibility into our applications than ever before.
20:24:31 we've learned a great deal about our environments just in the last couple of weeks from collectd.
20:24:32 yep, that's what it's doing. Nice graphs. :-)
20:24:35 in particular it's the 10s resolution.
20:24:43 it is just so much detail that we were missing before.
20:24:45 ok dog thrown outside
20:25:27 anyone have any questions / requests?
20:25:44 no thanks for this
20:25:59 oh one question
20:26:06 what does the ping test against?
20:26:28 We can have it run from everywhere but right now I've got it running on log1 (which is the central ping server)
20:26:31 maybe I should have used noc1.
20:26:32 anywho.
20:26:37 it then pings out to the hosts from there
20:26:39 just an ICMP ping
20:26:47 then tracks latency, std dev, and drop rate.
20:27:02 How do you add more hosts?
20:27:06 I'm glad you asked mmcgrath :)
20:27:10 collectd::ping { 'ping':
20:27:10     hosts => ['tummy1.fedoraproject.org', 'telia1.fedoraproject.org', 'serverbeach4.fedoraproject.org', 'serverbeach1.fedoraproject.org', 'osuosl1.fedoraproject.org']
20:27:11 Heheh
20:27:13 }
20:27:18 add that to the node or server group you want
20:27:19 ah cool
20:27:22 and collectd will do the rest.
20:27:36 I think we want noc01/noc02 as the ping testers.
20:27:41 mmcgrath, how does haproxy determine if mirror-lists is down ?
20:27:42 but log01 works too
20:28:24 mdomsch: it hits /mirrorlist every 5 seconds, 3 failures in a row takes that node out.
20:28:37 It should go by timeouts we have set or http status codes
20:28:38 smooge: actually that is a good transition into the next topic I wanted to bring up (also monitoring oriented)
20:28:44 ah. then I bet that's the hourly cache refresh non-responsiveness doing it
20:28:53 anyone have any questions or comments on this?
20:28:56 that's kind of a short timeout...
20:29:39 Suppose it depends what you consider acceptable down time.
20:29:56 Oh, that actually makes sense. I wonder how long one cache refresh takes an app server out for
20:30:12 mdomsch: does the refresh stagger at all?
20:30:36 mdomsch: actually that doesn't match up
20:30:39 it's an hourly refresh.
20:30:45 but we see stuff going down more often than that
20:30:48 mdomsch: take a look at proxy3 -
20:30:50 grep mirror /var/log/messages
20:30:56 anywho, we can discuss that more in a bit.
20:30:58 ok, we don't have to solve it here
20:31:01
20:31:03 #topic Nagios
20:31:04 so
20:31:09 right now we have noc1 and noc2.
20:31:20 if we move to nagios3, it becomes easier to merge the two.
20:31:43 but I'm still wary about having monitoring only in PHX2
20:32:12 how does it merge them?
20:32:19 * ricky wouldn't mind moving noc2 out of germany though :-/ As nice as it is to get a perspective from there, it often gives alerts on network issues we can't do anything about
20:32:27 smooge: well, nagios3 has a better ability to realize multiple IPs for a given host.
20:32:40 ricky: agreed.
20:32:46 ah but we would still have hairpin problems
20:32:50 smooge: so we can do the whole 'internal' and 'external' test without problems.
20:32:55 yeah
20:33:05 yeah that's it then, there's a blocker there.
20:33:15 because external to PHX2 we can't monitor everything in phx2.
20:33:21 inside phx2 we can't monitor everything in phx2 :)
20:33:29 so we'll probably have to keep that dynamic at least somewhere.
20:33:41 lets think on it a bit
20:33:53 I would make noc03 in ibiblio and go from there
20:34:19 I'm hoping to work with pvangundy on that, he's a volunteer that's been gone for a while.
20:34:22 He's back but has been busy
20:34:25 * mmcgrath hopes he gets less busy
20:34:30 anywho, anything else on that for the meeting?
20:34:48 alrighty
20:34:53 #topic search engine
20:34:57 a-k: whats the latest?
20:35:07 I've got DataparkSearch on publictest3
20:35:15 url?
20:35:17 #link http://publictest3.fedoraproject.org/cgi-bin/dpsearch
20:35:28 DataparkSearch forked from mnoGoSearch in 2003
20:35:35 Mostly so far it seems like a broken version of mnoGoSearch
20:35:54 I've indexed only a tiny number of documents from the wiki
20:36:03 :( that's no fun
20:36:06 I'll poke it a little more to see how bad it is
20:36:17 More docs, etc
20:36:29 Search is hard :-/
20:36:29 hehe
20:36:41 a-k: thanks, anything else for now?
20:36:49 I don't think so
20:36:54 alrighty
20:37:01 Well with that I'll open the floor
20:37:04 #topic Open Floor
20:37:11 anyone have anything they'd like to discuss?
20:37:50 if not we'll close in 30
20:39:08 sweet, silence is golden
20:39:11 ok
20:39:12 #endmeeting
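
A note on the collectd::ping snippet quoted at 20:27 above: the Puppet define is what gets added to a node or server group, and "collectd will do the rest". As a rough illustration only, a define like that would typically render a standard collectd Ping plugin stanza in collectd.conf along the lines of the sketch below. The actual template lives in the Fedora Infrastructure puppet repository and may differ; the 10-second interval is assumed from the "10s resolution" comment at 20:24:35, and the host list is copied from the meeting.

    # Sketch of the collectd.conf fragment such a puppet define might render.
    # Assumptions: 10-second polling interval; only the Host option is set.
    LoadPlugin ping
    <Plugin ping>
        Interval 10
        Host "tummy1.fedoraproject.org"
        Host "telia1.fedoraproject.org"
        Host "serverbeach4.fedoraproject.org"
        Host "serverbeach1.fedoraproject.org"
        Host "osuosl1.fedoraproject.org"
    </Plugin>

The Ping plugin records latency, standard deviation, and drop rate per host, which matches what was described for the setup on log01.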
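
The haproxy behaviour described at 20:28:24 (a request against /mirrorlist every 5 seconds, with an app server pulled after 3 consecutive failures) corresponds to haproxy's standard HTTP health-check options. The sketch below only illustrates that mechanism; the backend name, server names, and addresses are placeholders rather than the real Fedora proxy configuration, and the "rise 2" value (successes needed before a server is re-added) is an assumption that was not discussed in the meeting.

    # HTTP health check: GET /mirrorlist every 5000 ms, mark a server down
    # after 3 consecutive failures, re-add it after 2 consecutive successes.
    backend mirror-lists
        option httpchk GET /mirrorlist
        server app01 192.0.2.10:80 check inter 5000 fall 3 rise 2
        server app02 192.0.2.11:80 check inter 5000 fall 3 rise 2

With "option httpchk", a check also fails on a non-2xx/3xx status code or a timeout, which lines up with the comment at 20:28:37 that it should go by timeouts or HTTP status codes.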
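
On the Nagios point at 20:32:27 ("nagios3 has a better ability to realize multiple IPs for a given host"): one mechanism Nagios 3 offers for this is custom object variables, which let a host definition carry an extra address alongside the primary one. Whether this is the exact mechanism intended for merging noc1 and noc2 is not stated in the meeting; the sketch below is only an illustration, and the host name and addresses are made up.

    # Nagios 3 host carrying both an internal and an external address.
    # The custom variable is exposed to check commands as $_HOSTEXTERNAL_IP$.
    define host {
        use            generic-host
        host_name      app01.phx2          ; hypothetical host
        address        10.5.126.23         ; internal (PHX2) address, made up
        _EXTERNAL_IP   192.0.2.23          ; external address, made up
    }

This kind of definition would let the 'internal' and 'external' tests mentioned at 20:32:50 target different addresses from a single host object.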