20:00:19 #startmeeting Infrastructure
20:00:20 Meeting started Thu Mar 25 20:00:19 2010 UTC. The chair is mmcgrath. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:21 Who's here?
20:00:22 * skvidal is here
20:00:22 Useful Commands: #action #agreed #halp #info #idea #link #topic.
20:00:26 * lmacken
20:00:45 * a-k is here
20:02:19 #topic Fedora 13 beta
20:02:21 hah
20:02:28 Let's get started
20:02:38 https://fedorahosted.org/fedora-infrastructure/report/9
20:03:03 .ticket 2058
20:03:04 mmcgrath: #2058 (Verify Mirror Space) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2058
20:03:05 I'll get this one
20:03:20 .ticket 2059
20:03:24 mmcgrath: #2059 (Release Day ticket) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2059
20:03:27 This one's just a tracker ticket. I'll take it too
20:03:38 .ticket 2060
20:03:40 mmcgrath: #2060 (Verify releng permissions) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2060
20:03:42 smooge: want to get that one again?
20:04:52 we'll come back to that.
20:04:57 .tiny 2061
20:04:58 mmcgrath: Error: '2061' is not a valid url.
20:05:04 .ticket 2061
20:05:06 sorry :/
20:05:07 mmcgrath: #2061 (MM redirects) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2061
20:05:17 mdomsch usually gets this one (I believe it's automated now and just requires verification)
20:05:20 .ticket 2062
20:05:21 mmcgrath: #2062 (Infrastructure Change Freeze) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2062
20:05:24 I'll get that, we are frozen.
20:05:31 morning
20:05:39 sorry got stuck on phone
20:05:43 smooge: hey, want to get 2060?
20:05:49 yes
20:05:55 k
20:05:57 and the last ticket
20:06:05 .ticket 2063 doesn't need to be done until after the launch
20:06:07 could I get 2058 ..
I find them related
20:06:08 mmcgrath: Error: "2063 doesn't need to be done until after the launch" is not a valid integer.
20:06:27 hehheeh
20:06:43 zodbot: you are testing me!
20:06:51 ok, anyone have any questions or comments related to the release?
20:06:53 Oxf13: you around?
20:07:03 I am
20:07:11 what are our odds of slipping at the moment?
20:07:24 I'd put it at 50% chance
20:07:41 There is one blocker we're worried about, but we have a patch in hand, it just needs testing, then I can make the RC
20:07:44 mmcgrath: so - when it's a 50% chance of rain - you carry an umbrella :)
20:07:48 cool.
20:07:50 we have a compressed amount of time to test the RC
20:08:07 and not really any time to fix anything that's wrong with the RC and validate a second RC before the go / no go time
20:08:09 Oxf13: I'll work with you later today or tomorrow to verify mirror space, we expecting this to be the same size(ish) as the alpha?
20:08:14 yes
20:08:36 k, sounds good.
20:08:43 If no one has anything else, I'll move on?
20:08:54 * smooge remembers the days of having 8 or 9 RC's
20:09:07 nothing else
20:09:14 #topic func updates
20:09:30 So after some coding and some testing, the func updates before the freeze went pretty well I thought.
20:09:45 does anyone else want to work on that project?
20:09:47 still a few kinks to work out but it was much easier than our current method and required much less attention.
20:09:57 skvidal: as in you're done with it or want some help?
20:10:03 want some help
20:10:21 I got asked to work on something else this week
20:10:22 skvidal: I have some cycles during the freeze. though I can't promise I won't make things worse :)
20:10:30 and that's been taking my focus
20:11:07 so it's not dropped
20:11:14 skvidal: well I'm sure I'll be pinging you soon(ish)
20:11:17 but I won't be able to spend as much time on it until I get the mock vm stuff out
20:11:20
20:11:25 anyone else have questions or comments on that?
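The `.ticket` mishap earlier in the meeting ("2063 doesn't need to be done until after the launch" is not a valid integer) is consistent with the bot treating everything after the command as one argument and validating it as a single integer. A minimal sketch of that behavior, assuming nothing about zodbot's actual internals (the function name is hypothetical; the error text is modeled on the log):

```python
def parse_ticket_arg(arg):
    """Validate a .ticket argument the way the log suggests zodbot does:
    the whole argument must parse as one integer, so any trailing
    commentary makes the command fail.
    (Hypothetical helper, not zodbot's real code.)"""
    arg = arg.strip()
    try:
        return int(arg)
    except ValueError:
        # Mirrors the bot's reply seen in the log.
        raise ValueError('"%s" is not a valid integer.' % arg)
```

Putting commentary in a separate message, rather than after the ticket number, avoids the error.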
20:11:41 the func+yum thing is lightweight
20:11:46 and entry-level easy to work on
20:11:58 http://fedorapeople.org/gitweb/skvidal/func-yum.git
20:12:02 lots of easy wins
20:12:12 skvidal: thanks
20:12:19 Ok, next topic
20:12:21 #topic Collectd
20:12:31 So we've been using collectd for a bit now.
20:12:35 what do people think?
20:12:39 skvidal: Thanks for getting func working well again.
20:12:51 abadger1999: so much to do to make things 'well'
20:12:52 skvidal, I would like to help
20:12:53 but it is unbroken
20:13:01 :-)
20:13:02 my python is broken, but I really want to help
20:13:26 sorry meant help on func
20:13:33 collectd I have found useful
20:13:46 mmcgrath: It's helped us fix something already. It's generally good.
20:13:51 looking at app04 I can see where it is heavily running into some issues
20:13:52 smooge: me too, we've already found several problems just by having it in place.
20:14:05 what did collectd help you fix?
20:14:26 skvidal: it helped us find the outage blips as being related to db2.
20:14:31 ah
20:14:32 cool
20:15:02 other tools could have found it, but just the way we have it set up right now (every 10s) allowed us to see that a load spike in such a short window was related to disk, and even more than that, disk writes.
20:15:06 which got us looking.
20:15:22 and while it's not totally fixed I do think we're in better shape. I think we just need to adjust our backup system a bit.
20:15:31 but that'll be a post-freeze thing.
20:15:35 so here's the only gotcha with collectd.
20:15:45 it's a massive suck?
20:15:52 https://admin.fedoraproject.org/collectd/bin/index.cgi?hostname=log01&plugin=disk&timespan=604800&action=show_selection&ok_button=OK
20:15:57 heh
20:16:05 it's the disk IO required to do rrd files.
20:16:13 there's lots of tricks we can (and do) use to fix that.
20:16:18 but as we grow, it's something to watch.
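The "tricks" mentioned above for keeping RRD disk IO in check mostly amount to write caching in collectd's rrdtool plugin, which is what the collectd wiki page "Inside the RRDtool plugin" (linked later in this meeting) describes. A sketch of the relevant collectd.conf stanza — the values here are illustrative, not Fedora Infrastructure's actual settings:

```
<Plugin rrdtool>
        DataDir      "/var/lib/collectd/rrd"
        # Buffer samples in RAM and write each RRD file at most once
        # per CacheTimeout seconds, instead of on every 10s sample.
        CacheTimeout 120
        # Periodically flush values older than this, so nothing sits
        # in memory indefinitely if a host stops reporting.
        CacheFlush   900
</Plugin>
```

This turns many tiny scattered writes into fewer, batched ones, which is the main relief for the disk IO problem being described.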
20:16:48 you'll notice that on the 19th I figured out that automatically polling every tcp port in use and recording that info was too expensive for us :)
20:16:51 duh
20:16:58 but yeah, something to watch.
20:17:14 there's also non-rrdtool collection methods we can use if we really need to that would also be useful
20:17:27 anywho, anyone have any questions on that?
20:17:43 I've been slowly adding more useful stuff
20:17:47 like -
20:17:49 .tiny https://admin.fedoraproject.org/collectd/bin/index.cgi?hostname=db1&plugin=mysql&timespan=86400&action=show_selection&ok_button=OK
20:17:51 mmcgrath: http://tinyurl.com/ye6nrvl
20:18:21 anywho, no more questions there so that's good.
20:18:28 #topic Monitoring
20:18:30 mmcgrath, we did rrdtools in a ram drive
20:18:47 smooge: yeah that's basically what some of the tunables in the rrdtool plugin do.
20:18:59 Ok, so we're basically back to nagios and collectd.
20:19:15 yeah.. we set it up like /var/spool/mqueue/t and had a 2 GB partition for it ..
20:19:29 hydh has been working on some stuff but I know he's been busy
20:19:33 I want to say our community help on nagios has been cool
20:19:36 and great
20:19:38 I'd like to get some basic event handlers finalized as well as proper deps in place.
20:19:42 smooge: indeed
20:20:13 anyone have any questions about what we're up to in nagios and where we're headed?
20:20:38 alrighty, well with that
20:20:41 #topic Open Floor
20:20:45 anyone have anything they'd like to discuss?
20:20:50 a-k: anything new on search engines?
20:21:01 I'm still looking at mnoGoSearch with PostgreSQL
20:21:14 I haven't had a chance to try crawling with it yet, but if it goes well I'll put it in pub test next week
20:21:25 BTW is there a preference for MySQL vs PostgreSQL? I know/think we have them both around....
20:21:38 a-k: no preference
20:21:57 I'm okay with that. That's about it for now.
20:22:24 anyone have anything else they'd like to discuss?
20:22:34 Random question?
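The "proper deps" mmcgrath wants in Nagios are plain service dependencies: they suppress alert storms by keeping a child service quiet while its parent is already known to be down. A minimal sketch of one such dependency — the host and service names are made up for illustration, not taken from Fedora Infrastructure's config:

```
define servicedependency {
        host_name                       db1
        service_description             MySQL
        dependent_host_name             app01
        dependent_service_description   Web App
        # While the MySQL service is WARNING, UNKNOWN, or CRITICAL,
        # skip notifications for the dependent app service...
        notification_failure_criteria   w,u,c
        # ...and skip its active checks as well.
        execution_failure_criteria      w,u,c
}
```

With a few of these in place, a database outage produces one actionable alert instead of one per application that sits on top of it.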
20:22:49 Did you folks notice the collectd server-side daemon putting a lot of load on the machine? I ask because that's what I experienced at $dayjob.
20:23:15 gholms: yeah, we fixed it with the suggestions here...
20:23:16 * mmcgrath gets link
20:23:33 http://collectd.org/wiki/index.php/Inside_the_RRDtool_plugin
20:24:01 Ooh, that looks useful. Thanks.
20:24:09 gholms: yup yup
20:24:11 1) going to work on log reviews over freeze. Now that we have over 50% free logs I was going to see what I could get out of it daily.
20:24:36 I think someone else was working on this earlier so will hook up with them and see where it goes
20:24:43 smooge: cool
20:24:48 yeah someone was but I don't know the status
20:25:11 then I am pretty much building my home/slicehost network to 'clone' F-I so I can test stuff here a bit better.
20:25:37 my goal will be to see how far I can take epylog before it screams in terror at our data
20:25:50 smooge: sounds good, let us know if you need anything
20:25:53 well me
20:25:54 :)
20:26:18 ok, well with that I'll close the meeting in 30
20:26:44 15
20:26:56 #endmeeting