20:00:35 #startmeeting Infrastructure
20:00:36 Meeting started Thu Mar 18 20:00:35 2010 UTC. The chair is mmcgrath. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:38 Useful Commands: #action #agreed #halp #info #idea #link #topic.
20:00:39 who's here?
20:00:42 * ricky
20:01:32 * skvidal is here
20:01:38 * hydh is here
20:01:41 * nirik is lurking along in the back
20:01:44 Ok, let's get started
20:01:50 #topic Infrastructure -- Freeze next week
20:01:57 Anyone have any major changes going in over the next couple of days?
20:01:59 lmacken: ^^
20:02:29 abadger1999: any major changes going in over the next couple of days?
20:03:00 mmcgrath: I'm going to push an update to the pkgdb with some bugfixes and a switch to memcached for caching status codes.
20:03:06 nothing too major.. I'd like to upgrade fedoracommunity to the version that is in staging... however, the recent pkgdb upgrade broke the pkgdbconnector in fedoracommunity, so that will require some fixes... so we'll see.
20:03:33 I may push a bodhi bugfix update out as well
20:03:36 mmcgrath: It should make things more stable but using memcached is new so it might need some tweaks.
20:03:47 that's cool, as long as they're in before next Tuesday I don't see any problems with those updates
20:04:12 * dgilmore is present
20:04:38 Oh yeah -- planning on finishing the memcached stuff today and deploying tomorrow.
20:04:47 excellent
20:04:52 let me know if you need anything
20:05:01 anyone have anything else on that?
20:05:49 ok
20:06:00 #topic Search Engines
20:06:04 a-k: want to take it?
20:06:11 Not much to say this week
20:06:28 In fact, status is the same as last week
20:06:39 a-k: k, that's easy
20:06:44 #topic Collectd
20:06:53 So as some of you noticed I'm starting to look at collectd for trending.
20:06:57 anyone here ever used it?
20:07:45 Guess not
20:07:46 * skvidal has - but you already knew that - but I used it like 3yrs ago - so it barely counts
20:07:57 well, we've got some performance issues I need to figure out.
20:08:10 I'm just not exactly sure what to expect yet so further experimentation is needed.
20:08:18 obviously getting 10 seconds of granularity is very helpful.
20:08:31 in fact it's so helpful we pieced together a long running issue just last night from looking at the graphs.
20:08:36 but it's also expensive.
20:08:42 very expensive as it turns out :)
20:08:49 mmcgrath: I've been using it for the past couple of days.. pretty slick, but very slow
20:08:54 so anyway, more looking at options.
20:08:56 mmcgrath: that's why cacti polls at 5 minutes
20:09:02 by default
20:09:29 Specifically this is what we're talking about - http://collectd.org/wiki/index.php/Inside_the_RRDtool_plugin
20:09:37 rrdtool is extremely expensive on disk IO
20:09:40 dgilmore: no kidding :)
20:09:58 lmacken: yeah, it's been interesting, the plugins that exist are nice, and writing custom queries is very simple.
20:10:06 and you can log to things that aren't rrdtool but what fun is that? :)
20:10:31 Anyone have any questions or comments on that?
20:11:01 ok
20:11:05 #topic Monitoring fixup
20:11:21 So. if collectd works out, we can go back to getting our nagios install in order.
20:11:25 I looked closely at opennms
20:11:31 I really liked it.
20:11:43 but packaging it in Fedora under our rules, would be a massive undertaking.
20:11:51 like a potentially multi-year undertaking.
20:11:58 and there's no way I can justify doing that right now.
20:12:11 it relies on like every java lib ever written.
20:12:17 hrm
20:12:25 wonder if there's overlap with gwt
20:12:32 mmcgrath: it's a mess to try and package
20:12:37 wwoods: there might be
20:12:51 I just know I watched a build and watched maven download a whole boatload of http://'s
20:13:02 * lmacken has been playing with alienvault in a local vm... but hasn't really configured it properly yet
20:13:02 because we (QA) are going to need to package GWT, so we're trying to put together a big java packaging FAD or something
20:13:10 So it's on the back burner if nothing else.
20:13:13 it looks pretty awesome though, as alienvault has plugins for everything out there
20:13:16 wwoods: yeah I've seen that dep list.
20:13:24 mmcgrath: yeah. hair-curling.
20:13:25 lmacken: yeah alienvault is pretty slick
20:13:41 lmacken: oh, speaking of collectd, did you see the web interface has a json.cgi?
20:14:01 mmcgrath: oh, I didn't see that
20:14:50 hm, when you query many machines with collectd, the output doesn't make it easy to see which graphs are for which host :(
20:15:15 .tiny wget -qO- 'https://admin.fedoraproject.org/collectd/bin/json.cgi?hostname=app01&plugin=apache&timespan=86400&action=show_selection&ok_button=OK'
20:15:19 mmcgrath: Error: 'wget' is not a valid url.
20:15:21 lmacken: check that out ^^
20:15:28 * mmcgrath shouldn't have tinied a wget :)
20:15:44 anyway, that kind of works.
20:16:01 I need to figure out how to actually query it
20:16:04 that just seems to be a dump
20:16:11 mmcgrath: high level domain not set properly? the dump links to collect.noris.net/
20:16:24 lmacken: hehe, I guess not
20:16:28 but yeah, that is pretty slick
20:16:34 this is literally the first time I tried to access it, there must be a config option somewhere.
20:16:53 but yeah, hydh has been working on getting the nagios setup back in place.
20:16:58 we'll keep working on it tomorrow.
20:17:10 anyone have any questions or comments on that?
20:17:13 hydh: anything?
20:17:22 nope
20:17:30 not yet :P
20:17:38 k
20:17:41 so that's all I have on that
20:17:43 #topic Func
20:17:49 skvidal: want to talk about what you've been working on?
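A rough sketch of why the 10-second interval discussed under the Collectd topic is so much heavier on disk than cacti's 5-minute default. It assumes, as a simplification, that every RRD file is written once per interval with no write caching; the function name and the file count are made up for illustration and are not measurements from these hosts.

```python
SECONDS_PER_DAY = 86400


def rrd_updates_per_day(interval_seconds: int, rrd_files: int) -> int:
    """RRD file updates per day, assuming one write per file per interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (SECONDS_PER_DAY // interval_seconds) * rrd_files


# Hypothetical host with 200 RRD files (illustrative number only).
RRD_FILES = 200
collectd_10s = rrd_updates_per_day(10, RRD_FILES)   # 10-second granularity
cacti_5min = rrd_updates_per_day(300, RRD_FILES)    # 5-minute polling

print(collectd_10s, cacti_5min, collectd_10s // cacti_5min)
```

Under those assumptions the shorter interval produces 30 times as many writes, whatever the file count. Batching several updates into one write is one common way to bring that number down.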
20:17:54 sure
20:18:05 okay so I've patched func to use puppet certs and host lists
20:18:18 and we've gotten the timeouts to no longer vom on things
20:18:37 and then in the last couple of days mmcgrath and I have been playing "fix all the broken frelling nodes"
20:18:49 there are a lot of boxes and nodes in puppet which are
20:18:54 1. not doing what they claim to be
20:19:00 2. not the name they say they are
20:19:09 oodles of fun
20:19:12 and then there are little bizarre niggling details we've found
20:19:20 * lmacken is looking forward to being able to `func "*" ping` again
20:19:31 lmacken: you can run: func '*' call test ping right now
20:19:44 ah, nice
20:19:49 skvidal: do you think we'll have an easy script in place after this to nag-alert when something gets out of sync again?
20:20:00 mmcgrath: to a certain extent
20:20:12 my goal is to collect a lot of status info
20:20:13 via func
20:20:15 including
20:20:27 all the pkg info - what's installed, what's broken, repos, updates, orphans, etc
20:20:34 all the static host info
20:20:42 and maybe some process info
20:21:06 the nice thing that this provides
20:21:11 is it lets us know if we have a host in puppet
20:21:15 that's not checking in
20:21:17 or not able to
20:21:23 b/c while puppet just waits for something to talk to it
20:21:31 yeah
20:21:33 func talks OUT to the systems
20:21:36 so if it doesn't work
20:21:36 excellent
20:21:40 we know right away
20:21:52 I've written some new scripts for func I'll be checking in upstream before long
20:21:52 skvidal: so by using puppet, func has no local state information of any kind anymore right?
20:22:04 mmcgrath: there's one file I added that is 'local'
20:22:08 which is a downed_hosts list
20:22:16 ah
20:22:22 that's the list of hosts which we know are downed and should be ignored by func
20:22:27 there's no place to store that in puppet
20:22:36 right now the only hosts in there
20:22:42 are ones we know are named wrong
20:22:52 and func outputs that list
20:22:55 EVERYTIME you run it
20:23:02 so it's going to be hard to ignore
20:23:07 yeah
20:23:17 but there is no state info otherwise about the func hosts
20:23:31 my plans are: fix up func-yum more - for tracking info
20:23:38 work on the func yum and rpms modules
20:23:49 add more tooling to let us find out more on the hosts, more properly
20:24:09 beyond that
20:24:13 here
20:24:23 minion to minion traffic
20:24:33 so our minions can emit notices to other minions using func
20:24:37 but that's down the line a bit
20:24:41 right now -ENOTIME
20:24:46 yeah yeah
20:24:50 if someone is interested in getting involved
20:24:51 ping me
20:24:54 skvidal: well thanks as always.
20:24:56 welcome
20:24:56 I'll keep doing updates
20:25:16 mmcgrath: and the func-yum script will be getting some fixes as I go and test
20:25:27 yeah
20:25:32 skvidal, cool
20:25:33 so far it's worked well
20:25:43 mmcgrath: needs to report errors better
20:25:54 mmcgrath: and the searches could be shinier
20:26:08 skvidal: my next feature request would be some single character indicator of if the update succeeded or not. It's a little annoying to go through the files :)
20:26:17 still way better than logging in and doing it
20:26:25 mmcgrath: right
20:26:31 mmcgrath: I agree - the error reporting
20:26:51 and I have the process for that
20:26:55 but I've not tested it for crap yet :)
20:26:58 :-D
20:27:03 Ok, well anyone have anything else on that?
20:28:21 alright
20:28:34 #topic Open Floor
20:28:40 anyone have anything they'd like to work in?
20:28:42 err on
20:28:46 or talk about?
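The out-of-sync check described under the Func topic (a host is in puppet but does not answer when func talks out to it, unless it is on the downed_hosts list) can be sketched as plain set arithmetic. This is not func's actual API: the function name and the three inputs are hypothetical stand-ins for however that data is really gathered, and the host names are invented.

```python
def hosts_needing_attention(puppet_hosts, responded, downed_hosts):
    """Hosts puppet knows about that are not listed as downed and did not answer."""
    # Hosts we expect to hear from: everything in puppet minus known-downed hosts.
    expected = set(puppet_hosts) - set(downed_hosts)
    # Whatever is expected but silent is out of sync and worth a nag.
    return sorted(expected - set(responded))


# Hypothetical data, for illustration only.
puppet_hosts = ["app01", "app02", "db01", "proxy01"]
responded = ["app01", "db01"]      # e.g. hosts that answered a ping
downed_hosts = ["proxy01"]         # known-bad hosts that should be ignored

missing = hosts_needing_attention(puppet_hosts, responded, downed_hosts)
print(missing)  # ['app02']
```

Keeping the downed list as an explicit input mirrors the point made in the discussion: ignored hosts stay visible on every run instead of silently dropping out of the comparison.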
20:29:46 alrighty, with that we'll close the meeting in 30
20:30:51 #endmeeting