20:00:35 <mmcgrath> #startmeeting Infrastructure
20:00:36 <zodbot> Meeting started Thu Mar 18 20:00:35 2010 UTC.  The chair is mmcgrath. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:38 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
20:00:39 <mmcgrath> who's here?
20:00:42 * ricky 
20:01:32 * skvidal is here
20:01:38 * hydh is here
20:01:41 * nirik is lurking along in the back
20:01:44 <mmcgrath> Ok, let's get started
20:01:50 <mmcgrath> #topic Infrastructure -- Freeze next week
20:01:57 <mmcgrath> Anyone have any major changes going in over the next couple of days?
20:01:59 <mmcgrath> lmacken: ^^
20:02:29 <mmcgrath> abadger1999: any major changes going in over the next couple of days?
20:03:00 <abadger1999> mmcgrath: I'm going to push an update to the pkgdb with some bugfixes and a switch to memcached for caching status codes.
20:03:06 <lmacken> nothing too major.. I'd like to upgrade fedoracommunity to the version that is in staging... however, the recent pkgdb upgrade broke the pkgdbconnector in fedoracommunity, so that will require some fixes... so we'll see.
20:03:33 <lmacken> I may push a bodhi bugfix update out as well
20:03:36 <abadger1999> mmcgrath: It should make things more stable but using memcached is new so it might need some tweaks.
20:03:47 <mmcgrath> that's cool, as long as they're in before next tuesday I don't see any problems with those updates
20:04:12 <abadger1999> <nod>
20:04:12 * dgilmore is present
20:04:38 <abadger1999> Oh yeah -- planning on finishing the memcached stuff today and deploying tomorrow.
20:04:47 <mmcgrath> excellent
20:04:52 <mmcgrath> let me know if you need anything
20:05:01 <mmcgrath> anyone have anything else on that?
20:05:49 <mmcgrath> ok
20:06:00 <mmcgrath> #topic Search Engines
20:06:04 <mmcgrath> a-k: want to take it?
20:06:11 <a-k> Not much to say this week
20:06:28 <a-k> In fact, status is the same as last week
20:06:39 <mmcgrath> a-k: k, that's easy
20:06:44 <mmcgrath> #topic Collectd
20:06:53 <mmcgrath> So, as some of you noticed, I'm starting to look at collectd for trending.
20:06:57 <mmcgrath> anyone here ever used it?
20:07:45 <mmcgrath> Guess not
20:07:46 * skvidal has - but you already knew that - but I used it like 3yrs ago - so it barely counts
20:07:57 <mmcgrath> well, we've got some performance issues I need to figure out.
20:08:10 <mmcgrath> I'm just not exactly sure what to expect yet so further experimentation is needed.
20:08:18 <mmcgrath> obviously getting 10 seconds of granularity is very helpful.
20:08:31 <mmcgrath> in fact it's so helpful we pieced together a long running issue just last night from looking at the graphs.
20:08:36 <mmcgrath> but it's also expensive.
20:08:42 <mmcgrath> very expensive as it turns out :)
20:08:49 <lmacken> mmcgrath: I've been using it for the past couple of days.. pretty slick, but very slow
20:08:54 <mmcgrath> so anyway, more looking at options.
20:08:56 <dgilmore> mmcgrath: that's why cacti polls at 5 minutes
20:09:02 <dgilmore> by default
20:09:29 <mmcgrath> Specifically this is what we're talking about - http://collectd.org/wiki/index.php/Inside_the_RRDtool_plugin
20:09:37 <mmcgrath> rrdtool is extremely expensive on disk IO
20:09:40 <mmcgrath> dgilmore: no kidding :)
20:09:58 <mmcgrath> lmacken: yeah, it's been interesting, the plugins that exist are nice, and writing custom queries is very simple.
20:10:06 <mmcgrath> and you can log to things that aren't rrdtool but what fun is that?  :)
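A rough sketch of the arithmetic behind the cost difference discussed above: a 10-second collection interval produces 30 times as many samples per data source as a 5-minute poll. The host and data-source counts below are hypothetical, and the model assumes one disk write per sample, which overstates the real cost wherever updates are cached or batched before they reach disk.

```python
SECONDS_PER_DAY = 86400


def samples_per_day(interval_seconds):
    """Number of samples one data source produces per day at this interval."""
    return SECONDS_PER_DAY // interval_seconds


def writes_per_second(hosts, sources_per_host, interval_seconds):
    """Average RRD updates per second, assuming one write per sample."""
    return hosts * sources_per_host / interval_seconds


# Hypothetical fleet size, chosen only to make the comparison concrete.
HOSTS = 100
SOURCES_PER_HOST = 50

if __name__ == "__main__":
    for label, interval in (("10 s", 10), ("5 min", 300)):
        print(
            label,
            samples_per_day(interval),
            round(writes_per_second(HOSTS, SOURCES_PER_HOST, interval), 1),
        )
```

The ratio between the two intervals (300 / 10 = 30) holds regardless of the fleet size assumed here.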
20:10:31 <mmcgrath> Anyone have any questions or comments on that?
20:11:01 <mmcgrath> ok
20:11:05 <mmcgrath> #topic Monitoring fixup
20:11:21 <mmcgrath> So.  if collectd works out, we can go back to getting our nagios install in order.
20:11:25 <mmcgrath> I looked closely at opennms
20:11:31 <mmcgrath> I really liked it.
20:11:43 <mmcgrath> but packaging it in Fedora under our rules, would be a massive undertaking.
20:11:51 <mmcgrath> like a potentially multi-year undertaking.
20:11:58 <mmcgrath> and there's no way I can justify doing that right now.
20:12:11 <mmcgrath> it relies on like every Java lib ever written.
20:12:17 <wwoods> hrm
20:12:25 <wwoods> wonder if there's overlap with gwt
20:12:32 <dgilmore> mmcgrath: it's a mess to try and package
20:12:37 <mmcgrath> wwoods: there might be
20:12:51 <mmcgrath> I just know I watched a build and watched maven download a whole boatload of http://'s
20:13:02 * lmacken has been playing with alienvault in a local vm... but hasn't really configured it properly yet
20:13:02 <wwoods> because we (QA) are going to need to package GWT, so we're trying to put together a big java packaging FAD or something
20:13:10 <mmcgrath> So it's on the back burner if nothing else.
20:13:13 <lmacken> it looks pretty awesome though, as alienvault has plugins for everything out there
20:13:16 <mmcgrath> wwoods: yeah I've seen that dep list.
20:13:24 <wwoods> mmcgrath: yeah. hair-curling.
20:13:25 <mmcgrath> lmacken: yeah alienvault is pretty slick
20:13:41 <mmcgrath> lmacken: oh, speaking of collectd, did you see the web interface has a json.cgi?
20:14:01 <lmacken> mmcgrath: oh, I didn't see that
20:14:50 <lmacken> hm, when you query many machines with collectd, the output doesn't make it easy to see which graphs are for which host :(
20:15:15 <mmcgrath> .tiny wget -qO- 'https://admin.fedoraproject.org/collectd/bin/json.cgi?hostname=app01&plugin=apache&timespan=86400&action=show_selection&ok_button=OK'
20:15:19 <zodbot> mmcgrath: Error: 'wget' is not a valid url.
20:15:21 <mmcgrath> lmacken: check that out ^^
20:15:28 * mmcgrath shouldn't have tinied a wget :)
20:15:44 <mmcgrath> anyway, that kind of works.
20:16:01 <mmcgrath> I need to figure out how to actually query it
20:16:04 <mmcgrath> that just seems to be a dump
20:16:11 <lmacken> mmcgrath: high level domain not set properly?  the dump links to collect.noris.net/
20:16:24 <mmcgrath> lmacken: hehe, I guess not
20:16:28 <lmacken> but yeah, that is pretty slick
20:16:34 <mmcgrath> this is literally the first time I tried to access it, there must be a config option somewhere.
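A minimal sketch of assembling the json.cgi query string shown above without hand-writing the URL. The parameter names and values are copied from the wget command in the log; the function name `build_query_url` is hypothetical, and nothing is assumed here about what the endpoint returns or which other parameters it accepts.

```python
from urllib.parse import urlencode

# Base URL taken from the command pasted in the meeting.
BASE_URL = "https://admin.fedoraproject.org/collectd/bin/json.cgi"


def build_query_url(hostname, plugin, timespan_seconds):
    """Return the json.cgi URL for one host and plugin over a timespan."""
    params = {
        "hostname": hostname,
        "plugin": plugin,
        "timespan": timespan_seconds,
        # The two fields below are copied as-is from the original command.
        "action": "show_selection",
        "ok_button": "OK",
    }
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Same host, plugin, and one-day timespan as in the log.
    print(build_query_url("app01", "apache", 86400))
```

Building the query with `urlencode` also takes care of escaping if a hostname or plugin name ever contains characters that are not URL-safe.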
20:16:53 <mmcgrath> but yeah, hydh has been working on getting the nagios setup back in place.
20:16:58 <mmcgrath> we'll keep working on it tomorrow.
20:17:10 <mmcgrath> anyone have any questions or comments on that?
20:17:13 <mmcgrath> hydh: anything?
20:17:22 <hydh> nope
20:17:30 <hydh> not yet :P
20:17:38 <mmcgrath> k
20:17:41 <mmcgrath> so that's all I have on that
20:17:43 <mmcgrath> #topic Func
20:17:49 <mmcgrath> skvidal: want to talk about what you've been working on?
20:17:54 <skvidal> sure
20:18:05 <skvidal> okay so I've patched func to use puppet certs and host lists
20:18:18 <skvidal> and we've gotten the timeouts to no longer vom on things
20:18:37 <skvidal> and then in the last couple of days mmcgrath and I have been playing "fix all the broken frelling nodes"
20:18:49 <skvidal> there are a lot of boxes and nodes in puppet which are
20:18:54 <skvidal> 1. not doing what they claim to be
20:19:00 <skvidal> 2. not the name they say they are
20:19:09 <mmcgrath> oodles of fun
20:19:12 <skvidal> and then there are little bizarre niggling details we've found
20:19:20 * lmacken is looking forward to being able to `func "*" ping` again
20:19:31 <skvidal> lmacken: you can run: func '*' call test ping right now
20:19:44 <lmacken> ah, nice
20:19:49 <mmcgrath> skvidal: do you think we'll have an easy script in place after this to nag-alert when something gets out of sync again?
20:20:00 <skvidal> mmcgrath: to a certain extent
20:20:12 <skvidal> my goal is to collect a lot of status info
20:20:13 <skvidal> via func
20:20:15 <skvidal> including
20:20:27 <skvidal> all the pkg info - what's installed, what's broken, repos, updates, orphans, etc
20:20:34 <skvidal> all the static host info
20:20:42 <skvidal> and maybe some process info
20:21:06 <skvidal> the nice thing that this provides
20:21:11 <skvidal> is it lets us know if we have a host in puppet
20:21:15 <skvidal> that's not checking in
20:21:17 <skvidal> or not able to
20:21:23 <skvidal> b/c while puppet just waits for something to talk to it
20:21:31 <mmcgrath> yeah
20:21:33 <skvidal> func talks OUT to the systems
20:21:36 <skvidal> so if it doesn't work
20:21:36 <lmacken> excellent
20:21:40 <skvidal> we know right away
20:21:52 <skvidal> I've written some new scripts for func I'll be checking in upstream before long
20:21:52 <mmcgrath> skvidal: so by using puppet, func has no local state information of any kind anymore right?
20:22:04 <skvidal> mmcgrath: there's one file I added that is 'local'
20:22:08 <skvidal> which is a downed_hosts list
20:22:16 <mmcgrath> ah
20:22:22 <skvidal> that's the list of hosts which we know are downed and should be ignored by func
20:22:27 <skvidal> there's no place to store that in puppet
20:22:36 <skvidal> right now the only hosts in there
20:22:42 <skvidal> are ones we know are named wrong
20:22:52 <skvidal> and func outputs that list
20:22:55 <skvidal> EVERY TIME you run it
20:23:02 <skvidal> so it's going to be hard to ignore
20:23:07 <mmcgrath> yeah
20:23:17 <skvidal> but there is no state info otherwise about the func hosts
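The check described above (hosts known to puppet, minus the downed_hosts list, minus whatever answered func) comes down to set arithmetic. The sketch below does not call func or puppet at all; it takes plain lists as input, and the function name and hostnames are made up for illustration.

```python
def unaccounted_hosts(puppet_hosts, responding_hosts, downed_hosts):
    """Hosts puppet knows about that neither answered nor are listed as downed."""
    expected = set(puppet_hosts) - set(downed_hosts)
    return sorted(expected - set(responding_hosts))


if __name__ == "__main__":
    # Hypothetical inputs: in practice these would come from the puppet host
    # list, the replies to a ping-style call, and the downed_hosts file.
    puppet = ["app01", "app02", "db01", "proxy01"]
    responding = ["app01", "db01"]
    downed = ["proxy01"]
    print(unaccounted_hosts(puppet, responding, downed))  # ['app02']
```

Anything this returns is a host that should be checking in but is not, which is the case a nag-alert would want to flag.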
20:23:31 <skvidal> my plans are: fix up func-yum more - for tracking info
20:23:38 <skvidal> work on the func yum and rpms modules
20:23:49 <skvidal> add more tooling to let us find out more on the hosts, more properly
20:24:09 <skvidal> beyond that
20:24:13 <smooge> here
20:24:23 <skvidal> minion to minion traffic
20:24:33 <skvidal> so our minions can emit notices to other minions using func
20:24:33 <mmcgrath> <nod>
20:24:37 <skvidal> but that's down the line a bit
20:24:41 <skvidal> right now -ENOTIME
20:24:46 <mmcgrath> yeah yeah
20:24:50 <skvidal> if someone is interested in getting involved
20:24:51 <skvidal> ping me
20:24:54 <mmcgrath> skvidal: well thanks as always.
20:24:56 <skvidal> welcome
20:24:56 <mmcgrath> I'll keep doing updates
20:25:16 <skvidal> mmcgrath: and the func-yum script will be getting some fixes as I go and test
20:25:27 <mmcgrath> yeah
20:25:32 <smooge> skvidal, cool
20:25:33 <mmcgrath> so far it's worked well
20:25:43 <skvidal> mmcgrath: needs to report errors better
20:25:54 <skvidal> mmcgrath: and the searches could be shinier
20:26:08 <mmcgrath> skvidal: my next feature request would be some single character indicator of if the update succeeded or not.  It's a little annoying to go through the files :)
20:26:17 <mmcgrath> still way better than logging in and doing it
20:26:25 <skvidal> mmcgrath: right
20:26:31 <skvidal> mmcgrath: I agree - the error reporting
20:26:51 <skvidal> and I have the process for that
20:26:55 <skvidal> but I've not tested it for crap yet :)
20:26:58 <mmcgrath> :-D
20:27:03 <mmcgrath> Ok, well anyone have anything else on that?
20:28:21 <mmcgrath> alright
20:28:34 <mmcgrath> #topic Open Floor
20:28:40 <mmcgrath> anyone have anything they'd like to work in?
20:28:42 <mmcgrath> err on
20:28:46 <mmcgrath> or talk about?
20:29:46 <mmcgrath> alrighty, with that we'll close the meeting in 30
20:30:51 <mmcgrath> #endmeeting