20:00:18 #startmeeting Infrastructure
20:00:18 Meeting started Thu Jul 15 20:00:18 2010 UTC. The chair is mmcgrath. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:18 Useful Commands: #action #agreed #halp #info #idea #link #topic.
20:00:21 #meetingname infrastructure
20:00:21 The meeting name has been set to 'infrastructure'
20:00:29 #topic who's here?
20:00:30 * lmacken
20:00:33 * CodeBlock
20:00:36 * Schmidt
20:00:40 * mmcgrath suspects this is going to be a fairly short meeting, lots of members are away
20:00:49 * nirik is lurking as always
20:00:52 here
20:00:53 * sijis is here
20:01:07 * SkynetLabs is here, lurking around
20:01:18 unofficially here
20:01:40 lets get started
20:01:42 * SkynetLabs is unofficially here*, lurking around
20:01:45 #topic Meeting tickets
20:02:01 CodeBlock: what was your nagios ticket?
20:02:10 uh, sec
20:02:11 * mmcgrath isn't seeing it here -
20:02:13 .tiny https://fedorahosted.org/fedora-infrastructure/query?status=new&status=assigned&status=reopened&group=milestone&keywords=~Meeting&order=priority
20:02:13 mmcgrath: http://tinyurl.com/47e37y
20:02:28 2275
20:02:40 .ticket 2275
20:02:41 CodeBlock: #2275 (Upgrade Nagios) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2275
20:03:14 And the status is basically what the ticket says, upon upgrading to RHEL6, we'll deploy nagios 3, right now it's kind of on hold. Our configs appear to work fine in my testing though
20:03:16 * mmcgrath adds the "meeting" keyword
20:03:20 mmcgrath: oh crap, sorry
20:03:36 CodeBlock: I would like to get this going sooner then later.
20:03:42 it seems there's lots of nagios work blocking on it.
20:03:57 perhaps we can build nagios3 for RHEL5, keep nagios in our infra repo then move to the std repos when we upgrade to RHEL6?
20:04:35 mmcgrath: sure, I've not done packaging or such so I'm not sure how hard that'd be, I'd assume the .src.rpm would work from rhel6?
20:04:44 yeah
20:05:01 smooge: can you get a EL-5 nagios 3.0 build going from what is in the EPEL6 repo?
20:05:03 i had an idea, but i'm not officially here
20:05:07 what can I do to help
20:05:20 Ttech: don't let that stop you from speaking up :)
20:05:29 Alright
20:05:38 mmcgrath, I will see...
20:05:45 * smooge waits for Ttech
20:05:53 i'm typing
20:06:12 smooge: I figure it'll be a 'cvs co nagios; cd nagios/EL-6 ; make srpm; koji build --scratch dist-5E nagios-3.0-1el6.src.rpm' type deal.
20:06:28 smooge: if you want to teach me how to do that (packaging and such) I'm willing to learn if you have some free time, otherwise I can just read up and learn later, and you can handle it
20:06:40 if you can get them to not conflict, you could make a nagios3 for epel5.
20:06:44 From what I've heard and seen you have a lot of monitoring and redundancies but they rely on users, lets say a server was being attacked you have a monitor that alerts users but you could install something like monit or such (there are many options) that would allow actions to be taken when something occurs. Such as shut down the httpd if the load is 500.
20:06:57 I'm not sure if you have anything like that but from what I can tell i didn't see anything.
20:07:06 mmcgrath, ah yes.. sorry my "what can I do to help was being typed before you said "can you get a EL-5" I hit return slowly
20:07:17 :) no worries.
20:07:47 Ttech: we're working to get nagios configured to actively repair issues if that's what you're talking about?
20:07:55 Yes that is
20:08:05 I use a seperate system my self personally, but nagios works. :)
20:08:24 nirik: yeah, there a few open bugs requesting a nagios3 pacakge for epel5, if infrastructure is doing it, may as well make it available to everybody
20:08:37 Also do you have log monitoring setup at all?
20:08:42 stahnma: it'll be in the infrastructure repo if nothing else.
20:08:42 * nirik nods. I think others might like it too.
20:08:45 Ttech: not syslog monitoring
20:08:52 we aggregate but not actively monitor.
20:09:12 did anyone ever determine why the current nagios maintainer didn't want to make a nagios3 package?
20:09:21 when I was the maintainer I just didn't feel like it but that was a long time ago :)
20:09:37 it may be hard to get it parallel installable.
20:09:40 I can send you the response I got
20:09:44 mmm so nothing like snort?
20:09:48 splunk
20:09:49 Sorry
20:10:05 mmcgrath: I thought you were still the maintainer ;) so I am out of the loop
20:10:11 sorry i am here
20:10:12 splunk isn't free software.
20:10:12 tobyh: oh, he responded?
20:10:15 stahnma: I hope not :)
20:10:16 He did
20:10:18 .whoowns nagios
20:10:18 mmcgrath: peter
20:10:27 :: whew :: I thought I orphaned that correctly :)
20:10:30 mmcgrath, But there are free alternatives?
20:10:36 Ttech: not that we've seen
20:10:46 A snippet from the email "the main obstacle is that upgrading from 2.x to
20:10:46 3.x would require manual intervention, and this sort of behavior
20:10:46 during upgrades isn't advisable for EPEL except some special cases"
20:11:02 tobyh: you know what's funny about that, our upgrade from 2.x to 3.x required no changes
20:11:06 CodeBlock: right?
20:11:16 mmcgrath: as far as my testing shows, correct
20:11:34 tobyh: that would be for upgrading 'nagios' to nagios-3. I was talking about making a seperate 'nagios3' package.
20:11:42 like we have for python26, and other items. ;)
20:12:19 but yeah, if it still works with older config, the main 'nagios' package should be upgradable.
20:12:33 well, either way for now, lets plan on getting an infrastructure 'nagios-3.0' package in the infrastructure repo.
20:13:08 sounds good, as I said if someone wants to teach me how, I can do it, otherwise, someone can do that part themselves, either way :P
20:13:11 CodeBlock: work with smooge to get that in the infra repo and we'll figure out a time in the next week (or week after) to get it deployed.
20:13:22 smooge: can you teach CodeBlock how to fish? :)
20:13:31 yeah.. but I use dynamite
20:14:07 smooge, is there room for a second in the fishing with dynamite class?
20:14:36 coolz
20:14:39 ok, any other questions on that?
20:14:42 mmcgrath: Sounds good, I should probably test our noc2 configs to make sure they work as well - I'd like to upgrade noc2 before we do noc1 too, just to make sure all is well on a not-so-important system
20:14:54 well
20:14:56 CodeBlock: a wise idea
20:14:57 tobyh, probably. I will look at what I have to do and let you know
20:15:03 and we can start doing ipv6 monitoring soon. Wooo!
20:15:08 not-so-important in the sense that it doesn't have all of our 160-something servers :P
20:16:29 speaking of monitoring...
20:16:34 smooge: another one for your list - https://admin.fedoraproject.org/collectd/bin/index.cgi?hostname=app5&plugin=apache&timespan=2678400&action=show_selection&ok_button=OK
20:16:46 looks like collectd has been dead for about a week or so, take a peak if you get a minute
20:16:50 anyone have anything else on this subject?
20:17:01 what I thought I fixed that
20:17:03 damn it
20:17:10 * mmcgrath did too
20:17:17 it might just be shutdown for all I knwo :)
20:17:18 ok moving on.
20:17:22 #topic Infrastuructre -- FAS
20:17:24 * CodeBlock checks
20:17:29 Are processes monitored?
20:17:34 so abadger1999 and jds2001 aren't around
20:17:35 From nagios?
20:17:37 Ttech: important ones are.
20:17:41 ah
20:17:53 but we've got a fas update coming out soon.
20:18:14 This took something like 10 months since the conversion to the new SA kind of just stopped at some point and everyone was avoiding the codebase like the plague.
20:18:17 and collectd is?
20:18:28 * CodeBlock sidenotes that collectd is running on noc1 :/
20:18:41 collectd is a data aggregator to rrd
20:18:44 wow, more than just s/Model.c.foo/Model.foo/ to convert to new SA?
20:19:07 lmacken: lots more, I think it has to do with identity providers among other things.
20:19:17 mmcgrath, I know.
20:19:19 ouch
20:19:24 I meant the monitoring part.
20:19:29 *monitored
20:19:46 oh, and a lot of weird things in auth.py had to change. Basically the whole "does this user have access to sponsor people in this group" changed.
20:19:55 and the big one, IIRC, was how we're doing privacy.
20:20:08 the privacy flag thing changed a lot between SA versions, anyway it's working now.
20:20:18 Ttech: not monitored
20:20:39 lmacken: side note, since I decided I haven't known true pain in a while.... Created a fas/tg2 branch
20:20:43 Alright
20:20:48 lets see how far we go with that.
20:21:08 lmacken: you done any actual 1.1 -> 2 conversions yet?
20:21:09 Well I'd offer to help with something, relating to monitoring, but it seems you got it well covered. :P
20:21:14 mmcgrath: haha, nice. I'm about to put all of my focus into the TG2 port of bodhi very soon
20:21:27 * lmacken has a new TG1 bodhi release in staging now... will probably hit production tonight or tomorrow
20:21:38 then I freeze the codebase, and haul through the rest of the TG2 port
20:21:52 Ttech: I suspect we'll need more help once 3.0 is up. We have a lot of service deps to get in place.
20:21:52 I've already ported the model & tests... controllers & templates are next
20:22:07 mmcgrath, Let me know. :/
20:22:11 lmacken: how much changed in your model, related to identity when using tg2?
20:22:56 mmcgrath: hmm, well, we have FAS identity provider middleware for TG2, so I didn't have to do any auth model tweaks
20:22:58 or, since it all relies on FAS, was that not an issue?
20:23:00 yeah
20:23:19 famodel.py is still a bit advanced for me to fully walk though but I'm learning as I go.
20:23:28 anywho, anyone have any questions or concerns about FAS before we move on?
20:23:54 I'm hoping to have jds2001 do a full release sometime this week. We'll throw it in staging and deploy next week.
20:24:14 #topic ipv6 in telia
20:24:38 so Telia now has ipv6 capabilities.
This is good as it'll add our 3rd public ipv6 reverse proxy
20:24:50 it'll also give ipv6 to noc2 so we can actually do proper monitoring of our ipv6 traffic.
20:25:25 also, as some of you noticed, ibiblio lost it's ipv6 connectivity at some point recently
20:25:29 they're aware and working on the problem.
20:25:31 no ETA yet.
20:25:35 any questions or comments there?
20:26:12 allrighty
20:26:21 #topic Open floor
20:26:30 Anyone have anything they'd like to discuss?
20:26:44 as I mentioned earlier... new bodhi in staging. should hit production tonight/tomorrow
20:26:57 lmacken: any major new features we can look forward to?
20:27:13 hm, mmcgrath FAS's JSON stuff has been tested/will work without many changes to the upcoming fh automation?
20:27:25 CodeBlock: should
20:27:27 lots of bugfixes, many more rss feeds (critpath, package-specific, user-specific), more docs, etc
20:27:28 * CodeBlock would like to get that usable soon :)
20:27:41 lmacken: cool
20:28:14 alrighty, if no one has anything else we'll close the meeting in 30
20:28:36 * CodeBlock thought he had a question but can't think of it, so bleh
20:28:48 k
20:28:49 #endmeeting