19:00:00 #startmeeting Infrastructure (2011-04-21)
19:00:00 Meeting started Thu Apr 21 19:00:00 2011 UTC. The chair is nirik. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:00 Useful Commands: #action #agreed #halp #info #idea #link #topic.
19:00:00 #meetingname infrastructure
19:00:00 #topic Robot Roll Call
19:00:00 #chair goozbach smooge skvidal codeblock ricky nirik
19:00:00 The meeting name has been set to 'infrastructure'
19:00:00 Current chairs: codeblock goozbach nirik ricky skvidal smooge
19:00:11 wow right on :00 :D
19:00:12 who all is around for a lovely infrastructure meeting?
19:00:16 * skvidal is here
19:00:17 * CodeBlock is here
19:00:28 hare
19:00:55 here
19:01:06 * jbass29503 here am new
19:01:14 * skvidal is there
19:01:19 * skvidal is everywhere
19:01:28 * jsmith lurks
19:01:29 * skvidal hums
19:01:33 #topic new folks introduction (don't be shy!)
19:01:54 so, any new folks who are looking for things to do? or would like to say Hi?
19:02:08 "Hello! My name is Jared, and I'm a Fedoraholic..."
19:02:13 ok, I guess that's me
19:02:29 hello
19:02:31 hi fedora infrastructure team
19:02:33 Welcome jbass29503!
19:02:35 hellow
19:02:53 * StylusEater isn't really part of the team...used to help with a few packages and lurks a bit
19:03:04 welcome jbass29503 and StylusEater. :)
19:03:12 any areas you guys are interested in?
19:03:30 I think I mentioned a while back about helping with nagios.
19:03:37 nirik, in anything you need help in
19:03:55 cool. There's always nagios tweaking that needs doing...
19:04:02 any programming help needed ... I'd prefer to work on that...
19:04:21 StylusEater: well, much of our stuff is turbogears/python type stuff.
19:04:31 am not half bad at nagios, and see you have a nagios upgrade, would like to assist or at least be part of that to learn fop processes
19:04:41 nirik: not familiar with the turbogears framework but I've written python code.
19:04:57 StylusEater: if you would like to become an expert in fas - we could use additional eyes
19:05:11 nirik: I've used web.py and built some of my own templating stuff.
19:05:23 skvidal: account system?
19:05:28 yep
19:05:34 skvidal: kk
19:06:04 on the programming side, ricky / abadger2001 / lmacken would be the folks to talk with. You can look for tickets or look at code and see what you want to work on anytime of course.
19:06:10 grab a copy of the code from hosted - there are some todo list items that I know of that would be great to research some - like openid 2.0 providing
19:06:21 for nagios CodeBlock is going to be working on the migration...
19:06:23 skvidal: toshio is probably super overloaded...
19:06:37 StylusEater: which is exactly why more eyes is helpful
19:06:43 StylusEater: he's off on vacation right now, but always willing to help get someone up to speed.
19:06:46 skvidal: yup
19:07:24 otherwise, if you guys want to lurk in #fedora-admin and #fedora-noc, things come up all the time... please ask questions or offer to assist with things you are interested in.
19:07:40 nirik: also - unless I'm smoking dope
19:07:52 fi-apprentice works
19:07:53 as a group
19:08:09 so deciding to join that should let people LOGIN to systems w/o giving them much in the way of access
19:08:14 yeah, can we add a few more of us as sponsors/admins? I'm all for starting to use that.
19:08:41 sure
19:08:43 * skvidal does so
19:09:37 how do we want to handle it? add people in as we like, and remove after some timeout if they are no longer active?
19:09:39 * CodeBlock is fine with sponsoring some people, now that my semester is coming to an end
19:09:47 I'll have some time to work with people and such
19:10:15 excellent.
19:10:20 nirik sponsored my old packages
19:10:21 nirik: done
19:10:38 ok, so, welcome new folks... please hang around and ask questions or chime in. ;)
19:10:41 skvidal: thanks.
19:10:45 StylusEater: happy to.
19:10:50 I saw the apprentice option but my free time is ... err ... unpredictable
19:10:59 hi, sorry i'm late
19:11:34 cyberbyte: no worries. ;)
19:11:39 #topic Upcoming outages and work items
19:11:54 So, the next few weeks I have:
19:11:58 2011-04-21 at 20UTC: fas01 migration to new host.
19:11:58 2011-04-25 or so: puppet update on puppet1
19:11:58 2011-05-02 or so: fpca change (short fas outage)
19:11:58 2011-05-10 final freeze
19:12:15 if anyone would like to schedule other items in there, let me know...
19:12:33 we may have the pkgs01 branch changes from Oxf13 before final freeze sometime.
19:12:49 Anyone have other items there? or comments?
19:13:08 * ricky shows up for a bit, sorry
19:13:11 mmm, zodbot move to value01 -- at some point soonish.
19:13:21 I filed a ticket about that
19:13:52 Oxf13: yep. Would be good to do before final freeze?
19:14:06 CodeBlock: ok. I think most anytime with no meeting should work.
19:14:11 when is that? (and probably yes)
19:14:14 ricky!
19:14:32 Oxf13: 2011-05-10
19:14:47 skvidal: man, I wish people would get that excited when I show up to stuff. :P
19:15:01 CodeBlock: we can't miss you if you don't go away! :)
19:15:01 nirik: yeah, I'd hope to have it done by then
19:15:10 skvidal: I see fi-apprentice is invite only?
19:15:21 StylusEater: yes
19:15:22 Oxf13: ok, cool. We can see what fesco wants to do. It sounds like a pretty short outage.
19:15:24 on purpose - really
19:15:42 anyhow, moving along...
19:15:45 StylusEater: so we don't get a lot of people who are not REALLY interested signing up and making it hard to figure out who is who
19:15:47 skvidal: yes. I understand. "...and ask for assistance getting started"
19:15:51 StylusEater: right
19:15:58 #topic Post release housecleaning tasks
19:16:22 So, I'd like to look at having a set of tasks we do some weeks after every release.
19:16:42 I think it makes sense to tie them to our release cycle instead of 90days or whatever since then we don't run into freezes, etc.
19:16:46 https://fedoraproject.org/wiki/Infrastructure_post_release_housekeeping
19:16:53 is the page I whipped up on it.
19:17:19 Are there other tasks that would be good to add? Any comments on the ones there or more info on them?
19:18:24 Wonder if that's a good time to do somewhat regular rebuilds of certain machines so that they happen at a less invasive time
19:18:42 ricky: I'd be happy to see that
19:18:51 app## and proxy## would be nice to make more regular and easier
19:19:26 yeah, that might be good.
19:19:44 oh, and ping publictest people... "are you still using this?"
19:20:23 well - Ideally
19:20:33 I'd like to just do away with publictest as is entirely
19:20:35 speaking of that
19:20:41 if anyone wants to take up that task
19:20:56 of a very small system (to be run on puppet1, even) to track the publictest## boxes
19:21:01 and dispose of them at will
19:21:07 that would be great, imo
19:21:36 sorry for getting away from the convo
19:21:38 ok, we do have the wiki pages for them.
19:21:49 * nirik also purged some recently that were no longer being used.
19:21:51 nirik: right - but those grow stale and we could set real timeouts
19:21:58 agreed.
19:22:02 we talked about this before - but it didn't go anywhere
19:22:05 so a small tracking app...
19:22:24 "foo has been using publictest-3.14 for more than 6 months"
19:22:51 3.14, nice. ;D
19:23:13 * nirik updates the wiki page with a few more things.
19:23:23 so, feel free to edit/fill out or discuss anything on there.
19:23:36 I'd like to try it out say 3-4 weeks after f15 is released...
19:24:09 sounds reasonable
19:24:34 #topic Meeting tagged tickets
19:24:42 https://fedorahosted.org/fedora-infrastructure/query?status=new&status=assigned&status=reopened&group=milestone&keywords=~Meeting&order=priority
19:24:51 anyone have a meeting tagged ticket they wish to discuss?
19:25:45 Hmm, I got a "database is locked" error trying to view that link.
19:25:52 Went away on reload.
19:26:13 yeah, same here. I think too many people hit it at once.
19:26:20 Interesting that it'd error out instead of waiting on it for a bit
19:27:20 Didn't mean to interrupt, but is kind of an infrastructure issue.
19:27:21 anyhow, will move on if there's nothing else to call out from there.
19:27:32 tibbs: yeah, our trac version is ancient too. Can't help
19:27:34 so who would be the best person to reach out to for the nagios one?
19:27:59 !ticket 2275
19:28:04 * StylusEater fails
19:28:06 jbass29503: check with CodeBlock for the migration. I think we are doing that after f15 release...
19:28:12 .ticket 2275
19:28:13 nirik: #2275 (Upgrade Nagios) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2275
19:28:23 nirik: success!
19:28:25 correct, after f15 release
19:28:40 should go pretty smoothly
19:29:00 jbass29503: if you just want to poke at it, there's several other nagios tickets I think. Or we could get you able to look at the config and suggest additions...
19:29:09 If we wanted to set up a proxy path, say admin.fpo/nagios-test to go to noc03-tmp for some quick testing meanwhile, we could do that
19:29:51 * ricky debates a little whether nagios should be behind a proxy or not
19:30:03 yeah, although the noc03-tmp probably doesn't have perms to talk to all the things it needs to monitor?
19:30:05 I remember hearing skvidal was looking at just poking holes in the firewall for it
19:30:32 so lemme make sure I grok this
19:30:34 I think for host checks that makes sense...
19:30:35 And might as well poke 80/443 or whatever as well and not have it depend on the proxies being up. Then again, it is just the web interface, so it's not really a big deal.
19:30:38 we want to have a piece of infrastructure
19:30:41 that can and DOES break
19:30:47 in front of the thing we use to determine what IS broken?
19:31:07 * skvidal gets egg
19:31:10 * skvidal gets chicken
19:31:11 well, just its web interface
19:31:27 nirik: how do you determine what's down when you get a notice at 2am?
19:31:43 hm, well you could use an 'intermediate' host, instead of allowing the server running nagios, this may be what ricky is speaking about?
19:32:11 skvidal: well, look at the notice and see what it says, then check/ping/login to it and see...
19:32:27 if it's up I often ssh to noc01 and run the check command myself to see what it's checking.
19:32:44 nirik: right - what I have found is when things go sideways - it tends to be multiple things going sideways
19:32:53 yeah.
19:33:00 so being able to see an overview....
19:33:20 nagios is a horrible monitoring solution, it's just better than all the rest. ;)
19:33:52 okay - fine
19:33:52 I don't see any reason that we need to proxy it. No load balancing (it's one server), hardly any caching needed (sans stylesheets, which we could config apache to cache if it was a concern)
19:34:04 so - that's what I mean
19:34:16 right now our proxy keeps people from hitting an apache webserver and the cgi scripts
19:34:23 by hiding it behind a proxy + ssl + apache
19:34:25 so.... ummm
19:34:27 So what I was suggesting above was to just allow 80/443 to noc01 and have it be outside-facing. It shouldn't be too hard to do
19:34:29 aren't we masking apache from apache?
19:34:35 ricky: exactly
19:34:38 ricky: +1 to that
19:34:39 But I think we're in agreement here
19:34:41 * nirik is fine with that.
19:34:46 yeah
19:34:50 nagios.fedoraproject.org or something.
19:34:55 yeah
19:35:19 ok, any other meeting tagged tickets?
19:35:19 Or noc01 and noc02 even, to make things consistent
19:35:37 yeah... although it's nagios and nagios-external in puppet.
19:35:42 nirik ... we can't use monit i guess
19:36:16 monit is pretty limited I thought...
19:36:20 not sure I see any benefit in using a proxy for the nagios interface, unless you are using nagios in a disturbed fashion
19:36:24 I've seen a few people suggest monit, never looked into it though
19:36:42 jbass29503: I hope you meant distributed - but curiously enough 'disturbed' works too
19:36:43 we did try zabbix.
19:36:50 i use it at work
19:36:55 and that was a disaster
19:37:04 zabbix, that is
19:37:07 skvidal: sorry yes
19:37:13 nirik ... it can be ... but it's scriptable/customizable
19:37:15 yeah, failure.
19:37:49 * skvidal wonders if monit is born from the codebase of 'mon'
19:37:59 i believe it is
19:38:00 I used to love the txt-output status checks
19:38:02 ah
19:38:23 well, I think for the most part nagios works for us... we need to try and get it so that things are fixed and alerts are rare/only when there are real issues however.
19:38:29 mmonit.com/monit
19:38:39 StylusEater: nod
19:38:45 It's better than it was, but still noisy.
19:38:45 nirik: agreed
19:39:33 anyhow, patches or suggestions on improving our nagios setup welcome.
19:39:37 #topic Open Floor
19:39:42 * StylusEater thinks there really aren't any good foss monitoring tools ... all are noisy
19:39:42 anyone have anything for open floor?
19:39:46 even paid ones are noisy
19:40:22 Better of two evils - too noisy or too quiet :-)
19:40:30 Hi, sorry I was late for introductions, I was /trying/ to work with Toshio
19:40:40 yeah. Ideally it's a balance between: fix things so they are not ever seeing problems vs making the monitoring too lax and you never are alerted about problems that exist.
19:40:57 casep: welcome. ;) He's out on vacation right now...
19:40:58 casep: Ah, yeah he's on vacation now
19:41:06 but I think python/real developing is not best side
19:41:21 so I think I could help in other issues
19:41:23 i guess it's a fundamental network design problem really ... so striking a balance is really like catching water
19:42:19 zabbix was a replacement for something else that went crap up also in implementation I believe
19:42:29 I'm a fan of opennms but it's java
19:42:34 *shudder* :)
19:42:43 goozbach: and it wants to eat everything else you do
19:42:47 so for openfloor
19:42:59 if someone wants to go through puppet and look for any/everything using snmp
19:43:05 and figure out how we can put a bullet in it
19:43:07 casep: sounds good. Do hang out in #fedora-admin and #fedora-noc, look thru tickets and ask questions/offer to look at things. ;)
19:43:09 that would be a good thing
19:43:22 +1 to that :-)
19:43:25 nirik: done
19:43:35 nirik: question for the meetings...
19:43:42 goozbach: sure, shoot
19:43:58 you want I still do announcement/notes/minutes
19:44:05 or would you rather handle them?
19:44:20 * goozbach has been flaky in that regard lately
19:44:28 w/ the transition of power :)
19:44:43 either way. Happy to do whatever works. If you would be interested in sending minutes and announcement I could run the meetings?
19:45:23 I liked having someone do that.
19:45:26 it really helped
19:45:33 thank you goozbach
19:45:34 yeah.
19:45:55 ok so I'll handle announcement, agenda reminder, and minutes
19:46:00 and let you run the meeting
19:46:05 best of both worlds
19:46:17 this brings up another question
19:46:23 sounds good. ping me before sending and I can see if I need to add anything.
19:46:24 I've been posting our meeting notes after cleaning up the wiki section so I'll keep doing that and start reading the code base for fas.
19:46:33 StylusEater: sounds great.
19:46:33 is it possible to do out-of-band meeting notes with zodbot?
19:46:50 goozbach: not sure what you mean...
19:46:52 Not that I know of
19:46:57 instead of me taking notes by spamming the meeting channel
19:47:01 I assume he means keeping notes in /msg instead of in channel
19:47:09 ah, nope.
19:47:17 but that's a good suggestion for upstream.
19:47:18 #action goozbach to do meeting announcement and minutes
19:47:29 * StylusEater is glad we switched to git
19:47:50 ok, anything else? or shall we call it a meeting today?
19:48:04 * CodeBlock has nothing else; goes to see why app02 is evil.
19:48:40 * ricky goes off for a bit, see you later - will hopefully be around a little bit this weekend depending on how hectic things are :-/
19:48:54 Thanks for coming everyone... continue discussion over in #fedora-admin.
19:48:56 #endmeeting