20:02:43 <smooge> #startmeeting Fedora Infrastructure 2010-03-11
20:02:43 <zodbot> Meeting started Thu Mar 11 20:02:43 2010 UTC. The chair is smooge. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:02:45 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
20:02:51 <smooge> thank you zodbot
20:03:09 <smooge> mmcgrath is finishing up a meeting so I am just helping along
20:03:26 <smooge> #topic Roll Call
20:03:29 * ricky
20:03:32 <smooge> smooge
20:03:36 * nirik is around.
20:03:36 * a-k is here
20:04:05 <smooge> mdomsch said he was here a second ago before I started the meeting
20:04:20 <smooge> mmcgrath will be here soon.
20:04:37 * ayoung is here
20:04:40 <smooge> ok let me pull up the other standard things we do in the meeting
20:05:37 <smooge> #topic Meeting Tickets
20:06:39 <smooge> According to trac.. there are no tickets for this meeting :)
20:06:52 <smooge> am I missing anything people know?
20:07:06 <smooge> ok moving on
20:07:09 <smooge> #topic Alpha Release - https://fedorahosted.org/fedora-infrastructure/report/9
20:07:20 <smooge> alpha release was on Tuesday
20:08:31 <smooge> ok tickets
20:08:43 <smooge> sorry for my slowness guys.. my greps aren't as fast as I hoped
20:08:49 <smooge> .ticket 1944
20:08:52 <zodbot> smooge: #1944 (Fedora 13 Alpha Partial Infrastructure Freeze 16/Feb - 3/March) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/1944
20:08:58 <a-k> # topic?
20:09:12 <mmcgrath> yo
20:09:14 * mmcgrath here :)
20:09:22 <smooge> hi mmcgrath
20:09:27 <mmcgrath> herro
20:09:36 <mmcgrath> smooge: you want to keep going or you want me to take over?
20:09:43 <smooge> hopefully I am helping here ... but it's been real quiet.
20:09:46 <smooge> you can take over...
20:09:53 <mmcgrath> alrighty
20:09:54 <smooge> you get people to say things :)
20:10:06 <smooge> #chair mmcgrath
20:10:07 <zodbot> Current chairs: mmcgrath smooge
20:10:12 <mmcgrath> hehehe
20:10:18 <mmcgrath> so this alpha release went fine, but with oddities.
20:10:25 <mmcgrath> I'll go ahead and close
20:10:27 <mmcgrath> .ticket 1944
20:10:28 <mmcgrath> and
20:10:29 <zodbot> mmcgrath: #1944 (Fedora 13 Alpha Partial Infrastructure Freeze 16/Feb - 3/March) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/1944
20:10:30 <mmcgrath> .ticket 1990
20:10:33 <zodbot> mmcgrath: #1990 (Release Day Ticket) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/1990
20:10:40 <mmcgrath> let's talk about .ticket 1992
20:10:43 <mmcgrath> .ticket 1992
20:10:44 <zodbot> mmcgrath: #1992 (Lessons Learned) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/1992
20:11:04 <mmcgrath> So the first problem we actually ran into was that bapp1 had puppet disabled.
20:11:09 <mmcgrath> ricky: you around by chance?
20:11:11 <ricky> Yup
20:11:29 <mmcgrath> so having puppet disabled on bapp1 did what exactly?
20:12:05 <ricky> syncStatic is run by the apache user on bapp01 now, so that prevented it from getting the change to update the website
20:12:27 <mmcgrath> got'cha, so the new syncStatic didn't make it on to the server, so the new website didn't make it to the proxy servers.
20:12:44 <mmcgrath> this is a monitoring thing, so when we get to monitoring which servers have puppet disabled, that one will go away
20:12:57 <mmcgrath> The other thing that happened was with our i2 netapp
20:13:09 <mmcgrath> Smooge and I just got off a meeting with the storage team about this.
20:13:19 <mmcgrath> basically it took several hours to transfer 16G worth of blocks.
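The monitoring gap mmcgrath describes above (bapp1 silently sitting with puppet disabled through the freeze) could be closed with a small nagios-style check. A minimal sketch, assuming the puppet agent of this era drops a lockfile when disabled via `puppetd --disable` — the path below is an assumption, not our actual layout:

```python
import os

# Hypothetical lockfile path: puppet agents around 0.24/0.25 left a
# lockfile behind when administratively disabled; adjust for your version.
DISABLE_LOCK = "/var/lib/puppet/state/puppetdlock"

def puppet_disabled(lock_path=DISABLE_LOCK):
    """Return True if the administrative disable lockfile is present."""
    return os.path.exists(lock_path)

def nagios_status(lock_path=DISABLE_LOCK):
    """Map the check onto nagios exit-code conventions (0=OK, 2=CRITICAL)."""
    if puppet_disabled(lock_path):
        return 2, "CRITICAL: puppet is disabled on this host"
    return 0, "OK: puppet is enabled"
```

Run from nrpe or similar, a check like this would have flagged bapp1 before release day rather than after.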
20:13:35 <mmcgrath> the temporary fix was to put our sync traffic at a higher QoS than other traffic
20:13:44 <mmcgrath> long term though there was actually something wrong with the link... again.
20:13:53 <mmcgrath> they know and they're working on it
20:14:02 <mmcgrath> but it's certainly something we're going to want to track ourselves.
20:14:25 <mmcgrath> But, really the alpha went well even with those things... for non i-2 users anyway.
20:14:34 <mmcgrath> we had an 85%+ good mirror rate
20:15:00 <mmcgrath> And the last thing I'd say is boot.fedoraproject.org
20:15:01 <smooge> and once the I-2 got up I think they had little issues too
20:15:03 <mmcgrath> Oxf13: ping
20:15:08 <mmcgrath> smooge: indeed
20:15:57 <mmcgrath> well, right now I just did boot.fedoraproject.org
20:16:02 <mmcgrath> but it dawns on me that's sort of a releng task.
20:16:04 <mmcgrath> one that I'm happy to do
20:16:17 <mmcgrath> but I wanted to check with Oxf13 as to where he thinks the SOP should sit, in Infrastructure or RELENG.
20:16:30 <mmcgrath> it's a minor distinction in this case, but making one group responsible will ensure that it always gets done :)
20:16:52 <mmcgrath> So anyway
20:17:01 <mmcgrath> anyone have any other questions about the alpha release?
20:17:37 <mmcgrath> alrighty
20:17:38 <mmcgrath> well
20:17:41 <mmcgrath> #topic Next freeze
20:17:46 <mmcgrath> The next freeze is coming up pretty quick
20:17:51 <mmcgrath> #link http://fedoraproject.org/wiki/Schedule
20:18:00 <mmcgrath> Infrastructure will be freezing on the 23rd.
20:18:03 <mmcgrath> that's less than 2 weeks.
20:18:14 <mmcgrath> so keep that in mind as you're deploying new things.
20:18:24 <smooge> so we should go slushy
20:18:29 <smooge> ?
20:18:37 <mmcgrath> slushy?
20:19:04 <ricky> Ice, slushy, freeze? :-)
20:19:04 <mdomsch> I have some MM fixes to get out, but may not make it before the next freeze :-(
20:19:20 <mmcgrath> mdomsch: anything I can do to help?
20:19:40 * mdomsch needs to test/fix one part, but it's been a month, so I forget which part it was ;-(
20:20:07 <smooge> sorry, I meant: if you are going to be testing stuff for over 2 weeks.. it's not the time to do it.
20:20:18 <mmcgrath> yeah
20:20:35 <mmcgrath> I'll try to send a couple of reminders as the time gets closer.
20:20:35 <mdomsch> mmcgrath, nothing huge; I'll get to it, or not...
20:20:39 <mmcgrath> the freeze always sneaks up on us :)
20:21:03 <mmcgrath> anyone have anything else on that?
20:21:05 <smooge> we have updates to do this/next week
20:21:29 <mmcgrath> skvidal: ping
20:21:32 <skvidal> pong
20:21:33 <mmcgrath> smooge: let's talk about that
20:21:37 <mmcgrath> #topic Monthly Update
20:21:38 <smooge> okie dokie
20:21:49 <mmcgrath> skvidal: ok, so the list I've been working from with you is very near completion.
20:22:02 <mmcgrath> skvidal: think we'll be in any place to update soon?
20:22:35 <skvidal> yah - I think likely
20:22:40 <skvidal> though not sure TOMORROW is gonna happen
20:22:54 <smooge> that's fine, does tuesday sound ok?
20:23:07 <skvidal> sounds possible - depends what gets set on fire over the weekend
20:23:12 <mmcgrath> tuesday of next week sounds good, the freeze starts one week after that.
20:23:20 <smooge> yeah which was why I didn't want monday :)
20:23:30 <skvidal> unless there are A LOT of yum bugs that show up in f13a over the weekend
20:23:34 <skvidal> then I should have the time
20:23:37 <smooge> what are the changes you are working on?
20:24:22 <mmcgrath> smooge: I've been removing old hosts from puppet
20:24:35 <mmcgrath> skvidal's been working on having func use those hosts and coming up with a solid update script
20:24:38 <mmcgrath> or func program
20:24:40 <skvidal> smooge: you talking to me or mmcgrath?
20:24:41 <mmcgrath> I'm not sure what they're called.
20:24:45 <skvidal> okay
20:24:49 <skvidal> so here's all it is
20:25:07 <skvidal> 1. make it so our func certs aren't constantly screwed up
20:25:08 <smooge> ah
20:25:18 <skvidal> 2. make it so our func minions match our puppet minions
20:25:42 <skvidal> 3. have a script to let us do searches/installs/updates/etc from puppet1 w/o having to schlep all over the place
20:26:15 <skvidal> the script I worked on a couple of weeks ago
20:26:26 <skvidal> and got something mostly functional - but with room for improvement
20:26:37 <mmcgrath> smooge: do you know if this update will require a reboot?
20:26:47 <skvidal> the first 2 are what I spent the week working on, to solve the problem that our func minions were completely wrong
20:26:57 <skvidal> and mangled badly from the phx2 move + rename stuff
20:27:20 <smooge> well it depends on if a kernel update gets dropped over the weekend :). At the moment I don't think so
20:27:40 <mmcgrath> yeah so I've spent most of my last afternoon and this morning renaming hosts. I think I've got 3 hosts left and a little additional cleanup to do
20:28:26 <mmcgrath> OK
20:28:28 <smooge> no the worst will be an openssh restart
20:28:31 <mmcgrath> so anyone have anything else on this?
20:28:32 <mmcgrath> <nod>
20:28:37 <skvidal> umm
20:28:40 <skvidal> I have a couple more things
20:28:50 <mmcgrath> skvidal: have at it
20:29:22 <skvidal> so I'll see if I can get a new func pkg out for all the hosts and a mechanism to update their minion.conf files to point to the puppet certificates
20:29:32 <skvidal> that's going to be the REALLY fun part :)
20:29:52 <mmcgrath> If you need any puppeting help let me know
20:30:02 <mmcgrath> you talking about a new func epel package or in the infra repo?
20:30:03 <skvidal> is minion.conf on the boxes puppet controlled now?
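Step 2 of skvidal's list — keeping the func minion set in sync with the puppet minion set — boils down to diffing two cert-name lists. A minimal sketch of such a drift check; the cert directory paths are assumptions for illustration, not the actual layout on puppet1:

```python
import os

# Hypothetical cert directories -- the real locations depend on how
# func and puppet are configured on the overlord host.
PUPPET_CERTS = "/var/lib/puppet/ssl/ca/signed"
FUNC_CERTS = "/etc/pki/func/minions"

def cert_names(path):
    """Strip file extensions from cert filenames to get bare hostnames."""
    return set(f.rsplit(".", 1)[0] for f in os.listdir(path))

def minion_drift(puppet_dir=PUPPET_CERTS, func_dir=FUNC_CERTS):
    """Return (hosts puppet knows but func doesn't, hosts func knows but puppet doesn't)."""
    puppet, func = cert_names(puppet_dir), cert_names(func_dir)
    return sorted(puppet - func), sorted(func - puppet)
```

Running something like this after the phx2 move would have surfaced exactly the "completely wrong and mangled" minion list skvidal mentions.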
20:30:07 <skvidal> infra repo
20:30:08 <skvidal> for now
20:30:12 <skvidal> it'll eventually make it over to epel
20:30:15 <skvidal> but this is new code
20:30:19 <ricky> I think so
20:30:47 <smooge> and if it isn't I will be happy to schlep what needs to be done
20:31:04 <skvidal> anyway - that's really all
20:31:23 <skvidal> the func-yum overlord script is fairly simple and could be added to by anyone as we go along
20:31:34 <skvidal> that's what's going to do the update/list updates/etc work
20:31:58 <gholms|work> I just wanted to mention that that is the greatest script name ever.
20:32:05 <mmcgrath> skvidal: and just because I'm overly paranoid... the timeout bug? all fixed in the new version?
20:32:34 <skvidal> mmcgrath: the timeout bug was fixed long long ago
20:32:45 <skvidal> we've been running an old version of func for quite some time
20:33:17 <dgilmore> sorry I'm late
20:33:28 <mmcgrath> k
20:33:29 <mmcgrath> dgilmore: no worries
20:33:32 <skvidal> dgilmore: BETRAYER!
20:33:34 <smooge> the only thing I wanted to ask was what would be preferred for making func 'groups/classes'
20:33:37 * skvidal giggles
20:33:43 <smooge> skvidal, hey hey hey.. that's me
20:33:53 <mmcgrath> Ok, anyone have anything else on this? If not we'll move on.
20:33:54 <skvidal> dgilmore: sorry, I just thought it was hilarious!
20:34:03 <dgilmore> skvidal: I had to get food for dinner and screwed up times
20:34:05 <smooge> nothing from me
20:34:15 <mmcgrath> ok
20:34:19 <mmcgrath> #topic Search Engine
20:34:24 <mmcgrath> a-k: ping, want to take this?
20:34:29 <a-k> Sure
20:34:46 <a-k> I'm looking again at one of the candidates I eliminated from consideration earlier
20:34:55 <a-k> It advertises the most complete support for Unicode char sets
20:35:05 <a-k> I had eliminated it because I couldn't get it to work with SQLite
20:35:14 <a-k> ... and its user forum had no answered questions about how to get SQLite to work
20:35:16 <mmcgrath> which one was this?
20:35:32 <a-k> mnoGoSearch
20:35:38 <Oxf13> pong
20:35:52 <a-k> So I'm not there yet, but is there a db server I can use in pub test or do I need to install my own?
20:36:00 <a-k> Either MySQL or PostgreSQL should work
20:36:11 <mmcgrath> a-k: we'll have one for when we move to staging and production, but on the pt servers, just yum install one
20:36:41 <a-k> OK. I'm still working on it locally, but I'll move to pt when I'm ready
20:36:46 <mmcgrath> cool
20:36:54 <a-k> I think that's it for now
20:37:09 <mmcgrath> a-k: thanks
20:37:18 <mmcgrath> Anyone have any questions for a-k about that?
20:37:21 <dgilmore> a-k: do we really care for sqlite support?
20:37:49 <a-k> I thought SQLite would be easier/preferable to MySQL or PostgreSQL
20:38:03 <mmcgrath> easier for a demo but probably not for production
20:38:10 <a-k> None of the other candidates needed an external db
20:38:27 <mmcgrath> interesting
20:38:34 <mmcgrath> they all had their own local store then?
20:38:38 <a-k> The other ones use their own local db
20:38:48 <mmcgrath> yeah
20:38:54 <mmcgrath> alright, anyone have anything else?
20:39:20 <mmcgrath> alrighty
20:39:23 <mmcgrath> #topic Monitoring
20:39:29 <mmcgrath> So we talked about this on infrastructure for a bit.
20:39:34 <mmcgrath> it leaves us in an awkward position
20:39:38 <mmcgrath> do we just dump zabbix now?
20:39:40 <smooge> mmcgrath, did you want to get Oxf13 while he was here?
20:39:42 <mmcgrath> go back to nagios for a bit?
20:39:43 <dgilmore> mmcgrath: yes
20:39:45 <mmcgrath> smooge: oh right
20:40:11 <mmcgrath> Oxf13: did you want me to put the boot.fedoraproject.org as a releng SOP or an infrastructure SOP? it seems more releng, I'm happy to do it for as long as I'm part of Fedora but it should be documented :)
20:40:30 <Oxf13> that's a good question.
20:40:46 <Oxf13> it does sound relengy
20:41:16 <mmcgrath> it's pretty easy to maintain, I just want to make sure it actually gets done every release.
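An aside on a-k's mnoGoSearch point above: the external database is wired up through a DBAddr line in indexer.conf, which is why a working MySQL or PostgreSQL instance is needed where the SQLite route stalled. A sketch of what that fragment looks like — the credentials, host, and dbmode value here are placeholders, not a tested configuration:

```
# indexer.conf fragment (hypothetical credentials and database name)
DBAddr mysql://indexer:secret@localhost/mnogosearch/?dbmode=blob

# or, against PostgreSQL:
# DBAddr pgsql://indexer:secret@localhost/mnogosearch/
```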
20:41:37 <mmcgrath> Oxf13: your call. I'll write it up this week sometime, just let me know :)
20:41:53 * dgilmore thinks it is a releng thing
20:42:16 * mdomsch says releng ;-)
20:42:33 <mmcgrath> alrighty, well unless Oxf13 says otherwise I'll put it as a marketing sop :-P
20:42:45 <mmcgrath> naw, I'll just put it in releng for now and if we change our minds later we can move it.
20:42:55 <mmcgrath> ok, back to monitoring.
20:42:57 <mdomsch> mmcgrath, unless you write a script to run when releng pushes a tree, to update bko automatically. then it's an infra ticket that gets opened and closed automagically
20:42:58 <ricky> I think zabbix has been taking away effort from improving our nagios monitoring, so I'm up for dumping it
20:43:09 <Oxf13> releng works for us, and if you come with content that's even better (:
20:43:34 <mmcgrath> ricky: so the problem then is trending.
20:43:38 <mmcgrath> our cacti install
20:43:50 <mmcgrath> doesn't even seem to exist anymore :)
20:44:15 <ricky> Apart from request times for our websites, what do you want to be able to monitor that cacti can't?
20:44:18 <smooge> can I work on that with someone?
20:44:20 <ricky> Well, easily can't
20:44:29 <mmcgrath> ricky: well, the problem is that we have to enter data in two locations
20:44:33 <mmcgrath> and that has inherent problems.
20:44:56 <mmcgrath> and doing custom trending in cacti can be tricky at times.
20:45:25 <ricky> What kind of problems? You still get your notification when something goes down and the general idea of movements, right?
20:45:26 <mmcgrath> anyway, let's take a look at our current zabbix install
20:45:28 <mmcgrath> err nagios install
20:45:30 <mmcgrath> and see how it goes.
20:45:44 <mmcgrath> ricky: ehh, I use trending a lot, to see when things started.
20:45:54 <mmcgrath> the alert is when a threshold started, but it's usually only a small part of the story.
20:45:56 <ricky> I've also not been crazy about running the zabbix agent public facing either.
20:46:11 <mmcgrath> for example when MM had the bloated pickle issues.
20:46:21 <mmcgrath> we didn't get the alert until days after MM was upgraded.
20:46:30 <mmcgrath> without trending we wouldn't have noticed when the problems started
20:46:38 <ricky> Yeah, but cacti would still have given you the big picture you needed, right?
20:46:44 <dgilmore> ricky: I remember you disabled cacti
20:46:44 <mmcgrath> if it were in there.
20:46:55 <mmcgrath> trying to keep cacti and nagios in sync is going to be a pretty big pain.
20:47:11 <mmcgrath> especially when we start wanting to monitor arbitrary bits of info with cacti
20:47:16 <smooge> ricky, that is my major problem with it also
20:47:19 <mmcgrath> it gets complex pretty quick
20:47:31 <smooge> I think we should have one agent per server.. and that's func
20:47:38 <ricky> dgilmore: Probably, I don't like running public facing stuff we're not using :-)
20:47:58 <dgilmore> ricky: from memory there was a security bug in cacti
20:48:01 <mmcgrath> ricky: FWIW, I still use zabbix, it's sitting on my desktop now and beeps at me :)
20:48:05 <dgilmore> and rather than fix it you disabled it
20:48:12 <dgilmore> but I could be remembering wrong
20:48:30 <mmcgrath> that could be
20:48:40 <ricky> Sounds like me alright
20:48:53 <mmcgrath> well no rush to make a decision today. nagios alerts are still going out and paged alerts are going out
20:49:04 <mmcgrath> and at the moment, zabbix does have trending of some of the important things we need
20:49:07 <mmcgrath> like /mnt/koji usage
20:49:12 <ricky> The question is whether we should be spending time on improving nagios monitoring
20:49:25 <ricky> Like adding checks and keeping it in sync with hosts. That stuff has mostly stagnated since we looked at zabbix
20:49:33 <mmcgrath> yeah
20:49:37 <mmcgrath> I'm not sure how out of sync they are.
20:49:42 <mmcgrath> and zabbix has a lot of stuff nagios doesn't monitor.
20:49:47 <mmcgrath> but that nagios doesn't really need to
20:49:47 <ricky> Is that a yeah to "we should be spending time on it"? :-)
20:50:04 <mmcgrath> just yeah, that stuff has stagnated
20:50:16 <mmcgrath> I'll take a look at things and try to get a better estimate of the work that needs to be done.
20:50:58 <dgilmore> ricky: I know we have hosts not in nagios
20:51:46 <mmcgrath> yeah
20:51:56 <mmcgrath> but most of our external services are still properly monitored by it
20:52:02 <mmcgrath> properly(ish)
20:52:06 <mmcgrath> ok, so more work to do, I'll get on that.
20:52:13 <mmcgrath> anyone have anything else? If not I'll open the floor
20:52:32 <smooge> all I want in a monitoring solution is something where adding a host to be monitored can be done via puppet + files
20:53:13 <mmcgrath> yeah
20:53:13 <mmcgrath> ok
20:53:16 <ricky> Same here - the question seems to be what we want for a trending solution :-/
20:53:18 <smooge> I hate clicking on things or depending on having to click on things. But open floor
20:53:18 <mmcgrath> #topic open floor
20:53:47 <mmcgrath> ricky: I would love it if something like sar had the ability to take arbitrary input :)
20:54:02 * gholms|work raises hand
20:54:06 <mmcgrath> gholms|work: yo
20:54:07 <gholms|work> Do you folks still plan on moving to Zenoss once 3.0 is released?
20:54:21 <mmcgrath> gholms|work: I don't think we ever planned on moving to Zenoss
20:54:28 <mmcgrath> so no to still, and no to zenoss 3.0 :)
20:54:37 <gholms|work> Heh, ok.
20:54:42 * gholms|work wonders where that idea came from...
20:54:58 <mmcgrath> I thought zenoss wasn't totally free?
20:55:02 <mmcgrath> I know it's not in Fedora yet, and that's a prerequisite
20:55:27 <gholms|work> It has a FOSS version. The big hangup was that it relied on a bundled Python 2.4.
20:55:30 <mmcgrath> Ok, anyone have anything else to discuss?
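smooge's wish above — adding a monitored host via puppet plus flat files, no clicking — is essentially nagios config generation: puppet ships a host list, a script renders nagios object definitions from it. A minimal sketch; the template fields are generic nagios directives, not our actual nagios layout:

```python
# Hypothetical generator: turn a flat list of (name, address) pairs,
# which puppet could manage as a file, into nagios host definitions.
HOST_TEMPLATE = """define host {{
    use        generic-host
    host_name  {name}
    address    {address}
}}
"""

def render_hosts(hosts):
    """hosts: iterable of (name, address) pairs -> nagios config text."""
    return "\n".join(HOST_TEMPLATE.format(name=n, address=a)
                     for n, a in hosts)
```

With something like this, adding a host to monitoring is a one-line change in a puppet-managed file rather than a trip through a web UI.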
20:56:22 <ricky> Just want to get this some visibility
20:56:39 <ricky> Jason Walsh has been working on poseidon, an OpenID provider written in pylons to replace the broken stuff in FAS
20:56:59 <ricky> So I'll be trying to get it set up with FAS auth on a publictest soon
20:57:16 <mmcgrath> ricky: excellent
20:57:22 <mmcgrath> happy to hear that one's been making progress.
20:57:31 <ricky> Which will hopefully be a good workout for the auth middleware in python-fedora :-)
20:57:31 <mmcgrath> ricky: are we the only users / potential users?
20:57:54 <ricky> Sorry, not sure what you mean
20:58:02 <mmcgrath> does anyone else use poseidon?
20:58:22 <ricky> Oh, no - it was started and written for this
20:58:26 <mmcgrath> k
20:58:39 <dgilmore> ricky: do you think we could have a way to use openid, present the cla and allow wiki edits etc
20:58:56 <dgilmore> and things like bodhi we could use openid sans cla
20:59:03 <dgilmore> for feedback
20:59:33 <ricky> Hm, that could get messy, as openids not linked to an existing FAS account could create all sorts of corner cases
20:59:35 <dgilmore> or am I all sorts of crazy
20:59:38 <mmcgrath> dgilmore: I think this is more of a provider
20:59:48 <mmcgrath> to accept openid from places would require a different sort of work
20:59:58 <dgilmore> mmcgrath: ok
20:59:59 <ricky> This is more of a "packagers can login to upstream bugzillas, etc. with their OpenID" sort of thing.
21:00:13 <ricky> (If those exist - or even just commenting on blogs)
21:00:21 <dgilmore> cool
21:00:23 * abadger1999 shows up in time for open mic^Wfloor
21:00:32 <dgilmore> just throwing my crazy ideas out there
21:00:40 <mmcgrath> yup yup
21:00:47 <mmcgrath> time to close the meeting, if no one has any objections we'll close in 30
21:01:01 * dgilmore wants to hear abadger1999
21:01:08 <ricky> ... sing :-)
21:01:16 <abadger1999> dgilmore: Interpretive dance :-)
21:01:24 <gholms|work> Over IRC?!?
21:01:26 <dgilmore> abadger1999: seen it already
21:01:26 <mmcgrath> we can talk about this next time or in #fedora-admin if we want, let's not hold everyone here :)
21:01:30 <dgilmore> mmcgrath: close it up
21:01:31 <mmcgrath> #endmeeting