20:02:43 #startmeeting Fedora Infrastructure 2010-03-11
20:02:43 Meeting started Thu Mar 11 20:02:43 2010 UTC. The chair is smooge. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:02:45 Useful Commands: #action #agreed #halp #info #idea #link #topic.
20:02:51 thank you zodbot
20:03:09 mmcgrath is finishing up a meeting so I am just helping along
20:03:26 #topic Roll Call
20:03:29 * ricky
20:03:32 smooge
20:03:36 * nirik is around.
20:03:36 * a-k is here
20:04:05 mdomsch said he was here a second ago before I started the meeting
20:04:20 mmcgrath will be here soon.
20:04:37 * ayoung is here
20:04:40 ok let me pull up the other standard things we do in the meeting
20:05:37 #topic Meeting Tickets
20:06:39 According to trac... there are no tickets for this meeting :)
20:06:52 am I missing anything people know of?
20:07:06 ok moving on
20:07:09 #topic Alpha Release - https://fedorahosted.org/fedora-infrastructure/report/9
20:07:20 alpha release was on Tuesday
20:08:31 ok tickets
20:08:43 sorry for my slowness guys... my greps aren't as fast as I hoped
20:08:49 .ticket 1944
20:08:52 smooge: #1944 (Fedora 13 Alpha Partial Infrastructure Freeze 16/Feb - 3/March) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/1944
20:08:58 # topic?
20:09:12 yo
20:09:14 * mmcgrath here :)
20:09:22 hi mmcgrath
20:09:27 herro
20:09:36 smooge: you want to keep going or do you want me to take over?
20:09:43 hopefully I am helping here... but it's been real quiet.
20:09:46 you can take over...
20:09:53 alrighty
20:09:54 you get people to say things :)
20:10:06 #chair mmcgrath
20:10:07 Current chairs: mmcgrath smooge
20:10:12 hehehe
20:10:18 so this alpha release went fine, but with oddities.
20:10:25 I'll go ahead and close
20:10:27 .ticket 1944
20:10:28 and
20:10:29 mmcgrath: #1944 (Fedora 13 Alpha Partial Infrastructure Freeze 16/Feb - 3/March) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/1944
20:10:30 .ticket 1990
20:10:33 mmcgrath: #1990 (Release Day Ticket) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/1990
20:10:40 let's talk about .ticket 1992
20:10:43 .ticket 1992
20:10:44 mmcgrath: #1992 (Lessons Learned) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/1992
20:11:04 So the first problem we actually ran into was that bapp1 had puppet disabled.
20:11:09 ricky: you around by chance?
20:11:11 Yup
20:11:29 so having puppet disabled on bapp1 did what exactly?
20:12:05 syncStatic is run by the apache user on bapp01 now, so that prevented it from getting the change to update the website
20:12:27 gotcha, so the new syncStatic didn't make it onto the server, so the new website didn't make it to the proxy servers.
20:12:44 this is a monitoring thing, so when we get to monitoring which servers have puppet disabled, that one will go away
20:12:57 The other thing that happened was with our i2 netapp
20:13:09 Smooge and I just got off a meeting with the storage team about this.
20:13:19 basically it took several hours to transfer 16G worth of blocks.
20:13:35 the temporary fix was to put our sync traffic at a higher QoS than other traffic
20:13:44 long term though there was actually something wrong with the link... again.
20:13:53 they know and they're working on it
20:14:02 but it's certainly something we're going to want to track ourselves.
20:14:25 But, really the alpha went well even with those things... for non i-2 users anyway.
20:14:34 we had an 85%+ good mirror rate
20:15:00 And the last thing I'd say is boot.fedoraproject.org
20:15:01 and once the I-2 got up I think they had few issues too
20:15:03 Oxf13: ping
20:15:08 smooge: indeed
20:15:57 well, right now I just did boot.fedoraproject.org
20:16:02 but it dawns on me that's sort of a releng task.
20:16:04 one that I'm happy to do
20:16:17 but I wanted to check with Oxf13 as to where he thinks the SOP should sit, in Infrastructure or RELENG.
20:16:30 it's a minor distinction in this case, but making one group responsible will ensure that it always gets done :)
20:16:52 So anyway
20:17:01 anyone have any other questions about the alpha release?
20:17:37 alrighty
20:17:38 well
20:17:41 #topic Next freeze
20:17:46 The next freeze is coming up pretty quick
20:17:51 #link http://fedoraproject.org/wiki/Schedule
20:18:00 Infrastructure will be freezing on the 23rd.
20:18:03 that's less than 2 weeks.
20:18:14 so keep that in mind as you're deploying new things.
20:18:24 so we should go slushy
20:18:29 ?
20:18:37 slushy?
20:19:04 Ice, slushy, freeze? :-)
20:19:04 I have some MM fixes to get out, but may not make it before the next freeze :-(
20:19:20 mdomsch: anything I can do to help?
20:19:40 * mdomsch needs to test/fix one part, but it's been a month, so I forget which part it was ;-(
20:20:07 sorry, I meant: if you are going to be testing stuff for over 2 weeks... now is not the time to do it.
20:20:18 yeah
20:20:35 I'll try to send a couple of reminders as the time gets closer.
20:20:35 mmcgrath, nothing huge; I'll get to it, or not...
20:20:39 the freeze always sneaks up on us :)
20:21:03 anyone have anything else on that?
20:21:05 we have updates to do this/next week
20:21:29 skvidal: ping
20:21:32 pong
20:21:33 smooge: let's talk about that
20:21:37 #topic Monthly Update
20:21:38 okie dokie
20:21:49 skvidal: ok, so the list I've been working from with you is very nearly complete.
20:22:02 skvidal: think we'll be in any place to update soon?
20:22:35 yah - I think likely
20:22:40 though not sure TOMORROW is gonna happen
20:22:54 that's fine, does Tuesday sound ok?
20:23:07 sounds possible - depends what gets set on fire over the weekend
20:23:12 Tuesday of next week sounds good, the freeze starts one week after that.
20:23:20 yeah which was why I didn't want Monday :)
20:23:30 unless there are A LOT of yum bugs that show up in f13a over the weekend
20:23:34 then I should have the time
20:23:37 what are the changes you are working on?
20:24:22 smooge: I've been removing old hosts from puppet
20:24:35 skvidal's been working on having func use those hosts and coming up with a solid update script
20:24:38 or func program
20:24:40 smooge: you talking to me or mmcgrath?
20:24:41 I'm not sure what they're called.
20:24:45 okay
20:24:49 so here's all it is
20:25:07 1. make it so our func certs aren't constantly screwed up
20:25:08 ah
20:25:18 2. make it so our func minions match our puppet minions
20:25:42 3. have a script to let us do searches/installs/updates/etc from puppet1 w/o having to schlep all over the place
20:26:15 the script I worked on a couple of weeks ago
20:26:26 and got something mostly functional - but with room for improvement
20:26:37 smooge: do you know if this update will require a reboot?
20:26:47 the first 2 are what I spent the week working on, to solve the problem that our func minions were completely wrong
20:26:57 and mangled badly from the phx2 move + rename stuff
20:27:20 well it depends on if a kernel update gets dropped over the weekend :). At the moment I don't think so
20:27:40 yeah so I've spent most of my last afternoon and this morning renaming hosts. I think I've got 3 hosts left and a little additional cleanup to do
20:28:26 OK
20:28:28 no, the worst will be an openssh restart
20:28:31 so anyone have anything else on this?
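[Editor's note: the "make our func minions match our puppet minions" cleanup above amounts to diffing two host lists. A minimal sketch of that check, assuming nothing about func or puppet internals — the function, hostnames, and example data are all hypothetical illustrations of the rename problem described:]

```python
# Hypothetical sketch: compare the hosts puppet manages against the hosts
# func has minion certificates for, and report the mismatches that a
# datacenter move/rename (like the phx2 one above) can leave behind.

def cert_mismatches(puppet_hosts, func_minions):
    """Return (missing_from_func, stale_in_func) as sorted lists."""
    puppet = set(puppet_hosts)
    func = set(func_minions)
    return sorted(puppet - func), sorted(func - puppet)

if __name__ == "__main__":
    # Illustrative host names only -- not the real inventory
    puppet_hosts = ["app01.phx2.fedoraproject.org", "bapp01.phx2.fedoraproject.org"]
    func_minions = ["app01.fedora.phx.redhat.com", "bapp01.phx2.fedoraproject.org"]
    missing, stale = cert_mismatches(puppet_hosts, func_minions)
    print("missing from func:", missing)
    print("stale in func:", stale)
```

In practice the two lists would come from the puppet CA's signed-cert directory and func's certmaster store; the diff logic is the same either way.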
20:28:37 umm
20:28:40 I have a couple more things
20:28:50 skvidal: have at it
20:29:22 so I'll see if I can get a new func pkg out for all the hosts and a mechanism to update their minion.conf files to point to the puppet certificates
20:29:32 that's going to be the REALLY fun part :)
20:29:52 If you need any puppeting help let me know
20:30:02 you talking about a new func epel package or in the infra repo?
20:30:03 is minion.conf on the boxes puppet controlled now?
20:30:07 infra repo
20:30:08 for now
20:30:12 it'll eventually make it over to epel
20:30:15 but this is new code
20:30:19 I think so
20:30:47 and if it isn't I will be happy to schlep what needs to be done
20:31:04 anyway - that's really all
20:31:23 the func-yum overlord script is fairly simple and could be added to by anyone as we go along
20:31:34 that's what's going to do the update/list updates/etc work
20:31:58 I just wanted to mention that that is the greatest script name ever.
20:32:05 skvidal: and just because I'm overly paranoid... the timeout bug? all fixed in the new version?
20:32:34 mmcgrath: the timeout bug was fixed long long ago
20:32:45 we've been running an old version of func for quite some time
20:33:17 sorry I'm late
20:33:28 k
20:33:29 dgilmore: no worries
20:33:32 dgilmore: BETRAYER!
20:33:34 the only thing I wanted to ask was what would be preferred for making func 'groups/classes'
20:33:37 * skvidal giggles
20:33:43 skvidal, hey hey hey... that's me
20:33:53 Ok, anyone have anything else on this? If not we'll move on.
20:33:54 dgilmore: sorry, I just thought it was hilarious!
20:34:03 skvidal: I had to get food for dinner and screwed up times
20:34:05 nothing from me
20:34:15 ok
20:34:19 #topic Search Engine
20:34:24 a-k: ping, want to take this?
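[Editor's note: the "list updates" half of the func-yum overlord script described above boils down to parsing `yum check-update` style output gathered from each minion. A sketch of that parsing step, assuming the standard three-column yum listing; the function name and sample data are illustrative, not the actual script:]

```python
# Hypothetical sketch in the spirit of the func-yum script: turn the
# three-column `yum check-update` output collected from a minion into
# (name.arch, version, repo) tuples, skipping headers and noise lines.

def parse_check_update(output):
    """Parse `yum check-update` style output into (name.arch, version, repo)."""
    updates = []
    for line in output.splitlines():
        fields = line.split()
        # Update lines have exactly three columns and a dotted name.arch
        if len(fields) == 3 and "." in fields[0]:
            updates.append(tuple(fields))
    return updates

if __name__ == "__main__":
    sample = (
        "kernel.x86_64        2.6.18-194.el5      updates\n"
        "func.noarch          0.25-1.el5          epel\n"
    )
    for name, version, repo in parse_check_update(sample):
        print(name, version, repo)
```

An overlord wrapper would run the check on every minion and feed each host's output through a parser like this before deciding what to install.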
20:34:29 Sure
20:34:46 I'm looking again at one of the candidates I eliminated from consideration earlier
20:34:55 It advertises the most complete support for Unicode char sets
20:35:05 I had eliminated it because I couldn't get it to work with SQLite
20:35:14 ... and its user forum had no answered questions about how to get SQLite to work
20:35:16 which one was this?
20:35:32 mnoGoSearch
20:35:38 pong
20:35:52 So I'm not there yet, but is there a db server I can use in pub test or do I need to install my own?
20:36:00 Either MySQL or PostgreSQL should work
20:36:11 a-k: we'll have one for when we move to staging and production, but on the pt servers, just yum install one
20:36:41 OK. I'm still working on it locally, but I'll move to pt when I'm ready
20:36:46 cool
20:36:54 I think that's it for now
20:37:09 a-k: thanks
20:37:18 Anyone have any questions for a-k about that?
20:37:21 a-k: do we really care for sqlite support?
20:37:49 I thought SQLite would be easier/preferable to MySQL or PostgreSQL
20:38:03 easier for a demo but probably not for production
20:38:10 None of the other candidates needed an external db
20:38:27 interesting
20:38:34 they all had their own local store then?
20:38:38 The other ones use their own local db
20:38:48 yeah
20:38:54 alright, anyone have anything else?
20:39:20 alrighty
20:39:23 #topic Monitoring
20:39:29 So we talked about this on infrastructure for a bit.
20:39:34 it leaves us in an awkward position
20:39:38 do we just dump zabbix now?
20:39:40 mmcgrath, did you want to get Oxf13 while he was here?
20:39:42 go back to nagios for a bit?
20:39:43 mmcgrath: yes
20:39:45 smooge: oh right
20:40:11 Oxf13: did you want me to put the boot.fedoraproject.org as a releng SOP or an infrastructure SOP? it seems more releng, I'm happy to do it for as long as I'm part of Fedora but it should be documented :)
20:40:30 that's a good question.
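[Editor's note: switching mnoGoSearch from SQLite to MySQL, as suggested above for the pt servers, comes down to the `DBAddr` line in indexer.conf. A sketch, assuming a pre-created local database; the database name and credentials are placeholders:]

```
# indexer.conf fragment -- placeholder credentials and database name.
# Point the indexer at a local MySQL server instead of SQLite:
DBAddr mysql://mnogo:password@localhost/mnogosearch/
```

The same URL form works for PostgreSQL by swapping the scheme (`pgsql://...`); the database and user still need to be created in the server beforehand.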
20:40:46 it does sound relengy
20:41:16 it's pretty easy to maintain, I just want to make sure it actually gets done every release.
20:41:37 Oxf13: your call. I'll write it up this week sometime, just let me know :)
20:41:53 * dgilmore thinks it's a releng thing
20:42:16 * mdomsch says releng ;-)
20:42:33 alrighty, well unless Oxf13 says otherwise I'll put it as a marketing sop :-P
20:42:45 naw, I'll just put it in releng for now and if we change our minds later we can move it.
20:42:55 ok, back to monitoring.
20:42:57 mmcgrath, unless you write a script to run when releng pushes a tree, to update bko automatically. then it's an infra ticket that gets opened and closed automagically
20:42:58 I think zabbix has been taking effort away from improving our nagios monitoring, so I'm up for dumping it
20:43:09 releng works for us, and if you come with content that's even better (:
20:43:34 ricky: so the problem then is trending.
20:43:38 our cacti install
20:43:50 doesn't even seem to exist anymore :)
20:44:15 Apart from request times for our websites, what do you want to be able to monitor that cacti can't?
20:44:18 can I work on that with someone?
20:44:20 Well, easily can't
20:44:29 ricky: well, the problem is that we have to enter data in two locations
20:44:33 and that has inherent problems.
20:44:56 and doing custom trending in cacti can be tricky at times.
20:45:25 What kind of problems? You still get your notification when something goes down and the general idea of movements, right?
20:45:26 anyway, let's take a look at our current zabbix install
20:45:28 err nagios install
20:45:30 and see how it goes.
20:45:44 ricky: ehh, I use trending a lot, to see when things started.
20:45:54 the alert fires when a threshold is crossed, but that's usually only a small part of the story.
20:45:56 I've also not been crazy about running the zabbix agent public facing either.
20:46:11 for example when MM had the bloated pickle issues.
20:46:21 we didn't get the alert until days after MM was upgraded.
20:46:30 without trending we wouldn't have noticed when the problems started
20:46:38 Yeah, but cacti would still have given you the big picture you needed, right?
20:46:44 ricky: I remember you disabled cacti
20:46:44 if it were in there.
20:46:55 trying to keep cacti and nagios in sync is going to be a pretty big pain.
20:47:11 especially when we start wanting to monitor arbitrary bits of info with cacti
20:47:16 ricky, that is my major problem with it also
20:47:19 it gets complex pretty quick
20:47:31 I think we should have one agent per server... and that's func
20:47:38 dgilmore: Probably, I don't like running public facing stuff we're not using :-)
20:47:58 ricky: from memory there was a security bug in cacti
20:48:01 ricky: FWIW, I still use zabbix, it's sitting on my desktop now and beeps at me :)
20:48:05 and rather than fix it you disabled it
20:48:12 but I could be remembering wrong
20:48:30 that could be
20:48:40 Sounds like me alright
20:48:53 well no rush to make a decision today. nagios alerts are still going out and paged alerts are going out
20:49:04 and at the moment, zabbix does have trending of some of the important things we need
20:49:07 like /mnt/koji usage
20:49:12 The question is whether we should be spending time on improving nagios monitoring
20:49:25 Like adding checks and keeping it in sync with hosts. That stuff has mostly stagnated since we looked at zabbix
20:49:33 yeah
20:49:37 I'm not sure how out of sync they are.
20:49:42 and zabbix has a lot of stuff nagios doesn't monitor.
20:49:47 but that nagios doesn't really need to
20:49:47 Is that a yeah to "we should be spending time on it"? :-)
20:50:04 just yeah, that stuff has stagnated
20:50:16 I'll take a look at things and try to get a better estimate of the work that needs to be done.
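[Editor's note: "keeping nagios in sync with hosts" above usually means maintaining per-host object files, which is exactly the kind of thing puppet can template out. A sketch of such a generated snippet; the hostname, address, and check names are placeholders, not the real inventory:]

```
# /etc/nagios/conf.d/bapp01.cfg -- placeholder names throughout;
# a file like this could be dropped in per-host by puppet.
define host {
    use        generic-host
    host_name  bapp01.example.fedoraproject.org
    address    192.0.2.10
}

define service {
    use                  generic-service
    host_name            bapp01.example.fedoraproject.org
    service_description  Puppet enabled
    check_command        check_nrpe!check_puppet
}
```

With puppet generating one such file per node, adding a host to monitoring becomes "add it to puppet," which is the "puppet + files" workflow dgilmore asks for later in the meeting.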
20:50:58 ricky: I know we have hosts not in nagios
20:51:46 yeah
20:51:56 but most of our external services are still properly monitored by it
20:52:02 properly(ish)
20:52:06 ok, so more work to do, I'll get on that.
20:52:13 anyone have anything else? If not I'll open the floor
20:52:32 all I want in a monitoring solution is something where adding a host to be monitored can be done via puppet + files
20:53:13 yeah
20:53:13 ok
20:53:16 Same here - the question seems to be what we want for a trending solution :-/
20:53:18 I hate clicking on things or depending on having to click on things. But open floor
20:53:18 #topic open floor
20:53:47 ricky: I would love it if something like sar had the ability for arbitrary input :)
20:54:02 * gholms|work raises hand
20:54:06 gholms|work: yo
20:54:07 Do you folks still plan on moving to Zenoss once 3.0 is released?
20:54:21 gholms|work: I don't think we ever planned on moving to Zenoss
20:54:28 so no to "still", and no to zenoss 3.0 :)
20:54:37 Heh, ok.
20:54:42 * gholms|work wonders where that idea came from...
20:54:58 I thought zenoss wasn't totally free?
20:55:02 I know it's not in Fedora yet, and that's a requisite
20:55:27 It has a FOSS version. The big hangup was that it relied on bundled Python 2.4.
20:55:30 Ok, anyone have anything else to discuss?
20:56:22 Just want to get this some visibility
20:56:39 Jason Walsh has been working on poseidon, an OpenID provider written in pylons to replace the broken stuff in FAS
20:56:59 So I'll be trying to get it set up with FAS auth on a publictest soon
20:57:16 ricky: excellent
20:57:22 happy to hear that one's been making progress.
20:57:31 Which will hopefully be a good workout for the auth middleware in python-fedora :-)
20:57:31 ricky: are we the only users / potential users?
20:57:54 Sorry, not sure what you mean
20:58:02 does anyone else use poseidon?
20:58:22 Oh, no - it was started and written for this
20:58:26 k
20:58:39 ricky: do you think we could have a way to use openid, present the cla and allow wiki edits etc
20:58:56 and things like bodhi we could use openid sans cla
20:59:03 for feedback
20:59:33 Hm, that could get messy, as openids not linked to an existing FAS account could create all sorts of corner cases
20:59:35 or am I all sorts of crazy
20:59:38 dgilmore: I think this is more of a provider
20:59:48 to accept openid from places would require a different sort of work
20:59:58 mmcgrath: ok
20:59:59 This is more of a "packagers can log in to upstream bugzillas, etc. with their OpenID" sort of thing.
21:00:13 (If those exist - or even just commenting on blogs)
21:00:21 cool
21:00:23 * abadger1999 shows up in time for open mic^Wfloor
21:00:32 just throwing my crazy ideas out there
21:00:40 yup yup
21:00:47 time to close the meeting, if no one has any objections we'll close in 30
21:01:01 * dgilmore wants to hear abadger1999
21:01:08 ... sing :-)
21:01:16 dgilmore: Interpretive dance :-)
21:01:24 Over IRC?!?
21:01:26 abadger1999: seen it already
21:01:26 we can talk about this next time or in #fedora-admin if we want, let's not hold everyone here :)
21:01:30 mmcgrath: close it up
21:01:31 #endmeeting