20:00:39 <smooge> #startmeeting infrastructure
20:00:39 <zodbot> Meeting started Thu Dec 16 20:00:39 2010 UTC.  The chair is smooge. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:39 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
20:00:51 <smooge> #meetingname Infrastructure
20:00:51 <zodbot> The meeting name has been set to 'infrastructure'
20:01:04 <smooge> #chair skvidal dgilmore
20:01:04 <zodbot> Current chairs: dgilmore skvidal smooge
20:01:11 <skvidal> yah
20:01:12 * skvidal is here
20:01:13 <smooge> #topic roll call
20:01:13 * ricky 
20:01:17 * sgallagh lurks
20:01:18 * CodeBlock here
20:01:19 * ricky 
20:01:21 * ke4qqq 
20:01:22 * waltJ skulks around
20:01:23 * goozbach here
20:01:28 * dgilmore is preset and accounted for
20:01:29 * rfelsburg is here
20:01:34 * jsmith lurks
20:01:41 * rbergeron is impressed!
20:01:51 * sijis is around
20:01:51 * deathwing01 is here!
20:02:11 * nirik is around
20:02:15 <dgilmore> rbergeron: how so?
20:02:22 * skvidal is here
20:02:42 <gholms> Lots of people here today!
20:02:44 <rbergeron> that was a very rapid bullet-pointy list of people all being present and accounted for :)
20:03:00 <jsmith> rbergeron: Sysadmins are efficient!
20:03:01 <dgilmore> rbergeron: i see
20:03:05 <sgallagh> rbergeron: except dgilmore, who is preset, not present :)
20:03:13 <smooge> #topic introductions
20:03:18 <dgilmore> rbergeron: we all bots and scripts
20:03:21 * gholms resets dgilmore
20:04:06 * Elwell_ is lurking as normal
20:04:37 <smooge> real quick I would like to introduce our new volunteers who have been helping: goozbach has helped with schedules, deathwing01 is working on trac testing for EL6, and hvivani has been helping on smokeping
20:05:20 <smooge> rfelsburg, and some others have been helpful also.
20:05:27 <smooge> but I have not had much time to mentor
20:06:09 <smooge> if there are people who are waiting for sponsorship shoot me an email and a ticket you want to look at and I will try to get you into the appropriate groups by after break
20:06:20 <smooge> anything anyone wants to say real quick?
20:06:45 <gholms> Thanks for volunteering, folks!
20:06:55 <dgilmore> gday all
20:06:55 <ricky> Welcome!
20:06:59 <deathwing01> our pleassure :)
20:07:07 <gholms> That's one word for it.  :)
20:07:24 <smooge> ok next topic
20:07:24 * deathwing01 kills the extra s
20:07:32 <smooge> #topic slushy freeze
20:07:37 <sgallagh> deathwing01: So it's not just a clever name
20:08:19 * skvidal keeps thinking of darkwing duck whenever he sees deathwing01
20:08:33 <gholms> skvidal: That makes two of us.
20:08:35 <smooge> we are going to start a slushy freeze starting this friday afternoon. Basically any changes to puppet or servers need to get a review on irc/mail and a +1
20:08:47 * goozbach bows belatedly
20:09:52 <CodeBlock> smooge: this friday as in tomorrow, or next friday?
20:10:03 <smooge> this friday as tomorrow
20:10:28 <smooge> the fewer changes that creep in over the break without someone knowing about them the better.
20:10:44 <CodeBlock> worksforme
20:11:01 <smooge> that way when people are drinking eggnog/tofunog with rum/everclear they aren't doing other things.
20:11:37 <smooge> I will be away from the 26th->2nd. skvidal is similarly gone. nirik says he will be around and I think some others will be around every now and then
20:11:53 <CodeBlock> I will be around pretty much that entire time
20:11:55 * dgilmore will be around
20:11:56 * nirik nods. Should be around.
20:11:57 * skvidal will be pageable/callable
20:12:00 <dgilmore> but likely distracted
20:12:13 * waltJ will be around too
20:12:13 * ricky will be around
20:12:14 * deathwing01 will probably be around a lot after Dec 24th
20:12:16 * CodeBlock has nothing to do, so likely won't even be distracted :)
20:12:20 * jsmith will not be around
20:12:37 * mdomsch will be offline most of 12/17-1/4
20:12:47 <dgilmore> jsmith: slacker
20:13:25 <jsmith> dgilmore: It's not like I'm going to get a break -- trust me, it would be less stressful to stick around here :-)
20:13:31 * goozbach will be offline from 12/24 to 12/29
20:13:34 <smooge> ah family
20:13:57 <smooge> ok next topic?
20:14:07 <goozbach> yup
20:14:08 <smooge> #topic Current Outage
20:14:33 * goozbach taps his wristwatch to keep meeting rolling
20:14:39 <goozbach> :)
20:14:43 <smooge> Ok we are currently going through a 'degradation of services' with some items more degraded than others.
20:15:12 <smooge> There may be several causes going on and no one factor.
20:15:42 <smooge> 1) our netapp filer is shared with other community projects and is being used more by all.
20:16:10 <smooge> 2) we ran into an issue with EL6 NFS (nfs-utils) that caused background mounting to fail.
20:16:43 <smooge> 3) DNS/host name changes caused the filer to not like most of fedora as various ACL caches aged out.
20:17:07 <smooge> 4) it's right before I finally go to disneyland for the first time in my life.
20:18:09 <smooge> 5) and someone(me) said "hey we had a quiet weekend on the pager..."
20:18:43 <smooge> so what was impacted: mirror manager, and parts of release engineering
20:18:52 <smooge> puppet and new servers being brought up.
20:18:59 <ricky> Surprisingly wiki images :-)
20:19:16 <skvidal> wiki attachments, too
20:19:16 <smooge> some app servers trying to mount scratch space
20:19:32 <ricky> Er, surprisingly not
20:19:43 <ricky> I never saw the wiki images fail, but they may have at some point
20:19:55 <skvidal> ricky: proxies could have been holding them
20:20:03 <ricky> True
20:20:03 <goozbach> so it's an overloaded netapp
20:20:13 <smooge> I think the work people put into haproxy and varnish stopped some things.
20:20:17 <skvidal> goozbach: yes and some dns pain
20:20:20 <goozbach> that isn't owned exclusively by infra?
20:20:35 <CodeBlock> correct
20:21:59 <smooge> ok I don't have much else to say on this other than I hate SATA drive arrays.
20:22:19 <smooge> like I hates the hobbitses
20:22:40 <smooge> any other issues? I missed skvidal or dgilmore  or ricky?
20:23:14 <skvidal> nothing leaps to mind for me
20:23:25 <skvidal> we have a fair bit of clean up to do once the dust settles
20:23:38 <ricky> Probably good to mention the future plans/new netapp next year
20:23:54 <smooge> yeah I thought I was doing well just cleaning up old lvms last week.
20:24:00 <dgilmore> skvidal: dont think so
20:24:22 <dgilmore> i guess we could mention that im moving the lookaside cache to the equallogics
20:24:57 <smooge> ok so according to plan, we will be moving our sata arrays to a new netapp cluster that should be just us.
20:25:28 <smooge> that will happen in Feb/Mar this year depending on how the schedules break
20:25:48 <smooge> then we will see how things shape up.
20:26:35 <ke4qqq> so based on earlier comments you indicated that in addition to dns changes, that there was an io capacity issue - what changed there, and who changed, and can they stop until we get stuff moved off or?
20:26:57 <goozbach> do we need to add more to a caching layer above?
20:27:45 <skvidal> ke4qqq: jboss merged with xo
20:27:48 <skvidal> err exo
20:27:54 <skvidal> in terms of their repos
20:27:57 <skvidal> and added a lot of use
20:28:01 <skvidal> that's on the same netapp
20:28:15 <smooge> they also grew their testing and such.
20:28:16 <skvidal> the plan is to split them off - that's what the new netapp stuff is about
20:29:08 <smooge> there are some other parts.. and I can go over them after the meeting because I have to turn off my Brian Blessed mode in doing so
20:29:31 <skvidal> I have no idea what that means and can't even guess
20:29:42 <skvidal> so what other questions about this clusterfuck do y'all have?
20:29:57 <skvidal> s/clusterfuck/series of unfortunate events/
20:30:00 * dgilmore has none
20:30:21 <CodeBlock> Any approximate ETA until we're 100% again?
20:30:25 <rfelsburg> skvidal: were there any indications that we were going to have a problem before it happened? usage stats etc. can we add monitoring to look for this stuff in the future
20:30:29 <ke4qqq> wow - ok - so I read that as no short term fix - continue degraded til feb/mar?
20:30:32 <skvidal> rfelsburg: yes
20:30:32 <ricky> Hopefully within the hour
20:30:54 <ricky> 100% = things mount and can ls
20:31:11 <smooge> ricky, I was going for that to be 75%
20:31:20 <ricky> s/the hour/nowish/, actually :-)
20:31:26 <skvidal> the dns issue is fixed
20:31:30 <skvidal> so we have the hosts back
20:31:36 <skvidal> but the performance issue may not be fixed
20:31:45 <smooge> 100% will be that ^^^
20:31:46 <ricky> Can we quantify the performance issue at all?
20:31:53 <ricky> How much slower is it?
20:32:16 <skvidal> ricky: the timeouts on app## are one of the issues we're talking about in performance
20:32:17 <smooge> ricky, a good guess will be that app07 does not see drops on /vol/fedora every 3 minutes
20:32:21 <skvidal> app03 and app07
20:32:27 <Jeff_S> y/win 20
20:32:30 <ricky> Ah, didn't know about those.
20:32:43 <skvidal> Jeff_S: nice password there
20:33:00 * CodeBlock assumes he was just switching irc windows
20:33:11 <Jeff_S> skvidal: yeah, that's for root@baseurl.org
20:33:18 <skvidal> Jeff_S: nice
20:33:21 <Jeff_S> sorry for the noise :)
20:34:04 <smooge> ok I think we can go to meeting tickets
20:34:09 <skvidal> let's do that
20:34:12 <dgilmore> ok
20:34:51 <smooge> #topic Tickets
20:35:03 <smooge> .ticket 2502
20:35:03 <zodbot> smooge: #2502 (Retrace Server) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2502
20:35:30 <smooge> Ok this is a project for development/qa on analyzing coredumps from willing participants.
20:35:57 <smooge> I think dennis and I know about the same amount on it... which is not much
20:36:21 <smooge> I spent a good portion of last week trying to find disk space for them and put a temp/test/oh-god server on telia1
20:36:40 <dgilmore> smooge: my understanding is that they plan to make it so debuginfo is available without needing to install the debuginfo rpms
20:36:55 <smooge> at this point I consider it to be not much different from a publictest instance.
20:37:10 <dgilmore> so that abrt/coredump reports etc will all be useful
20:37:25 <smooge> dgilmore, oh I thought it was that you uploaded your core files and they did the analysis there.
20:37:38 <dgilmore> smooge: not my understanding
20:37:42 <dgilmore> but i could be wrong
20:37:56 <skvidal> it does stuff with debugging - is it important that we know?
20:38:18 <dgilmore> its not
20:38:19 <smooge> long term it has security implications and throughput implications.
20:38:28 <dgilmore> lets move on to the next ticket
20:38:30 <skvidal> yay
20:38:34 <smooge> and uses a but load of diskspace
20:38:44 <smooge> .ticket 2501
20:38:45 <dgilmore> ie all debuginfo rpms
20:38:46 <zodbot> smooge: #2501 (What will it take to upgrade fedorahosted to RHEL6, new trac, new git?) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2501
20:39:10 <smooge> ok ke4qqq and deathwing01 (there is no darkwing here)
20:39:14 <CodeBlock> ke4qqq and deathwing01 have been working on that on publictest03 a bit
20:39:48 <ke4qqq> yeah - so right now we are focused on test plan
20:40:01 <ke4qqq> https://fedoraproject.org/wiki/User:Ke4qqq/Trac_test_plan
20:40:08 <ke4qqq> #link https://fedoraproject.org/wiki/User:Ke4qqq/Trac_test_plan
20:41:18 <deathwing01> yup
20:41:37 <CodeBlock> ke4qqq: how much testing has been done, and how much still needs to be done?
20:41:51 <smooge> I want to thank you guys on that.. and hope we can extend those plans onto other apps/systems.
20:42:01 <dgilmore> hows it looking so far?
20:42:15 * deathwing01 thinks there's still a lot to do
20:42:15 <ke4qqq> so far it's not bad - still lots of testing to go
20:42:20 <smooge> that way when we are doing an update to a server class we can test a checklist versus my current "well the links worked and I could log in"
20:42:41 <ke4qqq> right - and hopefully have it adopted by QA
20:42:47 <ke4qqq> for trac updates in fedora
20:43:44 <smooge> ok thanks on that any more questions?
20:44:10 <smooge> .ticket 2275 CodeBlock et al
20:44:10 <zodbot> smooge: Error: '2275 CodeBlock et al' is not a valid integer.
20:44:14 <smooge> .ticket 2275
20:44:15 <zodbot> smooge: #2275 (Upgrade Nagios) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2275
20:44:21 <smooge> CodeBlock, and others?
20:44:35 <CodeBlock> Nagios is running on noc01.stg (which is EL6)...
20:44:47 <CodeBlock> I still need to move zodbot and others over to noc01.stg before we can kill noc01
20:45:23 <marchant> i accessed the stg nagios 3 system
20:45:25 <CodeBlock> jds2001 told me last night that he moved supybot-fedora to EPEL6, so ... I should be able to do that now
20:45:36 <ricky> Why not just rebuild noc01 entirely instead of moving stuff over
20:45:58 <skvidal> ricky: +1
20:46:07 <dgilmore> indeed
20:46:11 <smooge> I would like us to rebuild when we have tested
20:46:21 <smooge> not move. sorry if I miscommunicated that
20:46:41 <CodeBlock> smooge: oh? We're not just going to rename noc01.stg to noc01 later?
20:46:57 <smooge> no .. noc01.stg will be around for testing changes in the future and such
20:47:09 <CodeBlock> oh.. heh, I didn't know that
20:47:10 <skvidal> no - if the install of nagios doesn't work from a reinstall then we can't really use it
20:47:38 <ricky> CodeBlock: This might also be a good opportunity to make nagios into a puppet module if you're interested :-)
20:47:47 <goozbach> +1 on sustainability
20:47:49 <CodeBlock> ricky: It was a thought ;)
20:47:59 <smooge> oooooooh
20:48:00 <goozbach> +100 on puppet module
20:48:18 <ricky> I have a skeleton of one in my own puppet setup if that'd help
20:48:20 <smooge> actually it would be useful for the many people who wanted to help out to see about doing that
20:48:49 <goozbach> document it as an SOP as well :)
20:49:04 <goozbach> 12mins left
20:49:09 <CodeBlock> smooge: I could maybe work with phuzion on it
20:49:26 <smooge> well first lets work on getting it into a proper module in staging and then we can move to the next stage of a rebuild of noc01.stg to make sure it all works and then a rebuild of noc01
20:49:42 <ricky> exactly what I was thinking :-)
20:49:44 <CodeBlock> alright
20:49:45 <smooge> ok next ticket
20:49:58 <smooge> .ticket 2481
20:49:59 <zodbot> smooge: #2481 (Fedora switching from the CLA to FPCA) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2481
20:50:04 <smooge> Ok this is a MAJOR ONE
20:50:30 <smooge> I think abadger1999 was working on this?
20:51:20 <smooge> anyway.. the point of this is that the current CLA will be replaced with new 'paperwork' and everyone will have to reagree
20:51:36 <smooge> this requires fas changes, and some flag days
20:51:51 <ricky> So all of our auth plugins hardcoding cla_done need to be fixed - this is what we get for hardcoding configuration :-)
20:52:01 <smooge> we are needing to get this done by F15 release
20:52:18 <gholms> That soon?
20:52:28 <ricky> I don't think files are a huge deal - we don't delete on inactivation for fedorapeople, which is the only system that should really be affected
20:52:32 <smooge> so I would like to have all the hard stuff done at/by end of FudCon
20:53:20 <sijis> i think i just did the check against cla_*.. i *may* be OK
20:54:14 <smooge> then after that people (FPL and such) can announce the more political flag days
20:54:22 <smooge> does that sound good?
20:54:33 <dgilmore> ricky: content would get moved away
20:54:44 <ricky> On fedorapeople, it just gets chmodded
20:55:00 <dgilmore> we move it to /home/fedora.bak
20:55:07 <ricky> Not anymore
20:55:08 <dgilmore> at least we did
20:55:14 <dgilmore> ok
20:55:19 <dgilmore> news to me
20:55:25 <dgilmore> guess i did not pay attention
20:55:53 <smooge> we need to document better :). I thought we did an rm --real --fast
20:56:17 <smooge> ok anything else on this? skvidal ricky ?
20:56:34 <smooge> .ticket 2277
20:56:35 <zodbot> smooge: #2277 (Figure out how to upgrade transifex on a regular schedule) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2277
20:56:40 <skvidal> not from me
20:56:55 <goozbach> four mins till cloud guys kick us in the shins under the table
20:56:57 <smooge> this is something that comes up every release.. and would be nice for people to try and figure out
20:57:06 * gholms grins evilly
20:57:07 <smooge> I brought a 2x4 this time
20:57:15 <ricky> There has been talk about getting a representative of l10n on the sysadmin team
20:57:19 <ricky> (a long time ago)
20:57:27 <ricky> Someone more familiar with the internals of transifex
20:57:39 <smooge> ok we still need that. I will put it on my list to find out and talk with them after break
20:57:46 <smooge> then one last ticket
20:57:49 <smooge> .ticket 2500
20:57:50 <zodbot> smooge: #2500 (Discuss possibility of FreeIPA as FAS replacement) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2500
20:57:58 <smooge> sorry for just 3 minutes guys
20:57:58 <ricky> This next version will be the hardest because of a big architecture change
20:58:11 <smooge> oh you mean transifex
20:58:12 <ricky> Heh, this was the one I was most interested in :-)
20:58:22 <smooge> sorry I will make it higher next week
20:58:57 <goozbach> what is the timeline on FAS->FreeIPA migration
20:58:58 <smooge> anyone?
20:58:59 <goozbach> ?
20:59:00 <ricky> So we've been talking about kerberos to avoid typing passwords for su and stuff.
20:59:06 <goozbach> what do we need to change?
20:59:15 <ricky> goozbach: We're not that far yet, no timeline yet.
20:59:32 <ricky> My main question is - what does freeipa give us over openldap + kerberos?
20:59:48 <goozbach> ricky: from what I can tell, ease of administration
20:59:55 <ricky> freeipa seems pretty heavy to me, at least - I know it has a great python API which we could use, but I think there's slightly less flexibility with custom schema
21:00:17 * rbergeron eeks in for a cloud meeting
21:00:25 <smooge> 2 minutes please sorry
21:00:28 <rbergeron> np
21:00:55 <smooge> freeipa mainly gets us a local upstream to help on issues.
21:00:59 <dgilmore> ricky: i think the benefit of using freeipa over bare ldap kerberos is that we could interact with a python api
21:01:15 <goozbach> so we need a feature list of FAS and a feature list of FreeIPA written up
21:01:16 <abadger1999> mmcgrath looked into this before.  (and sgallagh wants to get us to freeipa now).
21:01:18 <dgilmore> rather than having to develop tools to interact with each service separately
21:01:25 <goozbach> and a cost/benefit analysis
21:01:39 <abadger1999> I think that we still have issues with using kerberos for two domains so I'm not sure if we can implement kerb yet.
21:01:39 <ricky> So openldap would give us LDAP, which people could query directly against
21:01:56 <ricky> abadger1999: I'm on multiple realms fine, I just have a script which switches my credentials cache
21:02:07 <ricky> I don't think that's a huge issue anymore.
21:02:12 <goozbach> freeIPA also does host management
21:02:15 <ricky> So, who wants to discuss in #fedora-admin?  :-)
21:02:19 <goozbach> ease of admin on that side
21:02:21 <abadger1999> ricky: k.  Does that work with firefox too?
21:02:28 <smooge> ok will move to #fedora-admin
21:02:31 <goozbach> +1 for moving to fedora-admin
21:02:39 <abadger1999> Yeah, we should move and let the cloud sig get on with their meeting
21:02:41 <smooge> thank you for the extra 3 minutes
21:02:43 <smooge> #endmeeting