19:00:05 <nirik> #startmeeting Infrastructure (2013-01-31)
19:00:05 <zodbot> Meeting started Thu Jan 31 19:00:05 2013 UTC.  The chair is nirik. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:05 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
19:00:05 <nirik> #meetingname infrastructure
19:00:05 <zodbot> The meeting name has been set to 'infrastructure'
19:00:06 <nirik> #topic welcome y'all
19:00:06 <nirik> #chair smooge skvidal CodeBlock ricky nirik abadger1999 lmacken dgilmore mdomsch threebean
19:00:06 <zodbot> Current chairs: CodeBlock abadger1999 dgilmore lmacken mdomsch nirik ricky skvidal smooge threebean
19:00:20 * skvidal is here
19:00:21 <nirik> who all is around
19:00:23 * puiterwijk here
19:00:24 * Smoother1rOgZ 
19:00:31 * athmane is around
19:00:31 * lmacken 
19:00:32 * Adran is here
19:00:43 * maayke is here
19:00:52 * abadger1999 here
19:01:04 * pingou is here
19:01:05 * SmootherFrOgZ 
19:01:38 <nirik> cool. :)
19:01:43 <nirik> welcome everyone
19:01:44 * mdomsch 
19:01:45 * threebean 
19:01:48 <nirik> #topic New folks introductions and Apprentice tasks.
19:02:01 <nirik> any new folks like to say hi, or questions on any apprentice type things?
19:02:11 <ashengmz> Hi, I'm Ashen Gomez from Sri Lanka.
19:02:32 <smooge> here
19:02:40 <nirik> welcome ashengmz. Are you interested in more sysadmin type tasks, or application development type tasks?
19:02:57 <ashengmz> I am more into application development.
19:03:24 <nirik> cool. ;) Do hang out in #fedora-apps and we can point you to things to look at to get started.
19:03:39 <ashengmz> great
19:03:45 <nirik> again welcome. ;)
19:03:45 <ashengmz> I'll hang around
19:03:53 <nirik> any other new folks? or questions?
19:04:03 <nirik> moving along then...
19:04:06 <nirik> #topic Applications status / discussion
19:04:09 <ashengmz> Thanks for the welcome.
19:04:17 <nirik> what news on the applications front this week/upcoming?
19:04:23 * pingou worked on copr
19:04:32 <skvidal> yay
19:04:53 * puiterwijk got FAS-OpenID into staging (finally), but without theming for now (design ticket is waiting and waiting)
19:04:54 <pingou> skvidal: the cli ;)
19:04:57 <skvidal> pingou: I know
19:05:02 * SmootherFrOgZ still waiting on abadger1999's commit to release fas
19:05:04 <skvidal> pingou: I read the email - I've been a bit busy ;)
19:05:17 <pingou> skvidal: sure :)
19:05:23 <nirik> cool.
19:05:31 <nirik> mdomsch: what's the mirrormanager 1.4 news? ;)
19:05:31 <abadger1999> skvidal: commit is now in I think but there's some other things that we have hotfixed kludgily in production.
19:05:35 <abadger1999> err
19:05:37 <abadger1999> SmootherFrOgZ: ^
19:05:43 * marcdeop is here
19:06:09 <abadger1999> SmootherFrOgZ: I've been trying to get enough time to get the rest of the hotfixes merged but something else keeps catching on fire :-/
19:06:14 <SmootherFrOgZ> abadger1999: nods. I may have a fix to add which prevents a 500 when editing a group
19:06:16 <mdomsch> mm 1.4 is in staging now.  I've been finding and fixing bugs.
19:06:21 <abadger1999> SmootherFrOgZ: k.
19:06:34 <nirik> cool.
19:06:36 <SmootherFrOgZ> abadger1999: no worries. let me know if you need anything from me
19:06:38 <abadger1999> SmootherFrOgZ: oh -- is it okay with you if we migrate the git repo to github?
19:07:00 <nirik> threebean enabled fedmsg for koji and planet recently I know. ;)
19:07:01 <pingou> mdomsch: cool
19:07:22 <mdomsch> last night's firedrill was unexpected, I need to finish root cause analysis on it to be sure 1.4 won't have the same problem.
19:07:25 <threebean> koji is now spewing fedmsg, the planet is too.  python-fedora and bodhi+fedmsg fires are in the process of getting put out.
19:07:32 <pingou> hm guys, that we have a copy of the git repos on github is nice, but can we make sure it remains a *copy* ?
19:07:34 <SmootherFrOgZ> abadger1999: sure thing
19:07:37 <abadger1999> Cool.
19:07:56 <nirik> mdomsch: yeah. Want to avoid that happening again for sure.
19:08:17 <threebean> good news -> it looks like packages' connection leak on memcached04 is finally fixed -> http://bit.ly/WDgtc6
19:08:18 <puiterwijk> if anyone could test FAS-OpenID and give suggestions back to me, I would be very grateful
19:08:33 <pingou> threebean: great!
19:08:35 <lmacken> yay
19:08:35 <nirik> #info fedmsg enabled for planet and koji.
19:08:43 <nirik> #info mirrormanager 1.4 in staging being tested
19:08:55 <nirik> #info fas-openid is ready for some testing in staging
19:09:09 <threebean> puiterwijk: fas-openid looks great :p  I think we've just got to start having some test-worthy services tap into it
19:09:09 <nirik> pingou: you mean set the master repo to the hosted one?
19:09:28 <puiterwijk> threebean: yeah, agreed. I am open for suggestions where to start? ;)
19:09:43 <threebean> (tagger :p)
19:09:44 <puiterwijk> (thanks btw :))
19:09:54 <pingou> nirik: however people like, but I still like the idea to have our repo in our infra :)
19:09:56 <puiterwijk> threebean: hehe, want me to look into porting that?
19:10:18 <nirik> pingou: I'd be happy with that setup too, but my understanding is we have to specially request it.
19:10:23 <threebean> puiterwijk: only if you have time, but that would be great!  Let's talk about it more in #fedora-apps later.
19:10:38 <puiterwijk> threebean: I have time enough right now ;)
19:10:52 <nirik> I can look into it some more.
19:10:52 <pingou> nirik: I asked, they say no to me, maybe worth asking again
19:11:05 <nirik> I can contact the person who said they could do it. ;)
19:11:17 <pingou> cool
19:11:47 <puiterwijk> threebean: just ping me in #-apps whenever you want to talk about it;)
19:11:52 <nirik> ok, anything else on the apps front?
19:11:54 <threebean> cool, cool.
19:12:04 * lmacken still poking at our logstash + elasticsearch cluster
19:12:07 <lmacken> http://logstash-dev.cloud.fedoraproject.org:5601
19:12:13 <lmacken> got fedmsg going in there, and mirror logs
19:12:13 <skvidal> nirik: more hands on coprs are welcome
19:12:18 <Adran> nirik: not sure if it's the right time, but there was a ticket about a search engine that I've been playing with.
19:12:22 <puiterwijk> skvidal: I would be happy to help you to?
19:12:25 <nirik> #info coprs assistance welcome.
19:12:27 <skvidal> nirik: we have a pile of feature creep^H enhancements
19:12:42 <skvidal> that need more time.
19:12:46 <puiterwijk> s/you to/you too/
19:12:47 <skvidal> puiterwijk: take a look at the code
19:12:51 <nirik> Adran: sure. We have tried a few times and always run into problems, but if you would be willing to lead another charge at it...
19:12:58 <puiterwijk> skvidal: github or hosted?
19:13:02 <skvidal> puiterwijk: hosted
19:13:18 <skvidal> puiterwijk: when you want to talk about the backend/frontend servers yell at me
19:13:35 <Adran> nirik: Been playing with it, wouldn't mind taking charge on it. I have two solutions for it, one is custom (using a python search engine system) and then another using a pre made package. Both I'm testing locally right now.
19:13:37 <puiterwijk> skvidal: can I yell now? ;)
19:13:44 <skvidal> puiterwijk: not in here :)
19:13:47 <nirik> Adran: we determined that dpsearch was the best option a while back, but then it got to crawling and got unwanted junk and got really slow.
19:13:53 <puiterwijk> skvidal: sure ok :)
19:14:20 <Adran> nirik: I could play with data park some more sure, my understanding was it wasn't really updated anymore.
19:14:22 * relrod here, late
19:14:22 <nirik> Adran: perhaps update the ticket with your findings and we can go from there. (get you a test instance, etc)
19:14:37 * lmacken becoming a fan of elasticsearch these days... once you can get past the JVM resource issues
19:14:41 <Adran> nirik: Sure.
19:15:06 <nirik> Adran: relrod would be a good one to talk with too, he worked on the dpsearch last time. ;)
19:15:12 <nirik> welcome relrod. ;)
19:15:50 <nirik> #info Adran to look at search again.
19:15:58 <nirik> ok, anything else on the apps front?
19:16:07 <puiterwijk> one thing
19:16:18 <puiterwijk> maybe announce the URL for stg fas-openid?
19:16:23 <nirik> sure...
19:16:39 <puiterwijk> #info url for testing fas-openid: <username>.id.stg.fedoraproject.org
19:17:00 <nirik> yep. You got that issue with it not being able to talk to fas solved?
19:17:30 <puiterwijk> no, temporarily I have set it to use prod fas (that works), and will reset it to stg fas when the firewall is fixed
19:17:46 <nirik> ah, ok, I will try and look at that soon. :)
19:17:55 <nirik> ok, moving along then.
19:17:55 <puiterwijk> ok, thanks :)
19:18:02 <nirik> #topic Sysadmin status / discussion
19:18:21 <nirik> so we did mass updates this week. They went reasonably ok.
19:18:30 <skvidal> and now we get to do more!
19:18:40 <nirik> and yeah, glibc update comes out today. ;(
19:18:45 <nirik> typical. ;)
19:18:55 <nirik> we had a nasty mirrormanager outage last night.
19:18:55 <pingou> thank you Murphy!
19:19:22 <nirik> mdomsch: would it be easy to modify the scripts to keep the previous pickle? so we could quickly roll back if there was an issue?
19:20:01 <skvidal> pingou: I am beginning to think that our reboots are actually triggering glibc and kernel updates being issued
19:20:19 <nirik> upcoming we also have a bunch of new arm machines to set up for the arm secondary arch folks.
19:20:21 <smooge> skvidal, me too
19:20:30 <pingou> skvidal: that's plausible explanation indeed
19:20:36 <nirik> also, hopefully we can use a few of them for infra stuff as a test too.
19:20:49 <pingou> ansible on arm ?
19:20:55 <skvidal> pingou: it's just sshd
19:20:56 <nirik> sure, should work just fine.
19:20:58 <skvidal> pingou: ansible is just python
19:21:12 <skvidal> pingou: if we cannot get python working on the arm builders
19:21:16 <skvidal> pingou: we're in a bad way
19:21:22 <nirik> the two more important things would be: arm (if there's weird arch things) and fedora instead of rhel. ;)
19:21:24 <pingou> skvidal: I know ;-) it was more meant at: will we/do we want to use ansible for this task?
19:21:31 <skvidal> pingou: yes - we will
19:21:37 <nirik> yes, def ansible for these.
19:21:41 <skvidal> pingou: I spoke with dgilmore about it this morning
19:21:42 <pingou> cool
19:21:43 <bconoboy> ansible should be fine
19:21:44 <nirik> puppet scaling... is poor
19:22:06 <nirik> dumping another 96 hosts in it is not a good plan. ;)
19:22:37 <skvidal> speaking of that
19:22:37 <Adran> How is Ansible migration / conversion / implementation going if I might ask?
19:22:42 <skvidal> Adran: yes
19:22:44 <skvidal> exactly
19:22:49 <skvidal> we discussed it a bit at fudcon
19:22:53 <skvidal> the long and short is
19:23:03 <skvidal> we have to disentangle a bunch of stuff from puppet before we can be away from it
19:23:11 <skvidal> in the case of builders and cloud instances that's easy
19:23:14 <skvidal> b/c they are not entangled
19:23:21 <skvidal> but, imo, our biggest issue is nagios
19:23:37 <skvidal> we have to get our nagios config OUT of puppet and either into its own repo or autogenerating
19:23:40 <nirik> yeah, we need to look at our global and base puppet stuff and see what's worth translating over to ansible
19:23:47 <nirik> and yeah, nagios.
19:24:02 <skvidal> nirik: the base/global stuff worries me less since so much of it is a one-off execution
19:24:17 <skvidal> but nagios is ALWAYS changing
19:24:18 <dgilmore> skvidal: ideally autogenerating
19:24:28 <skvidal> dgilmore: you will hear no argument from me
19:24:34 <nirik> yeah, I will check on check_mk and see if we can move forward with that some soon.
19:24:34 <skvidal> nirik: perhaps we need to do to nagios
19:24:39 <skvidal> nirik: what we did with dns
19:24:51 <skvidal> move it out of the puppet git repo - make its own repo
19:24:55 <skvidal> that we can check and maintain
19:24:57 <nirik> perhaps. I'd prefer to just get it so it doesn't need all that config.
19:25:08 <skvidal> well - that's what I meant by like dns
19:25:21 <skvidal> auto-generating and checkable before committing
19:25:37 <skvidal> right now our dns systems pull the git repo for dns
19:25:41 <skvidal> before applying it locally
19:25:45 <nirik> perhaps. I'm not sure that solves the problem enough
19:25:49 <skvidal> nirik: how so?
19:26:14 <nirik> well, if we get it almost all auto, it shouldn't need a separate repo... it just happens as part of deployment.
19:26:25 <nirik> anyhow, it def needs doing.
19:26:37 * nirik will try and find time to come up with a plan. Or if anyone else wants to, feel free.
19:26:39 <skvidal> I suspect that even if we automated it as much as possible - we'd still have some manual configs
19:26:48 <skvidal> nirik: the only reason I mentioned a separate repo
19:26:49 <skvidal> is this
19:26:54 <skvidal> it lets someone play in a brand new pool
19:27:01 <skvidal> w/o worrying about ansible or puppet or anything
19:27:14 <skvidal> and if that encourages someone to work on it
19:27:19 <nirik> sure
19:27:36 <skvidal> we have an inventory of our systems
19:27:44 <skvidal> and if need be we can provide all the necessary hostnames
19:27:51 <skvidal> and generate lots of deps between systems
19:28:25 <skvidal> anyway - in order to get things moved - I think we have to get nagios out of puppet - that's all I was saying when we started discussing this
19:28:26 <skvidal> sorry
19:28:26 <nirik> #info nagios needs reworking.
19:28:30 <nirik> yeah
19:28:35 <nirik> anything else sysadminy?
19:28:54 <nirik> #topic Private Cloud status update / discussion
19:29:06 <skvidal> updates - and restarting the cloud(s)
19:29:08 <nirik> I'd like to look at scheduling an update/reboot cycle for the cloudlets.
19:29:10 <nirik> yeah.
19:29:18 <nirik> later next week, or perhaps early week after?
19:29:29 <skvidal> early week after - for 2 reasons
19:29:36 <SmootherFrOgZ> what's the status of cloud02?
19:29:36 <skvidal> if we're going to bounce the euca cloudlet
19:29:42 <skvidal> I'd like to move it to 3.2.X
19:29:47 <nirik> yeah. agreed.
19:30:04 <nirik> SmootherFrOgZ: so, it's being used for some things... qa is testing a bit on it.
19:30:13 <nirik> tflink: if you are around, hows the cloud stuff looking?
19:30:21 <nirik> we still need to hook it up to ansible.
19:30:52 <SmootherFrOgZ> nirik: saw that
19:30:55 <skvidal> nirik: the only issue is the ec2-api over ssl, right?
19:31:00 <nirik> yeah.
19:31:06 <SmootherFrOgZ> but still, what's left to be done?
19:31:12 <nirik> SmootherFrOgZ: you have time to poke at that?
19:31:25 <SmootherFrOgZ> nirik: I will next week
19:31:44 <nirik> cool. that would be great. Happy to provide background and info on the current setup
19:32:09 <SmootherFrOgZ> we also need to hook ansible.
19:32:16 * SmootherFrOgZ looks at skvidal
19:32:23 <nirik> #info will be doing a reboot of cloudlets the 12th/13th sometime.
19:32:24 <skvidal> you mean to the cloud servers themselves?
19:32:49 <skvidal> SmootherFrOgZ: then yes - we need to make a few decisions there, actually
19:32:50 <nirik> oh yeah, we should be able to fasClient and 2 factor sudo the cloudlet servers now.
19:32:52 <SmootherFrOgZ> yeah... just like you did with euca
19:33:02 <skvidal> SmootherFrOgZ: I didn't ansiblize them to fas
19:33:21 <skvidal> nirik: is that the plan? I'm fine with it - I just thought you were still debating it
19:33:28 <SmootherFrOgZ> also, what should we do with sysadmin-cloud?
19:33:43 <skvidal> nirik: if the plan is to make those fas+2fa - then I will work on making it so
19:33:47 <nirik> yeah, I guess I never said yea or nay, but I think we should go ahead and do that.
19:33:51 <skvidal> nirik: ok
19:33:56 <SmootherFrOgZ> nods
19:33:57 <nirik> and yeah, we could reuse sysadmin-cloud for it
19:34:27 <nirik> there's a bit of a hack needed for 2factor externally.
19:34:38 <nirik> which we can fix if we want some pain, or just not care about. ;)
19:34:38 <SmootherFrOgZ> for the record there's no owner on the group anymore, only someone from the accounts group can reset it
19:34:52 <tflink> nirik: still haven't had the time to look into it enough, unfortunately
19:35:00 <nirik> tflink: no worries.
19:35:22 <nirik> #info will be setting up fas+2factor on cloud servers
19:35:35 <nirik> #info will look into ec2 ssl so we can hook cloud02 to ansible
19:35:41 <nirik> anything else on cloudy?
19:36:18 <nirik> #topic Upcoming Tasks/Items
19:36:46 <nirik> #info 2013-02-04 to 2013-02-06 Seth out.
19:36:46 <nirik> #info 2013-02-04 pkgdb update.
19:36:46 <nirik> #info 2013-02-18 to 2013-02-19 smooge on site at phx2.
19:36:46 <nirik> #info 2013-02-28 end of 4th quarter
19:36:55 <nirik> anything else folks would like to schedule or note?
19:36:57 <skvidal> nirik: if we don't want to do ec2+ssl on openstack - then I'll need some time to write an ansible nova module - or someone else will need to do it
19:37:06 <nirik> ok
19:38:13 <nirik> oh, might put on there mass rebuild starting 2013-02-08
19:38:52 <nirik> #topic Open Floor
19:38:59 <nirik> ok, any open floor items from anyone?
19:39:43 <smooge> not much from me
19:39:50 <smooge> zilch actually
19:40:09 <nirik> cool. Ok, everyone, thanks for coming!
19:40:19 <nirik> #endmeeting