19:00:05 #startmeeting Infrastructure (2013-01-31)
19:00:05 Meeting started Thu Jan 31 19:00:05 2013 UTC. The chair is nirik. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:05 Useful Commands: #action #agreed #halp #info #idea #link #topic.
19:00:05 #meetingname infrastructure
19:00:05 The meeting name has been set to 'infrastructure'
19:00:06 #topic welcome y'all
19:00:06 #chair smooge skvidal CodeBlock ricky nirik abadger1999 lmacken dgilmore mdomsch threebean
19:00:06 Current chairs: CodeBlock abadger1999 dgilmore lmacken mdomsch nirik ricky skvidal smooge threebean
19:00:20 * skvidal is here
19:00:21 who all is around
19:00:23 * puiterwijk here
19:00:24 * Smoother1rOgZ
19:00:31 * athmane is around
19:00:31 * lmacken
19:00:32 * Adran is here
19:00:43 * maayke is here
19:00:52 * abadger1999 here
19:01:04 * pingou is here
19:01:05 * SmootherFrOgZ
19:01:38 cool. :)
19:01:43 welcome everyone
19:01:44 * mdomsch
19:01:45 * threebean
19:01:48 #topic New folks introductions and Apprentice tasks.
19:02:01 any new folks like to say hi, or questions on any appentice type things?
19:02:11 Hi, I'm Ashen Gomez from Sri Lanka.
19:02:32 here
19:02:40 welcome ashengmz. Are you interested in more sysadmin type tasks, or application development type tasks?
19:02:57 I am more into application development.
19:03:24 cool. ;) Do hang out in #fedora-apps and we can point you to things to look at to get started.
19:03:39 great
19:03:45 again welcome. ;)
19:03:45 I'll hang around
19:03:53 any other new folks? or questions?
19:04:03 moving along then...
19:04:06 #topic Applications status / discussion
19:04:09 Thanks for the welcome.
19:04:17 what news on the applications front this week/upcoming?
19:04:23 * pingou worked on copr
19:04:32 yay
19:04:53 * puiterwijk got FAS-OpenID into staging (finally), but without theming for now (design ticket is waiting and waiting)
19:04:54 skvidal: the cli ;)
19:04:57 pingou: I know
19:05:02 * SmootherFrOgZ still waiting on abadger1999's commit to release fas
19:05:04 pingou: I read the email - I've been a bit busy ;)
19:05:17 skvidal: sure :)
19:05:23 cool.
19:05:31 mdomsch: whats the mirrormanager 1.4 news? ;)
19:05:31 skvidal: commit is now in I htink but there's some other things that we have hotfixed kludgily in production.
19:05:35 err
19:05:37 SmootherFrOgZ: ^
19:05:43 * marcdeop is here
19:06:09 SmootherFrOgZ: I've been trying to get enough time to get the rest of hte hotfixes merged but something else keeps catching on fire :-/
19:06:14 abadger1999: nods. I may have a fix to add which prevent a 500 from editing group
19:06:16 mm 1.4 is in staging now. I've been finding and fixing bugs.
19:06:21 SmootherFrOgZ: k.
19:06:34 cool.
19:06:36 abadger1999: no worries. let me know if you need anythong from me
19:06:38 SmootherFrOgZ: oh -- is it okay with you if we migrate the git repo to github?
19:07:00 threebean enabled fedmsg for koji and planet recently I know. ;)
19:07:01 mdomsch: cool
19:07:22 last night's firedrill was unexpected, I need to finish root cause analysis on it to be sure 1.4 won't have the same problem.
19:07:25 koji is now spewing fedmsg, the planet is too. python-fedora and bodhi+fedmsg fires are in the process of getting put out.
19:07:32 hm guys, that we have a copy of the git repos on github is nice, but can we make sure it remains a *copy* ?
19:07:34 abadger1999: sure thing
19:07:37 Cool.
19:07:56 mdomsch: yeah. Want to avoid that happening again for sure.
19:08:17 good news -> it looks like packages' connection leak on memcached04 is finally fixed -> http://bit.ly/WDgtc6
19:08:18 if anyone could test FAS-OpenID and give suggestions back to me, I would be very grateful
19:08:33 threebean: great!
19:08:35 yay
19:08:35 #info fedmsg enabled for planet and koji.
19:08:43 #info mirrormanager 1.4 in staging being tested
19:08:55 #info fas-openid is ready for some testing in staging
19:09:09 puiterwijk: fas-openid looks great :p I think we've just got to start having some test-worthy services tap into it
19:09:09 pingou: you mean set th emaster repo to the hosted one?
19:09:28 threebean: yeah, agreed. I am open for suggestions where to start? ;)
19:09:43 (tagger :p)
19:09:44 (thanks btw :))
19:09:54 nirik: however people like, but I still like the idea to have our repo in our infra :)
19:09:56 threebean: hehe, want me to look into porting that?
19:10:18 pingou: I'd be happy with that setup too, but my understanding is we have to specially request it.
19:10:23 puiterwijk: only if you have time, but that would be great! Let's talk about it more in #fedora-apps later.
19:10:38 threebean: I have time enough right now ;)
19:10:52 I can look into it some more.
19:10:52 nirik: I asked, they say no to me, maybe worth asking again
19:11:05 I can contact the person who said they could do it. ;)
19:11:17 cool
19:11:47 threebean: just ping me in #-apps whenever you want to talk about it;)
19:11:52 ok, anything else on the apps front?
19:11:54 cool, cool.
19:12:04 * lmacken still poking at our logstash + elasticsearch cluster
19:12:07 http://logstash-dev.cloud.fedoraproject.org:5601
19:12:13 got fedmsg going in there, and mirror logs
19:12:13 nirik: more hands on coprs are welcome
19:12:18 nirik: not sure if its the right time, but there was a ticket about a search engine thati've been playing with.
19:12:22 skvidal: I would be happy to help you to?
19:12:25 #info coprs assistance welcome.
19:12:27 nirik: we have a pile of feature creep^H enhancements
19:12:42 that need more time.
19:12:46 s/you to/you too/
19:12:47 puiterwijk: take a look at the code
19:12:51 Adran: sure. We have tried a few times and always run into problems, but if you would be willing to lead another charge at it...
19:12:58 skvidal: github or hosted?
19:13:02 puiterwijk: hosted
19:13:18 puiterwijk: when you want to talk about the backend/frontend servers yell at me
19:13:35 nirik: Been playing with it, wouldn't mind taking charge on it. I have two solutions for it, one is custom (using a python search engine system) and then another using a pre made package. Both I'm testing locally right now.
19:13:37 skvidal: can I yell now? ;)
19:13:44 puiterwijk: not in here :)
19:13:47 Adran: we detemined that dpsearch was the best option a while back, but then it got to crawling and got unwanted junk and got really slow.
19:13:53 skvidal: sure ok :)
19:14:20 nirik: I could play with data park some more sure, my understanding was it wasn't really updated anymore.
19:14:22 * relrod here, late
19:14:22 Adran: perhaps update the ticket with your findings and we can go from there. (get you a test instance, etc)
19:14:37 * lmacken becoming a fan of elasticsearch these days... once you can get past the JVM resource issues
19:14:41 nirik: Sure.
19:15:06 Adran: relrod would be a good one to talk with too, he worked on the dpsearch last time. ;)
19:15:12 welcome relrod. ;)
19:15:50 #info Adran to look at search again.
19:15:58 ok, anything else on the apps front?
19:16:07 one thing
19:16:18 maybe announce the URL for stg fas-openid?
19:16:23 sure...
19:16:39 #info url for testing fas-openid: .id.stg.fedoraproject.org
19:17:00 yep. You got that issue with it not being able to talk to fas solved?
19:17:30 no, temporarily I have set it to use prod fas (that works), and will reset it to stg fas when the firewall is fixed
19:17:46 ah, ok, I will try and look at that soon. :)
19:17:55 ok, moving along then.
19:17:55 ok, thanks :)
19:18:02 #topic Sysadmin status / discussion
19:18:21 so we did mass updates this week. They went reasonably ok.
19:18:30 and now we get to do more!
19:18:40 and yeah, glibc update comes out today. ;(
19:18:45 typical. ;)
19:18:55 we had a nasty mirorrmanager outage last night.
19:18:55 thank you Murphy!
19:19:22 mdomsch: would it be easy to modify the scripts to keep the previous pickle? so we could quickly roll back if there was an issue?
19:20:01 pingou: I am beginning to think that our reboots are actually triggering glibc and kernel updates being issued
19:20:19 uncoming we also have a bunch of new arm machines to setup for the arm secondary arch folks.
19:20:21 skvidal, me too
19:20:30 skvidal: that's plausible explanation indeed
19:20:36 also, hopefully we can use a few of them for infra stuff as a test too.
19:20:49 ansible on arm ?
19:20:55 pingou: it's just sshd
19:20:56 sure, should work just fine.
19:20:58 pingou: ansible is just python
19:21:12 pingou: if we cannot get python working on the arm builders
19:21:16 pingou: we're in a bad way
19:21:22 the two more important things would be: arm (if there's weird arch things) and fedora instead of rhel. ;)
19:21:24 skvidal: I know ;-) it was more meant at: will we/do we want to use ansible for this task?
19:21:31 pingou: yes - we will
19:21:37 yes, def ansible for these.
19:21:41 pingou: I spoke with dgilmore about it this morning
19:21:42 cool
19:21:43 ansible should be fine
19:21:44 puppet scaling... is poor
19:22:06 dumping another 96 hosts in it is not a good plan. ;)
19:22:37 speaking of that
19:22:37 How is Ansible migration / conversion / implementation going if I might ask?
19:22:42 Adran: yes
19:22:44 exactly
19:22:49 we discussed it a bit at fudcon
19:22:53 the long and short is
19:23:03 we have to disentangle a bunch of stuff from puppet before we can be away from it
19:23:11 in the case of builders and cloud instances that's easy
19:23:14 b/c they are not entangled
19:23:21 but, imo, our biggest issue is nagios
19:23:37 we have to get our nagios config OUT of puppet and either into its own repo or autogenerating
19:23:40 yeah, we need to look at our global and base puppet stuff and see whats worth translating over to ansible
19:23:47 and yeah, nagios.
19:24:02 nirik: the base/global stuff worries me less since so much of it is a one-off execution
19:24:17 but nagios is ALWAYS changing
19:24:18 skvidal: ideally autogenerating
19:24:28 dgilmore: you will hear no argument from me
19:24:34 yeah, I will check on check_mk and see if we can move forward with that some soon.
19:24:34 nirik: perhaps we need to do to nagios
19:24:39 nirik: what we did with dns
19:24:51 move it out of the puppet git repo - make its own repo
19:24:55 that we can check and maintain
19:24:57 perhaps. I'd prefer to just get it so it doesn't need all that config.
19:25:08 well - that's what I meant by like dns
19:25:21 auto-generating and checkable before committing
19:25:37 right now our dns systems pull the git repo for dns
19:25:41 before applying it locally
19:25:45 perhaps. I'm not sure that solves the problem enough
19:25:49 nirik: how so?
19:26:14 well, if we get it almost all auto, it shouldn't need a seperate repo... it just happens as part of deployment.
19:26:25 anyhow, it def needs doing.
19:26:37 * nirik will try and find time to come up with a plan. Or if anyone else wants to, feel free.
19:26:39 I suspect that even if we automated it as much as possible - we'd still have some manual configs
19:26:48 nirik: the only reason I mentioned a separate repo
19:26:49 is this
19:26:54 it lets someone play in a brand new pool
19:27:01 w/o worrying about ansible or puppet or anything
19:27:14 and if that encourages someone to work on it
19:27:19 sure
19:27:36 we have an inventory of our systems
19:27:44 and if need be we can provide all the necessary hostnames
19:27:51 and generate lots of deps between systems
19:28:25 anyway - in order to get things moved - I think we have to get nagios out of puppet - that's all I was saying when we started discussing this
19:28:26 sorry
19:28:26 #info nagios needs reworking.
19:28:30 yeah
19:28:35 anything else sysadminy?
19:28:54 #topic Private Cloud status update / discussion
19:29:06 updates - and restarting the cloud(s)
19:29:08 I'd like to look at scheduling an update/reboot cycle for the cloudlets.
19:29:10 yeah.
19:29:18 later next week, or perhaps early week after?
19:29:29 early week after - for 2 reasons
19:29:36 what's the status of cloud02?
19:29:36 if we're going to bounce the euca cloudlet
19:29:42 I'd like to move it to 3.2.X
19:29:47 yeah. agreed.
19:30:04 SmootherFrOgZ: so, it's being used for some things... qa is testing a bit on it.
19:30:13 tflink: if you are around, hows the cloud stuff looking?
19:30:21 we still need to hook it up to ansible.
19:30:52 nirik: saw that
19:30:55 nirik: the only issue is the ec2-api over ssl, right?
19:31:00 yeah.
19:31:06 but still, what's left to be done?
19:31:12 SmootherFrOgZ: you have time to poke at that?
19:31:25 nirik: I will next week
19:31:44 cool. that would be great. Happy to provide background and info on the current setup
19:32:09 we also need to hook ansible.
19:32:16 * SmootherFrOgZ looks at skvidal
19:32:23 #info will be doing a reboot of cloudlets the 12th/13th sometime.
19:32:24 you mean to the cloud servers themselves?
19:32:49 SmootherFrOgZ: then yes - we need to make a few decisions there, actually
19:32:50 oh yeah, we should be able to fasClient and 2 factor sudo the cloudlet servers now.
19:32:52 yeah... just like you did with euca
19:33:02 SmootherFrOgZ: I didn't ansiblize them to fas
19:33:21 nirik: is that the plan? I'm fine with it - I just thought you were still debating it
19:33:28 also, what should we do with syadmin-cloud?
19:33:43 nirik: if the plan is to make those fas+2fa - then I will work on making it so
19:33:47 yeah, I guess I never said yea or nay, but I think we should go ahead and do that.
19:33:51 nirik: ok
19:33:56 nods
19:33:57 and yeah, we could reuse sysadmin-cloud for it
19:34:27 there's a bit of a hack needed for 2factor externally.
19:34:38 which we can fix if we want some pain, or just not care about. ;)
19:34:38 for the record there's no owner on the group anymore, only some from accounts group can reset it
19:34:52 nirik: still haven't had the time to look into it enough, unfortunately
19:35:00 tflink: no worries.
19:35:22 #info will be setting up fas+2factor on cloud servers
19:35:35 #info will look into ec2 ssl so we can hook cloud02 to ansible
19:35:41 anything else on cloudy?
19:36:18 #topic Upcoming Tasks/Items
19:36:46 #info 2013-02-04 to 2013-02-06 Seth out.
19:36:46 #info 2013-02-04 pkgdb update.
19:36:46 #info 2013-02-18 to 2013-02-19 smooge on site at phx2.
19:36:46 #info 2013-02-28 end of 4th quarter
19:36:55 anything else folks would like to schedule or note?
19:36:57 nirik: if we don't want to do ec2+ssl on openstack - then I'll need some time to write an ansible nova module - or someone else will need to do it
19:37:06 ok
19:38:13 oh, might put on there mass rebuild starting 2013-02-08
19:38:52 #topic Open Floor
19:38:59 ok, any open floor items from anyone?
19:39:43 not much from me
19:39:50 zilch actually
19:40:09 cool. Ok, everyone, thanks for coming!
19:40:19 #endmeeting
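
Note on the "nagios needs reworking" item: the idea raised in the meeting was to get the nagios config out of puppet, auto-generate it from the existing host inventory, and make it checkable before committing. Nothing concrete was decided, so the following is only a rough sketch of what that could look like. The inventory format, the example.org hostnames, the `generic-host` template name and both function names are assumptions made for illustration; only the `define host` directives (`use`, `host_name`, `address`, `parents`) are standard Nagios object syntax.

```python
def check_inventory(inventory):
    """Return a list of problems found before any config is written.

    Hypothetical inventory format: a list of dicts with "name",
    "address" and an optional "parents" list of host names.
    """
    problems = []
    seen = set()
    for host in inventory:
        if host["name"] in seen:
            problems.append(f"duplicate host: {host['name']}")
        seen.add(host["name"])
    # Every parent must itself be a known host, otherwise the
    # dependency between systems cannot be expressed.
    for host in inventory:
        for parent in host.get("parents", []):
            if parent not in seen:
                problems.append(f"{host['name']}: unknown parent {parent}")
    return problems


def render_nagios_hosts(inventory, template="generic-host"):
    """Render one Nagios `define host` block per inventory entry."""
    blocks = []
    # Sort by name so regenerated output is stable and diffs stay small.
    for host in sorted(inventory, key=lambda h: h["name"]):
        lines = [
            "define host {",
            f"    use        {template}",
            f"    host_name  {host['name']}",
            f"    address    {host['address']}",
        ]
        parents = host.get("parents")
        if parents:
            lines.append(f"    parents    {','.join(parents)}")
        lines.append("}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Made-up hosts, not real infrastructure names.
    inventory = [
        {"name": "app01.example.org", "address": "192.0.2.11",
         "parents": ["gw01.example.org"]},
        {"name": "gw01.example.org", "address": "192.0.2.1"},
    ]
    problems = check_inventory(inventory)
    if problems:
        raise SystemExit("\n".join(problems))
    print(render_nagios_hosts(inventory), end="")
```

Running the check first and refusing to emit anything on failure is one way to get the "checkable before committing" property; whether the output lives in its own repo or is produced at deployment time was left open in the discussion.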