19:00:00 #startmeeting Infrastructure (2013-02-14)
19:00:00 Meeting started Thu Feb 14 19:00:00 2013 UTC. The chair is nirik. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:00 Useful Commands: #action #agreed #halp #info #idea #link #topic.
19:00:01 #meetingname infrastructure
19:00:01 The meeting name has been set to 'infrastructure'
19:00:01 #topic greetings and felicitations
19:00:01 #chair smooge skvidal CodeBlock ricky nirik abadger1999 lmacken dgilmore mdomsch threebean
19:00:01 Current chairs: CodeBlock abadger1999 dgilmore lmacken mdomsch nirik ricky skvidal smooge threebean
19:00:08 * skvidal is here ish
19:00:23 * puiterwijk
19:00:29 * lmacken ish
19:02:06 ok, lets go ahead and dive in then
19:02:08 #topic New folks introductions and Apprentice tasks.
19:02:16 any new folks around? or apprentices with questions?
19:02:21 Hi all
19:02:26 I am new here
19:02:30 * pingou
19:02:48 is here
19:02:52 * abadger1999 here
19:02:56 * Adran is here
19:03:04 nirik: you answered my questions at least most of them this morning though. :)
19:03:11 welcome knesenko. Care to introduce yourself? Are you more interested in sysadmin or application devel stuff?
19:03:16 Adran: cool. ;)
19:03:21 hi, i am new here
19:03:52 welcome Ramesh_
19:04:16 Hi all. My name is Kiril. I have good exp. in Linux systems, RPM packaging etc .. . I am interested to maintain koji build systems.
19:04:57 Thanks nirik, i was in the infrastructure group before but due to academic load i was not able to contribute much
19:05:06 send "Hello" email to the infra email list last week .
19:05:26 knesenko: welcome. not sure how much work the buildsys needs, but I'm sure we can try and find you something to work on...
19:05:44 * ianweller here
19:05:45 Ramesh_: no worries. ;) are you more interested in sysadmin or application devel?
19:05:46 nirik: sysadmin is good as well :)
19:05:48 copr?
19:06:04 * skvidal +1's that
19:06:11 we could use people working on copr
19:06:17 * nirik nods. :)
19:06:29 knesenko: it isn't koji - but it is a different buildsys we're working on
19:06:41 http://fedorahosted.org/copr
19:06:54 yeah i am interested in system admin stuff
19:06:55 knesenko: if you're interested - come by #fedora-apps
19:07:07 cccccccbjhtnjvrvvflvcvngijtcvrefrucenfibfvkb
19:07:11 hey look - my yubikey fired
19:07:12 skvidal: cat?
19:07:15 Ramesh_: cool. See me after the meeting in #fedora-admin and we can get you started. ;)
19:07:19 oh even better, skvidal
19:07:33 skvidal: everything that is related to build systems I am interested :)
19:07:40 nirik: cool :)
19:07:46 knesenko: great - then take a look at what copr is for
19:07:50 knesenko: then you will like copr :)
19:07:52 knesenko: and come by and talk to us if you're interesting
19:07:53 excellent. Any other new folks or general questions?
19:07:58 s/intetresting/interested/
19:08:12 skvidal: pingou np 10x
19:08:31 #topic Applications status / discussion
19:08:41 ok, any applications news this week or upcoming?
19:08:49 I've started to work on a refresh of pkgdb
19:08:49 yeah, three openid related ones from me
19:09:01 the db scheme will change and is simplified
19:09:40 #info pingou working on a pkgdb update with db schema changes
19:09:49 as you might have read, we have authopenid for trac working now
19:10:05 puiterwijk: we need to package that up still right?
19:10:08 (Seth sent an email out)
19:10:16 nirik: yeah, will do so later today or tomorrow
19:10:32 #info authopenid testing with trac seems to show it works.
19:10:38 I think we've finally gotten rid of all the bugs in the python-fedora+python-requests update. But we're running one hotfix in infrastructure related to that. Unless we find more serious bugs, I'll hold off on another upstream release until after fedora/epel6 get the update that's currently pending.
19:11:00 abadger1999: nice
19:11:02 second OpenID related item: flask_fas_openid base is done, I only need to implement one last extension (CLA), and then we can start moving over Flask apps to FAS-OpenID
19:11:18 * abadger1999 thanks threebean for tracking down a solution for that last problem.
19:11:23 abadger1999: is that one hotfix something others will hit when the current update goes stable? or it's likely only something that matters to us?
19:11:50 nirik: Yes, but I'm hoping it won't be as bad for them. It's a *severe* performance regression.
19:11:57 #info flask-fas-openid base is almost ready for use. Flask apps can then use fas-openid
19:12:04 as a heads up, the new API for bugzilla has been deployed to partner-bugzilla. the new python-bugzilla (git master HEAD) is needed to interface with the API changes and has changed a bit
19:12:20 * tflink sent an email to infrastructure@ but figured it would make sense to mention here
19:12:28 tflink: yeah. ;( we need to test our stuff...
19:12:30 If you download a large json dataset from one of our apps, then python-requests attempts to detect the character encoding which will take a very long time.
19:12:49 also, FAS-OpenID 0.5 will get into staging too when I have the CLA extension working, and when it is I will request testing on the infra mailing list
19:13:09 it broke a bunch of stuff in the blocker tracking app but I needed to refactor that code anyways
19:13:15 #info partner-bugzilla has been updated. git HEAD python-bugzilla needed. We need to test all our bugzilla using applications against the new versions.
19:13:18 We're mostly the ones that are doing that since we cron jobs/supybot that consume data about all of our packages/users/etc.
19:13:55 tflink: thanks for the heads up
19:14:10 * relrod here, late.
19:14:21 what about a pre-release python-bugzilla
19:14:28 nirik: np, we still don't know when the changes are going to be pushed to production, though
19:14:35 * tflink is waiting to hear back on that
19:14:46 pingou: we should make one or ask for one, yeah
19:14:56 pingou: I asked for one, should be done today sometime
19:15:20 tflink: nice!
19:15:24 thanks
19:15:34 excellent.
19:16:01 askbot in stg should be good to go.
19:16:12 threebean: cool. when does it message?
19:16:21 oh, and another request with respect to FAS-OpenID: if you tested it, please report at least failures, but also successes would be nice
19:16:24 asking new questions, proposed answers?
19:16:35 #info feedback wanted on FAS-OpenID
19:16:47 #info askbot in stg is fedmsg aware now
19:16:51 nirik: new question, proposed answer, and a few others (flagged messages as offensive)
19:17:04 ok
19:17:44 ok, any other applications news?
19:17:58 I need to move on with fedocal
19:18:03 abadger1999: when did you want to move that pkgdb update? :)
19:18:12 I'll have the spec file ready by this week-end
19:18:32 pingou: cool. Happy to help you with review, etc
19:18:35 nirik: I'm debating waiting for pingou's schema changes to land now.
19:18:47 ok
19:18:54 abadger1999: no go ahead
19:19:02 nirik: It's pretty close and it would be nice to bring the db size down.... Should make backups faster :-)
19:19:09 abadger1999: that retires the apps part and we already clean up the db a bit
19:19:19 ok.
19:19:23 and I've got plenty of other releases/fedora spec file cleanups to work on :-)
19:19:30 pingou: ah. Okay.
19:19:48 nirik: tentatively end of next week, then.
19:20:00 abadger1999: I see more the changes I'm doing atm for a new(er) version of pkgdb
19:20:06 k
19:20:09 ok, we don't need an exact schedule right now... just enough in advance so we can schedule any outages.
19:20:17
19:20:30 ok, shall we move on then?
19:20:36 nirik: Shouldn't be any outage but there will be a change in what people can get from pkgdb afterwards
19:20:45 appdb goes away, tags from pkgdb go away.
19:20:58 #topic Sysadmin status / discussion
19:21:03 * abadger1999 makes a note to check if rel-eng processes are pulling tags from pkgdb or tagger now
19:21:11 abadger1999: yeah, good to check on.
19:21:20 so, lets see... on the sysadmin side...
19:21:34 #info Rework of nagios underway. Phase 1 complete.
19:21:44 I redid nagios dependencies
19:21:50 so they should now be right.
19:21:58 and we shouldn't get 20 pages when a site is down
19:22:35 next phase(s) are to change the alerts so we only get urgent on impacting outages, and fix it so we can have other groups we monitor get notices for their stuff only.
19:22:56 side note: can we switch nagios to use openid too? it's using mod_auth_pg as well right now
19:23:10 nirik: can add to my todo-list?
19:23:15 puiterwijk: that would be great.
19:23:46 we had a nasty serverbeach outage earlier in the week as well as some hosted instability.
19:23:55 #action puiterwijk will look into switching nagios to openid from mod_auth_pg
19:24:22 smooge is going to be doing a quick visit out to our phx2 datacenter next week.
19:24:37 If there are things people can think of that can only be done on-site, please let him know. ;)
19:24:43 yeah, hosted was probably gluster not playing nice
19:24:59 puiterwijk: all we really need for nagios (and for epylog logs) is some way of doing normal apache auth to openid
19:25:05 I also adjusted hosteds robots.txt
19:25:12 I will be mostly working on getting some hardware rebuilt
19:25:19 skvidal: so apache mod_openid?
19:25:26 #info smooge on site next week (mon/tue).
19:25:40 puiterwijk: is that actually being maintained?
19:25:41 #info robots.txt on fedorahosted adjusted to prevent crawling load issues.
19:25:50 skvidal: I have no idea yet :)
19:26:07 nirik: want to go over the cloud upgrade 'fun'
19:26:08 ?
19:26:12 skvidal, nirik: just one issue though...
19:26:13 yeah, next up...
19:26:20 puiterwijk: ?
19:26:30 if we would switch nagios to use openid, and openid would go down, we would also lose access to the web interface of nagios...
19:26:36 puiterwijk: agreed
19:26:41 true.
19:26:53 but if postgres goes down, we already die. ;(
19:26:54 it's one reason why I like the basic auth we're using for epylog
19:26:57 nirik: ^^^
19:27:03 yeah.
19:27:04 we just use a .htpasswd file
19:27:10 that is... unlikely... to fail
19:27:18 yeah, quite
19:27:34 just wanted to note it before I go too deep into mod_auth_openid
19:27:42 puiterwijk: it's a good note
19:27:45 well, just a thought. Let me ponder on it some more. Ideally I'd prefer nagios (and epylogs too) to be available to lots of people, but just not crawlers or the world...
19:28:01 nirik: maybe multiple-auth?
19:28:04 nirik: fallthrough?
19:28:11 possibly...
19:28:11 nirik: maybe failback ?
19:28:13 it depends on how mod_openid fails
19:28:18 yeah, not sure off hand.
19:28:29 and, again, if it is reliable or maintained
19:28:31 I will check into mod auth_openid
19:28:34 openid, then if fails a fallback .htpasswd with core people.
19:28:41 nirik: ya
19:28:48 could work.
19:29:02 ok, any other sysadmin stuff? or shall we move on to cloudy fun?
19:29:15 #topic Private Cloud status update / discussion
19:29:25 so, we updated both our cloudlets this week.
19:29:35 The openstack one seems to have gone pretty well.
19:29:41 have to head to vet
19:29:41 the euca side... not so well
19:29:42 bbs
19:29:49 the euca cloudlet.... had issues
19:29:49 smooge: safe travels
19:30:04 I've been helped by the euca team today to determine what went wrong
19:30:30 one issue was caused by my running: service eucalyptus-cc stop instead of service eucalyptus-cc cleanstop
19:30:36 the other issue is unclear.
19:30:51 apparently 'cleanstop' is similar to --really-force
19:30:53 stop means 'uncleanly shutdown' ? thats odd...
19:31:00 nirik: well stop means
19:31:05 'stop but maintain all state'
19:31:11 ah.
19:31:12 cleanstop means stop but nuke all state and start fresh
19:31:29 which you want on upgrades...
19:31:33 umm
19:31:38 that's where it gets weird
19:31:57 I've asked this question if there is ever a time on an upgrade (where you would NOT want to cleanstop)
19:32:00 and the answer is no
19:32:08 so, since the upgrade modifies the db
19:32:19 I am unclear on why it doesn't clean the state then
19:32:26 yeah. ;(
19:32:41 anyway... I have some more tests to run but as of this moment things are more or less working in the euca cloudlet
19:32:45 so where do we stand right now? instances are up and working, but the cloudlet is not stable/reliable?
19:32:53 instances are up and working
19:32:59 volume attachment is what I have to test
19:32:59 fedocal seems to have some data corrupted
19:33:04 pingou: corrupted?
19:33:06 pingou: that's new
19:33:09 pingou: corrupted where?
19:33:10 files of size 0
19:33:17 skvidal: /srv/persist/fedocal
19:33:30 pingou: that's new.
19:33:36 pingou: what files?
19:33:43 all I think
19:33:49 pingou: can I look?
19:33:53 of course
19:33:54 * nirik plays the sad trombone. ;(
19:34:18 skvidal: just don't mind the cat sitting on /srv/
19:34:27 pingou: yes
19:34:32 anyway
19:34:39 the euca cloudlet is not a happy place right now
19:34:54 at the same time I am trying to get the ssl'd ec2 api interface working for openstack
19:35:00 all the packets are getting through
19:35:05 but we are seeing this error
19:35:14 2013-02-14 19:23:54 WARNING [keystone.common.wsgi] Authorization failed. EC2 signature not supplied. from 127.0.0.1
19:35:21 #info cloudlet upgrades this week. openstack did ok, euca didn't upgrade nicely at all
19:35:53 it seems like some sort of signature is getting stripped out (or never supplied) with the ec2 auth
19:35:56 so, if we get that working we can I hope spin up the volumes/instances in the OS side and reinstall the other cloudlet if we like.
19:36:03 if we do not pass through nginx this works
19:36:06 so it may be happening there
19:36:12 nirik: agreed
19:37:13 #info work ongoing to get ec2 ssl working on openstack cloudlet
19:37:22 ok, anything further cloudside?
19:37:52 not at the moment
19:37:56 #topic Upcoming Tasks/Items
19:38:07 lets see if this floods me off irc:
19:38:15 #info 2013-02-18 to 2013-02-19 smooge on site at phx2.
19:38:15 #info 2013-02-28 end of 4th quarter
19:38:15 #info 2013-03-01 nag fi-apprentices
19:38:15 #info 2013-03-07 remove inactive apprentices.
19:38:15 #info 2013-03-19 to 2013-03-26 - koji update
19:38:17 #info 2013-03-29 - spring holiday.
19:38:19 #info 2013-04-02 to 2013-04-16 ALPHA infrastructure freeze
19:38:20 #info 2013-04-16 F19 alpha release
19:38:22 #info 2013-05-07 to 2013-05-21 BETA infrastructure freeze
19:38:24 #info 2013-05-21 F19 beta release
19:38:26 #info 2013-05-31 end of 1st quarter
19:38:28 #info 2013-06-11 to 2013-06-25 FINAL infrastructure freeze.
19:38:30 #info 2013-06-25 F19 FINAL release
19:38:32 anything else anyone would like to schedule or add/note?
19:38:43 * nirik notes as soon as we have fedocal I can just stick this all in there. ;)
19:38:49 nirik: :)
19:38:55 Not sure if this would fit in meeting related topics, but there is a PyCon booth we'd like to give to Fedora if anyone is attending PyCon
19:39:09 sontek: awesome. ;)
19:39:11 nirik: maybe we should start thinking about planning to move FAS-OpenID to prod?
19:39:27 abadger1999 / lmacken / threebean I think are going to pycon? possibly more from our team?
19:39:56 puiterwijk: yeah, I'd like to have it in prod.
19:40:11 nirik: any idea for what kind of schedule would fit?
19:40:33 abadger1999: was the FAS login(username, password, yubikey) function written already?
19:40:44 nirik: Yeah -- I'll be there. I'm trying to hook sontek up with one of the ambassadors who are attending as they'll be quite happy to make use of it I think.
19:40:47 well, not sure. I'm open to ideas.
19:40:54 abadger1999: excellent.
19:41:11 abadger1999, nirik: or do we want to hold off on yubikey in the first version of FAS-OpenID for now?
19:41:55 Jesse Noller just wants to know who will be doing the booth so he knows it wont be empty, I can definitely help out at the booth but wouldn't want to be the main person for it
19:42:07 well, if it's not going to be much longer to just implement it would be good to get done before prod. If it will be a while, I'm fine waiting for another release later.
19:42:17 puiterwijk: It wasn't written yet. No reason to hold back on it assuming we can get it written.
19:42:49 abadger1999: what do you mean? to just release it without for the moment, and add it in 1.1?
19:43:04 (or 2.0, in the Firefox version-hell spirit :))
19:43:28 yeah, if it will be a while I'm fine with a 1.0 now and future release with that.
19:43:39 #topic Open Floor
19:43:45 anyone have any items for open floor?
19:43:52 questions? suggestions?
19:43:58 I had, I just have to remember what it is
19:44:11 :)
19:44:24 puiterwijk: I would say, don't let lack of yubikey block us from releasing but If we can write the fas method you need in time, there's no reason to hold it back either
19:44:52 * nirik is in agreement with abadger1999. as usual. ;)
19:45:27 abadger1999: okay. I will check quickly how much time it'll take
19:45:34 nm, it doesn't want to come back, if I remember it, I'll mention it
19:45:40 no worries.
19:46:22 ok, if nothing more will close out in a few..
19:46:32 oh, forgot to mention arm boxes...
19:46:49 nirik: I was just going to ask about that :-)
19:47:12 we have arm boxen in phx2. Racked. Powered. Serial console all setup. 1 of the 4 of them have ip's and I've installed the 24 socs for arm builders.
19:47:31 need to reinstall them all, then do ansible config on them, then they should be ready to build.
19:47:39 network is still pending for the other ones.
19:47:51 Any idea how long the network for the others might take?
19:48:08 (Don't get me wrong -- I'm happy to see the first 24 coming online)
19:48:16 Just curious, is all
19:48:41 I've been harassing people for a week+. ;) Our primary network contact is out, so we are trying to get someone else up to speed on the setup and requirements.
19:48:56 hopefully soon, but I just don't know for sure. ;)
19:49:01 Fair enough
19:49:53 ok, thanks for coming everyone. As always, continue in #fedora-admin, #fedora-noc and #fedora-apps
19:49:57 #endmeeting
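
Note on the python-requests slowdown raised under "Applications status / discussion": the log says python-requests tries to detect the character encoding of a large json download, and that this takes a very long time. Below is a minimal Python sketch of one way a consumer could sidestep that detection. It is an illustration under stated assumptions, not the hotfix actually running in infrastructure (the log does not describe it). The helper name `parse_json_response` and the UTF-8 fallback are hypothetical; the only thing relied on is a response object with a bytes `.content` attribute and an `.encoding` attribute that is None when the server declared no charset, which is how python-requests responses behave.

```python
import json


def parse_json_response(response, fallback_encoding="utf-8"):
    """Decode a JSON body without asking the HTTP library to guess its charset.

    `response` only needs two attributes (assumed to match python-requests):
      .content  -- the raw body as bytes
      .encoding -- the charset declared by the server, or None if absent

    Reading `.content` returns the bytes as received, so no character-set
    detection pass runs over the (possibly very large) body.
    """
    # Hypothetical policy: trust the declared charset, else assume UTF-8.
    encoding = response.encoding or fallback_encoding
    return json.loads(response.content.decode(encoding))


# Usage sketch (the URL is a placeholder, not a real endpoint):
#   import requests
#   r = requests.get("https://example.invalid/large-dataset.json")
#   data = parse_json_response(r)
```

The UTF-8 fallback is a guess that suits JSON in most cases; a consumer that knows its server's real charset should pass it explicitly through `fallback_encoding`.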