18:00:01 #startmeeting Infrastructure (2012-10-11)
18:00:01 Meeting started Thu Oct 11 18:00:01 2012 UTC. The chair is nirik. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:01 Useful Commands: #action #agreed #halp #info #idea #link #topic.
18:00:01 #meetingname infrastructure
18:00:01 The meeting name has been set to 'infrastructure'
18:00:01 #topic Aloha!
18:00:01 #chair smooge skvidal CodeBlock ricky nirik abadger1999 lmacken dgilmore mdomsch threebean
18:00:01 Current chairs: CodeBlock abadger1999 dgilmore lmacken mdomsch nirik ricky skvidal smooge threebean
18:00:06 * skvidal is here
18:00:10 * abadger1999 here
18:00:20 * jds2001
18:00:21 * dgilmore is here but not for long; dinner calls
18:00:26 * lmacken
18:00:31 * Spack here
18:00:36 * ianwelle1 is sorta here
18:00:50 * _love_hurts_ here
18:00:58 * pingou here
18:01:16 * relrod_webchat here
18:01:21 cool. Nice crowd today. ;)
18:01:29 let's go ahead and dive in
18:01:31 #topic New folks introductions and Apprentice tasks
18:01:45 any new folks? or apprentices wanting to ask about a task/etc?
18:02:23 * nirik doesn't see any new folks off hand...
18:03:00 #topic Applications status / discussion
18:03:20 any application news or status this week or upcoming? ;)
18:03:31 * herlo is here too
18:04:08 lots of fedmsgery hacking :)
18:04:27 I've not forgotten about the FAS release; going to update staging one last time to the latest translations, and target updating prod shortly after the beta freeze, if possible (no date in mind yet)
18:04:28 yeah, fedmsg is trucking along. ;)
18:04:57 #info Beta freeze now scheduled to start 2012-10-16
18:05:16 pkgdb update went out, but b/c the release was pushed out a week, working to see if I can get another update out with two other features.
18:05:45 also, going to shoot for a python-fedora release... but that'll likely be after freeze (push to F/EPEL and then update infra after freeze)
18:05:55 * pingou kept working on having a full SA FAS
18:06:18 fedocal's backend is 100% unit-test covered
18:06:20 there's going to be some hacking on infra apps at FUDCon Paris -- courtesy of pingou
18:06:29 sorry, late but here
18:06:56 pingou: awesome. I'm planning on digging in on features this weekend. I assume you are going to be busy with FUDCon, so I'll get you feedback in email, unless there's a trac instance or something.
18:07:09 herlo: email is fine
18:07:16 k, perfect
18:07:35 excellent. Was great to close all those hotfix tickets. Great work abadger1999!
18:07:35 anything else we hope to land before freeze?
18:07:36 * nirik can't think of anything off hand.
18:07:36 ok, if no more application devel news, will move on...
18:07:36 herlo: until we know what we want to do with it (wrt insight) I won't make it something official with a trac and all
18:07:50 oh, right.
18:07:53 yikes. net lag
18:08:05 I'll be picking up fedorahosted app dev soon again, too
18:08:14 pingou: I'm leaning toward your options btw. I think it could be integrated with the look and feel of insight if necessary.
18:08:17 nice relrod_webchat
18:08:20 should be ready to test soon now that flask is in epel as of a few weeks ago
18:08:35 herlo: but insight planned to have CalDAV/iCal from the start
18:08:40 right
18:08:41 s/planned/plans
18:08:53 pingou: I'm aware of that. I guess we'll see how it goes.
18:08:59 * nirik has seen no actual work on the insight one beyond requirements gathering long ago.
18:09:10 I hope to learn more about it this weekend
18:09:18 nirik: and the GSoC
18:09:24 well we need to make sure someone is taking care of insight
18:09:30 it is currently unattended
18:09:33 smooge: is there?
18:09:36 and not doing much
18:09:47 and next on the chopping block if not taken up again
18:09:59 oh, I guess another bit of application news: I announced the smolt retirement
18:10:24 we need someone to work on the server end to drop submissions and point people to a retirement page before next month.
18:11:01 was patrick working on this?
18:11:06 on what?
18:11:18 the 'fake' smolt server
18:11:21 pingou: he was going to, but seems to have gotten busy. Hopefully he still can.
18:11:44 would be nice
18:11:49 * nirik will try and find out. ;)
18:12:15 #info if puiterwijk can't work on the smolt retirement server, will need to find someone to do so before next month
18:12:24 it should pretty much be a couple lines of python
18:12:40 yeah, shouldn't be too bad...
18:12:54 * relrod_webchat can look into it, if he's unable to
18:13:04 it may need some kind of 'ack, I got your submission' thing... or 'ack, please see smoltsretired.html'
18:13:26 anyhow, any other application devel news? or shall we move on?
18:13:39 * relrod_webchat would like to assume the smolt client just checks the response code
18:13:51 #topic Sysadmin status / discussion
18:14:13 so, let's see... on the sysadmin side we have been dealing with an annoying routing issue at ibiblio. That's still ongoing.
18:14:39 I set up the ansible public and private repos this week
18:14:43 I set up cgit on lockbox for our public repos: http://infrastructure.fedoraproject.org/cgit/
18:15:14 Most virthost13 vms have been moved to vh12; there are still two remaining (app02.stg, proxy01.stg). Will be finishing that migration probably tomorrow or this weekend. The already-moved ones have been pretty painless.
18:15:27 skvidal: cool. Thanks for doing that.
18:15:32 relrod_webchat: ditto. ;)
18:16:04 we had the request for a buildbot node (Fedora and EL)
18:16:15 yeah.
18:16:26 !late but here
18:16:32 shall we move on to cloud to discuss that?
18:16:38 welcome miguelcnf
18:16:40 k
18:16:40 #topic Private Cloud status update
18:17:17 so, what things do we need to do/solve before we move things to more supported on the cloud?
18:17:28 do we want to keep 2 types of cloud? or settle on one?
18:17:44 so I've been looking at the euca 3.1.2 upgrade
18:17:47 I'd like to at least try openstack folsom in the next few weeks.
18:17:56 and I was told to be wary of it w/ running instances
18:18:14 and I confirmed this morning that, if I apply the 3.1.2 pkgs, it will just break the running instances
18:18:38 ok, so everything needs some downtime on there for that...
18:18:44 so to upgrade we'll need to terminate all instances and then upgrade
18:19:06 nirik: yah - it's a quick downtime - but afaict, all instances will need to start over from scratch :(
18:19:15 which, well, sucks
18:19:16 yeah, welcome to the cloud world.
18:19:26 is that also true in the openstack cloudlet?
18:19:32 on the openstack side you can hibernate them...
18:19:36 or whatever it calls it.
18:19:49 there's a feature request filed for euca to behave more gracefully
18:19:55 on sub-minor revision upgrades
18:19:56 do the vm's not exist independently of the management infrastructure?
18:20:01 https://eucalyptus.atlassian.net/browse/EUCA-3663
18:20:07 jds2001: they do exist
18:20:08 "suspend instance"
18:20:10 or does euca do more than i thought
18:20:13 in most cases... from my experience with a cloud... you can find yourself rebooted
18:20:20 jds2001: but they will be 'lost'
18:20:28 jds2001: meaning it will nuke their ip address
18:20:31 and the routes going to them
18:20:48 in openstack "suspend instance" is like a hibernate... it saves to disk, cpu and mem are freed; when you start it again it resumes and takes back cpu/mem, etc.
18:21:04 so, it would lose its ip then too.
18:21:07 and regain a new one.
18:21:22 nirik: so it'll be a fun game of 'find my instance'
18:21:23 :)
18:21:27 right.
18:21:31 good times
18:21:36 so, perhaps we shouldn't auto-assign external ips.
18:21:52 nirik: manual assignation seems like a pain
18:21:56 make the user request one and add it to the instance, which would be a pain, but you could request whatever one you wanted (if it was free)
18:21:59 if only from a tooling standpoint
18:22:03 yeah.
18:22:12 using dhcp?
18:22:13 does the instance keep its mac address? can't we use dhcp with that?
18:22:23 abadger1999: it's not about dhcp
18:22:28 they have internal and external addresses
18:22:37 the internal one is allocated via dhcp (normally)
18:22:40 the external one is a route
18:22:49 a poor man's solution would be to use ddns or something similar
18:22:52 on the network master (or in euca on the cloud controller)
18:23:14 misc: ddns with our current dnssec setup will be.... interesting
18:23:18 doable
18:23:20 but not fun
18:23:20 so, one thing I would like to see before we move to more 'production': some inventory thing that lets us know who to yell at for any instance / keep track of them.
18:23:36
18:23:40 nirik: so - that ties into the authn/authz area
18:23:48 yeah.
18:23:57 the tenants/accounts stuff
18:24:22 right.
18:24:43 so, perhaps we need to be much more narrow about what a 'tenant'/account is.
18:25:01 so in both
18:25:07 you have tenants and users
18:25:13 or 'accounts' and users
18:25:21 ie, have 'buildbot for python' and 'buildbot for foo' as completely separate, since they may have separate admins/etc.
18:25:27 an account or a tenant is like a company
18:25:34 a billing 'account'
18:25:37 right.
18:25:39 you have a bunch of users under each of those
18:25:44 and ANY of those users
18:25:54 can see and influence any other user's instance under the same account
18:26:10 you can do some tricks with the perms system in euca (and I think in os) to limit that some
18:26:22 also if 2 instances on the same account
18:26:26 but from different users
18:26:31 are using the same security group
18:26:31 yeah, so my first thought is that we could have something like 'short term' and lump a bunch of people in it, but now I think it would be better to give each one their own account/tenant
18:26:32 in euca
18:26:35 they will have the same vlan
18:26:58 nirik: I tend to agree - but with the vlans we have a slight scaling issue
18:27:07 nirik: in that if we have 2000 accounts
18:27:12 we've just eaten up all of our vlans
18:27:13 right.
18:27:18 now
18:27:19 2 things
18:27:24 1. if we have 2000 accts
18:27:33 we have no hope of ever having them all running :)
18:27:35 another mitigation would be that admins can make longer-term ones for others.... not give them an account to manage directly.
18:27:51 but then all those in that pool could 'see' each other.
18:27:57 2. if we have 2000 accts we will need more systems - and probably a lot more cluster controllers
18:28:03 nirik: except
18:28:08 if you make a separate security group
18:28:14 each security group gets its own vlan
18:28:22 oh, nice.
18:28:25 for example, under the skvidal user on the 'fedora' account
18:28:29 I have 3 security groups
18:28:36 1. default (22, and ping)
18:28:47 2. webservers (22, ping, 80, 443, 8080)
18:28:52 3. wideopen (all open)
18:28:59 each of those is a separate vlan
18:29:04 if I want to make a new one - I can just do so
18:29:16 euca-add-group
18:29:18 yeah.
18:29:27 euca-authorize somestuff
18:29:32 jenkins-slave or whatever.
18:29:39 so - that does make it easy for an admin to isolate them
18:29:44 but it also means
18:29:49 any user in an acct
18:29:52 who can make an instance
18:30:03 can insert themselves into the security group
18:30:09 if they are in the same tenant
18:30:19 that's not a real serious thing, though
18:30:34 since, ostensibly, any user in an acct/tenant should be trusted by the others in that acct/tenant
18:30:41 nirik: have you looked into vlans on openstack yet?
18:30:59 another thing I would like to have before we move to more production: SOPs/clear steps on images... ie, contain minimal, what to do about updates (yum-cron?), etc.
18:31:04 I have not had a chance.
18:31:19 I think it behaves per tenant; I don't know about per security group
18:31:46 so about yum-cron, etc
18:31:55 I am inclined to say that for users setting up their own instances
18:31:56 it's up to them
18:31:58 but again
18:32:03 this goes back to what we said before
18:32:12 but if it's up to them, they probably won't do anything. ;)
18:32:14 we do not, currently, have a niceish way to know why which one is there
18:32:30 nirik: fair enough - we could definitely have the default instances be yum-cron on
18:32:45 nirik: so if each user is in their own tenant/acct
18:32:47 * nirik thinks he might like to default to update daily... if you don't want that, it's up to you to turn it off and be responsible
18:32:50 then for a new user setup we need to
18:33:08 1. add their acct
18:33:16 2. add their user in that acct
18:33:21 3. set a temp password
18:33:32 4. add a default security group to that acct (no, it doesn't come with one)
18:33:40 5. tell them how to set up their user, etc
18:33:41 (stupidly, openstack has no way for a user to change their password)
18:33:54 6. make that user have admin-level control of their acct
18:34:10 nirik: well, euca's password-changing thing is in the web interface which is....
18:34:15 well, it's not fabulous
18:34:33 yeah, folsom might have something. I don't recall if they had it fixed or not.
18:34:33 I should be able to script the creation of a user/account using the euca2ools and boto
18:34:43 there was some pushback about that being an 'admin function'. :(
18:34:43 and populate all we need
18:34:57 also for images: should we use cloud-init?
18:34:59 it would be run from fed-cloud01 as the euca admin
18:35:16 nirik: seems like cloud-init is a good choice - but I know gholms and agrimm have been working on it a fair amount
18:35:19 so it may be in flux
18:35:24 * nirik nods
18:35:30 so in short
18:35:35 there is a lot of stuff to do :)
18:35:42 yes, indeed.
18:35:47 so, on the buildbot request...
18:35:47 nirik: you know what might be handy?
18:35:57 should we do that now? or not until we are more ready?
18:36:02 nirik: a [s]hitlist of things before we can call things 'production'
18:36:08 yeah.
18:36:17 nirik: I'm happy with providing the buildbot - if the consumer knows it may go 'poof' at any moment
18:36:20 I can try and generate such a list. somewhere.
18:36:39 nirik: and to be fair it would help us with our tools for reprovisioning and tracking instances
18:36:57 proposal: set up buildbot instances for twisted and python (if they still want it) as long as they know it could go away if we need it to. ;)
18:37:02 yeah
18:37:09 python folks wanted one a long time ago
18:37:19 my only concern would be that it means we are ok with providing such a service to all projects asking for it
18:37:25 pingou: no
18:37:27 I do not agree
18:37:35 it means we are ok w/ giving it to anyone we CHOOSE to
18:37:41 but just b/c I let my dog in the house
18:37:45 doesn't mean I have to let every dog in
18:37:50 ok
18:38:05 fine for me
18:38:13 pingou: yeah, we should note that it's not a blanket agreement.
18:38:13 'case by case' and 'no promises of sla or uptime or that it will even continue to exist'
18:38:13 at least for now.
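[Editor's note] The six new-user setup steps skvidal lists could be scripted along the lines below, per his plan to drive it "using the euca2ools and boto" from fed-cloud01 as the euca admin. This is a sketch only: the `euare-*` command names and flags are assumptions (they should be checked against the installed euca2ools version), step 5 isn't scriptable, and the security-group commands would need to run with the new account's credentials rather than the admin's.

```python
# Hypothetical driver for the per-user setup steps from the meeting.
# Command names/flags are assumptions, not verified euca2ools syntax.
import subprocess

def provision_user(account, user, temp_password, run=subprocess.check_call):
    """Run the scriptable setup steps for a new tenant/account."""
    steps = [
        # 1. add their acct
        ["euare-accountcreate", "-a", account],
        # 2. add their user in that acct
        ["euare-usercreate", "--as-account", account, "-u", user],
        # 3. set a temp password
        ["euare-useraddloginprofile", "--as-account", account,
         "-u", user, "-p", temp_password],
        # 4. add a default security group (accounts don't come with one);
        #    would actually need to run with the account's credentials
        ["euca-add-group", "-d", "default group for " + account, "default"],
        ["euca-authorize", "-P", "tcp", "-p", "22",
         "-s", "0.0.0.0/0", "default"],
        # (step 5, telling them how to set up their user, is docs, not code)
        # 6. make that user admin of their acct
        ["euare-groupcreate", "--as-account", account, "-g", "admins"],
        ["euare-groupadduser", "--as-account", account,
         "-g", "admins", "-u", user],
    ]
    for cmd in steps:
        run(cmd)
    return steps
```

The injectable `run` hook also makes the sequence easy to dry-run or log, which would help with the instance-inventory concern raised earlier.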
18:38:28 setting up the nodes shouldn't be too hard I think
18:38:31 nirik: I agree with everything but the last sentence
18:38:32 nirik: +1
18:38:37 I think if we have capacity and can work things out it might be a good community thing to offer to projects.
18:38:51 yeah, we can't help the world...
18:39:10 if they want the world - then ask aws for the freebie instances there
18:39:24
18:39:32 What do you mean by going 'poof'? (twisted buildbot maintainer here)
18:39:37 tomprince: going off
18:39:42 tomprince: needing to be killed for a downtime
18:39:46 or b/c sometimes things blow up
18:39:52 or b/c the colo evaporates
18:40:04 or magic has been discovered in phoenix and reality is torn asunder
18:40:09 we will of course try and not have that happen...
18:40:14 but that doesn't mean it won't
18:40:22 in other words - uptime is not a commitment
18:40:54 here it is, and we will try and make it work as well as we can and notify if there's a downtime, etc... but we cannot promise
18:41:08 That seems entirely reasonable.
18:41:23 cool.
18:41:39 #info will look at setting up some buildbot instances to help us test and help projects build.
18:41:51 nirik: I'd like to go ahead and do the euca 3.1.2 upgrade
18:41:56 which means nuking all the instances
18:41:58 (many of our existing slaves are provided by individuals right now, so we don't get any better than that anyway)
18:41:59 and bringing them back up
18:42:02 #action nirik to generate list of production-needed items before cloud becomes production
18:42:33 skvidal: sounds good to me. We have a good jenkins playbook now, right? so those should just be really easily buildable again.
18:42:35 nirik: wanna start a wiki page or something?
18:42:42 nirik: we have a good one for the slaves
18:42:56 nirik: we do not have one for the master yet - pingou and I are going to work on that today
18:42:57 I'll try to figure out the master next week or so
18:43:04 pingou: oh ok
18:43:05 or tonight :)
18:43:07 I also had a thought
18:43:10 about that
18:43:14 but we'll discuss it oob
18:43:17 sure
18:43:17 ok.
18:43:33 nirik: if a wiki - I can add some things we need for production-ing the cloud
18:43:36 #action nirik will try and test openstack with vlans and/or no gluster soon
18:43:55 yep. Can do. Or just add it to the private cloud page? I guess it's getting kinda long.
18:44:25 I'll find a place.
18:44:30 Anything else for clouds?
18:44:38 nope
18:44:50 #topic Security FAD update
18:45:16 so, the fad is on schedule... we submitted our tentative budget for it... but the budget people are all traveling, so haven't heard back yet.
18:45:33 we might want to try and do some prelim planning or the like before then...
18:45:38 nirik: what's the wiki page again?
18:45:54 * nirik thinks we can get the cgi up and sudo working without much hassle, but is more worried about fas changes.
18:46:01 https://fedoraproject.org/wiki/FAD_Infrastructure_Security_2012
18:46:51 so, do look and see if there are things we can hash out beforehand on the list or the like.
18:47:13 #topic Upcoming Tasks/Items
18:47:22 #info 2012-10-16 to 2012-10-30 F18 Beta Freeze
18:47:22 #info 2012-10-30 F18 Beta release
18:47:22 #info 2012-11-01 nag fi-apprentices
18:47:22 #info 2012-11-07 - switch smolt server to placeholder code.
18:47:22 #info 2012-11-20 to 2012-12-04 F18 Final Freeze
18:47:22 #info 2012-11-20 FY2014 budget due
18:47:25 #info 2012-11-22 to 2012-11-23 Thanksgiving holiday
18:47:26 #info 2012-11-26 to 2012-11-29 Security FAD
18:47:28 #info 2012-11-30 end of 3rd quarter
18:47:30 #info 2012-12-04 F18 release.
18:47:33 #info 2012-12-24 to 2013-01-01 Red Hat Shutdown for holidays.
18:47:34 #info 2013-01-18 to 2013-01-20 FUDCon Lawrence
18:47:36 freeze next week... unless we slip again. ;)
18:47:40 anything folks want to note or schedule?
18:48:02 #info 2012-10-13 to 2012-10-15 FUDCon Paris
18:48:11 Oh, I might be making a trip on the 25th/26th... will let folks know if that happens. ;)
18:48:24 pingou: :)
18:48:54 #topic Open Floor
18:48:59 any items for open floor?
18:49:51 * nirik listens to the crickets chirp.
18:49:55 not from me
18:50:00 ok, will close out in a minute if nothing more...
18:50:02 nor me.
18:50:11 * smooge smells his lunch and goes... please please close it
18:50:25 * relrod_webchat heads home from school and +1's smooge :)
18:50:44 ha. Thanks for coming everyone!
18:50:46 #endmeeting