19:00:01 #startmeeting Infrastructure (2013-06-20)
19:00:01 Meeting started Thu Jun 20 19:00:01 2013 UTC. The chair is nirik. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:01 Useful Commands: #action #agreed #halp #info #idea #link #topic.
19:00:01 #meetingname infrastructure
19:00:01 The meeting name has been set to 'infrastructure'
19:00:02 #topic welcome y'all
19:00:02 #chair smooge skvidal CodeBlock ricky nirik abadger1999 lmacken dgilmore mdomsch threebean
19:00:02 Current chairs: CodeBlock abadger1999 dgilmore lmacken mdomsch nirik ricky skvidal smooge threebean
19:00:24 who all is around for a nice meeting?
19:00:25 * relrod here
19:00:26 * threebean is here
19:00:31 * LoKoMurdoK here
19:00:35 Holla!
19:00:50 * skvidal is here
19:01:07 * lmacken
19:01:31 ere
19:02:12 ok, let's go ahead and dive in...
19:02:15 #topic New folks introductions and Apprentice tasks
19:02:25 * mdomsch
19:02:25 any new folks want to say hi, or apprentices with questions or comments?
19:02:36 * abadger1999 here
19:03:24 * nirik waits a minute.
19:03:43 #topic Applications status / discussion
19:03:49 ok, any applications news?
19:04:00 MM upgraded!
19:04:04 #info mirrormanager 1.4 is humming along in production now. :)
19:04:10 thanks for working so much on that mdomsch
19:04:17 Tahrir, badges, etc all humming along. Threebean and I are slaying some code
19:04:20 mdomsch: nice blog post on it too :)
19:04:43 yeah, and the badges repo can be found here if anyone wants to poke through it http://infrastructure.fedoraproject.org/infra/badges/rules/
19:04:44 glad to have finally had the time. My apologies for the crappy overnight pages and debugging sessions all weekend long!
19:04:44 oddshocks: excellent. :)
19:04:58 #info badges work is moving rapidly ahead
19:05:08 * pingou late
19:05:18 oddshocks / threebean: any milestones coming up or things others could help with?
19:05:26 oh I got something app related (after badges)
19:05:27 mdomsch: no worries.
It was worth it to get it all sorted out. :)
19:05:36 oh, a question on badge-land. in the next few weeks, can we set up the production badge machines even while we're still in freeze?
19:06:04 mdomsch, we need to do that every now and then to remind us why we don't like to have them
19:06:14 we could, just would need a freeze break explaining exactly what is being done.
19:06:30 what does the badges production setup look like?
19:06:35 cool. we're not there yet, but when we do get there I'll write it up
19:06:44 are we doing a separate db server for it or relying on db01,etc?
19:07:05 skvidal: two "webapp" nodes and one "backend" node. db usage should be pretty light.. I was anticipating adding it to db01.
19:07:23 * nirik really hopes to work on our db 'story' at flock some
19:07:38 nirik: +1
19:07:48 yeah, that story right now is "how many eggs do you think we can fit in this basket?"
19:07:54 yeah. :(
19:08:14 * relrod has an idea for a web app that I ran by threebean and puiterwijk a while back, but haven't asked anyone else yet because I haven't had time to work on it - but could maybe get some thoughts:
19:08:16 we have some mitigation of a db server dying, but it could be/should be much better.
19:08:55 * nirik waits for relrod.
19:09:01 As far as badges milestones, we're going to be seeing major front-end changes now that we've got the backend all groovy, and there are plenty of issues open peeps can help with on the github repo, just ping me or threebean about em if ya want :)
19:09:26 Basically it would be neat/handy for some of our apps (e.g. Fedora Packages and Fedora Mobile) if there was a way for package maintainers to upload screenshots of their packages, and we could show them on package info pages. This could also play into if we ever move to an app-store style system in Fedora, it'd be nice if you could see screenshots before you install an app.
19:09:51 relrod: you should talk with hugshie about that
19:10:10 #info badges has a number of open issues, see oddshocks or threebean if you want to help out implementing.
19:10:29 yeah, we probably need to discuss app store stuff at flock?
19:10:37 http://ambre.pingoured.fr/thisweekinfedora/
19:10:46 I could see screenshots and/or screencasts being nice to have for that
19:10:47 pingou: will do. It's pretty low priority on my list, but just a thought I had. I don't have time to do much with it yet, but if it's something that not everybody hates then I'll make a repo and at least hack on it once in a while ;)
19:11:38 relrod: same story with me and fedmsg-notifications :)
19:11:54 The app store story is still unclear to me. I know hugshie was working on it again, but we need to determine what of that falls on us to implement and if it makes sense to do things that way.
19:12:04 * threebean nods
19:12:24 nirik: well afaiu fedora-tagger already lifts a good part of the burden
19:12:27 * skvidal recuses himself from this discussion
19:12:32 only icons and screenshots are missing
19:12:43 pingou: ok.
19:12:49 he unfortunately never responded to my pings after we launched the rewrite of fedora-tagger. that was supposed to enable app store stuff integration but it's not being taken advantage of as far as I can tell.
19:12:50 any other application type news?
19:13:12 http://ambre.pingoured.fr/thisweekinfedora/ <- results of tuesday work
19:13:43 pingou: very nice!
19:13:48 tatica said she was interested in theming it and I hope to plug it onto the planet one of these days
19:14:20 sounds great. ;)
19:14:28 that's all fedmsg based right?
19:14:43 datagrepper actually
19:15:30 yeah, ok.
19:15:45 #info http://ambre.pingoured.fr/thisweekinfedora/ hopefully themed and added to planet soon
19:16:09 any other app news? or shall we move on?
19:16:53 I have to put some pressure on abadger1999 to release python-fedora with the flask openid plugin by the end of freeze :)
19:17:02
19:17:05 It will be done :)
19:17:17 and hopefully not buggy as hell ;-)
19:17:18 is that needed for fedocal openid?
19:17:19 or ?
19:17:35 yeah -- fedocal openid should work with the beta package + hotfix in infra
19:17:38 nirik: yes, that's the last change I have on the todo before releasing 0.2.0
19:17:51 but it won't work with just the packages in the fedora/epel repos
19:18:12 ok.
19:18:16 question on flask and openid in general
19:18:32 is there a status on the groups/teams work
19:18:46 the group extension?
19:18:52 so we can do group-limited openid logins to websites via flask or mod_auth_openid
19:18:53 pingou: yes
19:19:10 yeah, I know puiterwijk was waiting for some package for that, but I don't know current status
19:19:11 so I can say 'allow openid from fedora who are in this group'
19:19:21 we also want it for the trac openid on hosted.
19:19:22 thx - in the long term I think it will really help us
19:19:27 nirik: yes we do
19:19:40 skvidal: afaik for the moment the app is responsible for that
19:19:42 I'd love to kick mod_auth_pgsql to the curb. ;)
19:19:54 pingou: right - but I need some way of requesting which groups a user is in
19:20:00 nirik: it's just nagios, right?
19:20:00 python-openid-teams I think is the package?
19:20:16 nagios and logs I think? or were we going to leave logs alone?
19:20:17 nirik: I think so yes
19:20:22 nirik: I'd like to move logs
19:20:28 but logs is not mod_auth_pgsql
19:20:30 afaik
19:20:33 it's that local htpasswd thing
19:20:37 it would mean that if openid is down we couldn't get to logs, but meh
19:20:41 yeah
19:20:43 nirik: yah we could
19:20:51 nirik: we could either do a fallback or just ssh into the frelling box
19:21:00 well, we couldn't get to html versions of the epylogs via a remote browser. ;)
19:21:07 so, no big deal.
19:21:14 since we would be likely fixing openid anyhow then. ;)
19:21:17 nirik: scp log02:/path/to/file.html . ; firefox file.html :)
19:21:25 yep. ;)
19:21:27 anyhow...
19:21:28 I'm okay with that
19:21:31 #topic Sysadmin status / discussion
19:21:38 let's see... sysadmin items...
19:21:43 skvidal: that's in the extension.
19:21:47 abadger1999: excellent
19:21:51 I went to ansiblefest last week
19:21:54 we have a number of things that are going to be pending after freeze.
19:22:03 and mainly heard about how other people are doing things in their infrastructures
19:22:07 skvidal: you should do a trip report thingie/blog post/etc. ;)
19:22:10 ah
19:22:16 * skvidal goes back to lurking
19:22:17 :)
19:22:42 so it sounded like we aren't doing anything _too_ crazy from others, right?
19:22:51 nirik: not too bad
19:23:03 I will note that a lot of folks are using it to merge between sets of tools
19:23:13 lots of 'use ansible to orchestrate other systems'
19:23:21 'use ansible to manage merging multiple inventory sources'
19:23:33 'use ansible to make our devs shut up and break their own systems'
19:23:42 and then by far
19:23:47 some really great new modules
19:23:50 that folks have written
19:23:58 mysql_replication was by far the most exciting imo
19:24:09 they're able to flap their master back and forth
19:24:14 orchestrated in a playbook
19:24:21 very nice.
19:24:24 and it is kinda seriously neat
19:24:29 * nirik wishes for a similar postgresql one. ;)
19:24:29 yah - I'd like to be there
19:24:34 on both of our db servers
19:24:48 okay I'll be quiet now :)
19:25:06 and start writing the email ;)
19:25:21 #info ansible migration continues: new machines are in ansible and slowly we are porting things over to it.
19:25:27 one thing I'll bring up
19:25:34 skvidal: so what are our next steps on ansible migration?
19:25:47 so
19:25:54 in order to allocate access to folks
19:26:02 2 or 3 folks at the conference who spoke are using jenkins
19:26:10 to allocate access to non-root users to run ansible
19:26:19 so they setup jenkins to access the right keys
19:26:28 and then give the users in jenkins access to run certain scripts
19:26:41 then the user sees the whole output and jenkins logs it, etc
19:26:48 so if they break it - they can fix it, too :)
19:27:03 neat
19:27:09 I have to admit I like that - but running jenkins as part of the admin process feels...... weird to me
19:27:15 it's a nice idea (although I think jenkins is not a solution for us, but perhaps one of the other things like it would be)
19:27:26 right
19:27:31 so my wonder is this
19:27:48 if we did RBAC via a script + your user/groups + sudo on lockbox
19:27:56 that let you see the output AND it captured the output
19:28:00 1. is that ridiculous?
19:28:13 2. is there something stupid about this I'm not thinking about?
19:28:35 http://paste.fedoraproject.org/19948/71747790/ <-- that's the basics
19:29:03 I think that would work... but in addition I think we still would need a trigger based thing... so even if you couldn't run the playbook then, you could commit and let it trigger off.
19:29:10 nirik: triggers STILL have to happen
19:29:11 yes
19:29:14 nirik: no dispute
19:29:23 yep. In agreement here then. :)
19:29:35 about triggers then
19:29:39 I was looking at our commits
19:30:05 and so many of them would either actually be 'apply-global' or they are the kind that make it almost impossible to know what systems would definitely be impacted - which means 'apply-global'
19:30:09 so in the apply-global case
19:30:27 how often do we check for that
19:30:40 since clearly it cannot be 'apply-global immediately following any check in that could impact it'
19:30:48 that would just be a recipe for people making bad commit decisions
19:30:55 so, that would run all playbooks on all affected hosts in those playbooks?
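
[Editor's sketch, not part of the meeting log] The "script + your user/groups + sudo" idea raised at 19:27:48 could look roughly like the following. This is a minimal sketch under stated assumptions, not the script from the paste link above: the `ALLOWED` table, the group names, and the playbook path are all hypothetical, and the sudo rule that would let users invoke such a wrapper on lockbox is not shown.

```python
import subprocess
import sys

# Hypothetical mapping of playbook -> groups allowed to run it.
# Neither the path nor the group names come from the meeting.
ALLOWED = {
    "playbooks/groups/badges.yml": {"sysadmin-badges", "sysadmin-main"},
}


def may_run(playbook, user_groups, allowed=ALLOWED):
    """Return True if any of the user's groups is allowed to run the playbook."""
    return bool(allowed.get(playbook, set()) & set(user_groups))


def run_and_capture(cmd, log_path, out=sys.stdout):
    """Run cmd, echo every output line to `out` and append it to log_path.

    The caller sees the whole output live while the same text is kept
    on disk. Returns the exit code of the command.
    """
    with open(log_path, "a") as log:
        proc = subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        )
        for line in proc.stdout:
            out.write(line)
            log.write(line)
        return proc.wait()
```

A real wrapper would call `may_run()` with the invoking user's groups before handing an `ansible-playbook` command line to `run_and_capture()`.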
19:31:08 yeah
19:31:24 apply-global means 'we cannot tell which hosts are affected so we have to assume all'
19:31:54 yeah.
19:32:03 I guess I was thinking
19:32:14 * nirik needs to ponder on this a bit more I think... I think we can come up with something...
19:32:16 in the absence of any commits we should not run
19:32:29 that seems fairly obvious, right?
19:32:33 yeah, that sounds reasonable.
19:32:42 or if we did, it would be a nightly job or something.
19:32:56 and it should SCREAM that something changed.
19:33:18 nirik: sfromm's ansible-report can produce email reports - which are kinda nice
19:33:46 I talked to him a bit last week and earlier this week - there's some.... tricks with it - but I think they can be worked around
19:33:52 okay so
19:33:57 let's say that a commit comes in
19:34:00 we could put some of the burden on the committer... to indicate what host(s) or groups to run on... but that's not ideal I guess.
19:34:09 we look up which hosts it applies to
19:34:15 we write that out to a file
19:34:22 which a cron job checks for.... how often?
19:34:33 30min?
19:34:33 every hour? every half-hour? every 15 minutes?
19:34:54 so every 30 m the job will open up the file, look for which playbooks/hosts to run against and run those
19:34:58 if it exists
19:35:27 * nirik notes we can adjust down the road if it seems like the time isn't right...
19:35:40 IMHO if it's urgent you want to run the playbook yourself or find someone to.
19:36:06 if it's just a cleanup/whatever you want it to run and do those things before you forget about it, but with enough time to realize if you made a mistake/typo (or for other people to)
19:36:23 okay
19:36:52 so even more than before - this means if your playbook is NOT idempotent - you need to make it be so
19:36:53 :)
19:37:05 yes, I think we should strive to make them all so.
19:37:34 that would help for a nightly run too... then if something changed, we know someone futzed with things.
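
[Editor's sketch, not part of the meeting log] The trigger flow described between 19:33:57 and 19:34:58 (a commit hook writes the affected playbooks/hosts to a file, and a cron job picks the file up every 30 minutes if it exists) might be sketched as below. The JSON file format and the function names are assumptions made for illustration; the meeting did not settle on a format. `ansible-playbook --limit` is a real option, everything else here is hypothetical.

```python
import json
import os


def queue_run(queue_path, playbook, hosts):
    """Commit-hook side: record that `playbook` should run against `hosts`.

    Repeated commits merge into the same pending entry, so ten commits
    within one window still produce a single run.
    """
    pending = {}
    if os.path.exists(queue_path):
        with open(queue_path) as f:
            pending = json.load(f)
    merged = set(pending.get(playbook, [])) | set(hosts)
    pending[playbook] = sorted(merged)
    with open(queue_path, "w") as f:
        json.dump(pending, f)


def drain_queue(queue_path):
    """Cron side: return the pending work and remove the file.

    Returns an empty dict when the file is absent, i.e. no commits
    means nothing runs.
    """
    if not os.path.exists(queue_path):
        return {}
    with open(queue_path) as f:
        pending = json.load(f)
    os.remove(queue_path)
    return pending


def build_commands(pending):
    """Turn the pending map into ansible-playbook command lines."""
    return [
        ["ansible-playbook", playbook, "--limit", ",".join(hosts)]
        for playbook, hosts in sorted(pending.items())
    ]
```

This sketch does no file locking, so a commit landing while the cron job is draining the file could be lost; a real version would need to guard against that. It also relies on the playbooks being idempotent, as noted at 19:36:52.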
19:38:16 okay
19:38:28 #info working out how to let folks run specific playbook runs themselves.
19:38:39 so one more thing on the triggered runs
19:38:46 #info working out how triggering playbook runs should work.
19:39:00 here's what i've been basing from
19:39:05 for determining which hosts
19:39:18 if the playbook is in playbooks/[hosts,groups]
19:39:21 then it is obvious and simple
19:39:30 yeah, --list-hosts will show it.
19:39:41 (for that easy case with a playbook)
19:39:48 if it is a task - then look up which playbooks that task shows up in and run those playbooks
19:39:59 if the file is in $files then apply-global
19:40:15 I can probably make the files thing a BIT more specific - but it's kind of a crapshoot :)
19:40:26 if the modification is in groupvars/hostvars - obvious rules apply
19:40:45 roles would be like tasks? (ie, look up which it's included in)?
19:40:51 yah - that's the idea
19:41:25 so it's any mod/add/rm
19:41:33 does that make no sense to anyone?
19:41:54 makes sense to me. I think there may be corner cases, but we can deal with them.
19:42:06 the paths i've not looked at are the plugins and library, handlers, etc
19:42:29 we change those so rarely, just apply-global as a first cut
19:42:33 IMHO
19:42:34 and finally people making 'inventory' changes makes things.. tricky
19:42:56 since it is impossible to know if the inventory change is substantive :)
19:43:00 yeah.
19:43:10 okay that's all I have
19:43:12 I am working a couple of tickets. Will be installing another PPC once it is racked and stacked and I get a kernel fix so I can run KVM sessions again
19:43:21 and cases like if you remove a playbook, do nothing, etc.
19:43:21 if anyone is interested in any of this stuff - please let me know
19:43:39 #info ansible help welcome. See skvidal. :)
19:43:48 smooge: cool.
19:44:01 KVM on PPC is not possible yet
19:44:01 nirik: removing a playbook - yes - would be doing nothing - but....
that's a little different ;)
19:44:06 so it will be LPAR
19:44:12 smooge: :( bummer.
19:44:29 so other pending sysadmin items:
19:44:42 #info new bladecenter is all arrived, trying to get network to it.
19:45:00 #info will be adding mem and replacing 2 bvirthosts sometime this quarter
19:45:16 #info phx2 on-site visit time probably in mid/late july.
19:46:04 #info working on new rdiff-backup setup for backups, probably after freeze
19:46:25 Anything else on the sysadmin side of things?
19:46:46 * skvidal has nothing else
19:46:50 #topic Private Cloud status update / discussion
19:46:57 not much cloud news recently.
19:47:05 well - sorta
19:47:10 openstack renamed more random crap
19:47:12 I have some f19-tc5 images loaded in, need to update them to tc6. ;)
19:47:16 yeah, boo.
19:47:30 nirik: hopefully we can get 03 moved to an external port
19:47:35 nirik: and we can do multihost grizzly setup
19:47:49 #info moving networking on a spare node so we can do a multihost grizzly setup
19:48:03 #info f19 test compose images available for testing/tweaking.
19:48:21 #topic Upcoming Tasks/Items
19:48:21 https://apps.fedoraproject.org/calendar/list/infrastructure/
19:48:33 any upcoming tasks/items folks would like to schedule or note?
19:49:29 * nirik is crossing fingers for an on time release, but you know how it is... ;)
19:49:34 #topic Open Floor
19:49:52 anyone have items for open floor?
19:49:57 questions? comments? suggestions?
19:50:59 nirik: I have access issues to work after meeting
19:51:00 where is everyone today?
19:51:00 * nirik will close out the meeting in a minute if not
19:51:17 ok
19:51:22 kc4zvw_: ok. Can try and assist over in #fedora-admin
19:51:32 skvidal: not sure.
19:52:42 Thanks for coming everyone!
19:52:44 #endmeeting
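
[Editor's sketch, not part of the meeting log] The path rules discussed under the sysadmin topic (19:39:00 to 19:43:21) for deciding what a commit should trigger can be written down as a small classifier. This is only a sketch of those rules as stated in the meeting. The directory names (`tasks/`, `roles/`, `group_vars/`, `host_vars/`) assume a conventional Ansible repository layout rather than the actual Fedora one, `includers` is a hypothetical precomputed map from a task or role file to the playbooks that include it, and treating inventory changes as apply-global is an assumption, since the meeting only called them "tricky".

```python
def classify_change(path, removed=False, includers=None):
    """Map one changed repository path to (kind, targets).

    kind is one of:
      "playbooks" - run the listed playbooks
      "group"     - run against the named inventory group
      "host"      - run against the named host
      "global"    - cannot tell what is affected, assume everything
      "none"      - nothing to run
    """
    includers = includers or {}
    parts = path.split("/")
    top = parts[0]

    # Playbooks under playbooks/hosts or playbooks/groups: the easy case.
    if top == "playbooks" and len(parts) > 2 and parts[1] in ("hosts", "groups"):
        if removed:
            return ("none", [])  # a removed playbook triggers nothing
        return ("playbooks", [path])

    # Tasks and roles: run every playbook that includes them.
    if top in ("tasks", "roles"):
        return ("playbooks", sorted(includers.get(path, [])))

    # Variable files: the group or host is named by the path itself.
    if top == "group_vars" and len(parts) > 1:
        return ("group", [parts[1]])
    if top == "host_vars" and len(parts) > 1:
        return ("host", [parts[1]])

    # files, plugins, library, handlers, inventory, anything unknown.
    return ("global", [])
```

For the easy playbook case, the concrete host list would come from `ansible-playbook --list-hosts`, as mentioned at 19:39:30.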