19:00:00 <nirik> #startmeeting Infrastructure (2011-07-28)
19:00:00 <zodbot> Meeting started Thu Jul 28 19:00:00 2011 UTC. The chair is nirik. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:00 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
19:00:01 <nirik> #meetingname infrastructure
19:00:01 <zodbot> The meeting name has been set to 'infrastructure'
19:00:01 <nirik> #topic Robot Roll Call
19:00:01 <nirik> #chair smooge skvidal codeblock ricky nirik abadger1999
19:00:01 <zodbot> Current chairs: abadger1999 codeblock nirik ricky skvidal smooge
19:00:07 * skvidal is here
19:00:20 * abadger1999 here
19:00:29 <CodeBlock> hi there
19:00:51 <nirik> hello all.
19:01:14 * nirik waits another minute or so for more folks to wander in.
19:01:21 * herlo is here
19:01:24 <herlo> mostly
19:02:00 <skvidal> herlo: how goes your avian project?
19:02:09 <herlo> skvidal: it goes
19:02:25 <herlo> making decent progress, not as fast as I'd like, but good
19:02:31 <nirik> ok, I guess let's go ahead and get started...
19:02:37 <skvidal> waterfowl are gross, gross creatures. I wouldn't expect dealing with them would be a quick thing
19:02:42 <smooge> I am ready
19:02:44 <nirik> #topic New folks introductions and apprentice tasks/feedback
19:02:47 <herlo> skvidal: haha, yeah
19:02:47 <skvidal> herlo: I wish you luck
19:02:50 <herlo> thx
19:03:10 <nirik> any apprentice folks want to ask questions or note tickets? any new folks want to introduce themselves?
19:04:08 <nirik> I've seen a tapering off of apprentice activity of late... possibly due to schools restarting and people getting busy? or just general summer malaise?
19:04:23 <skvidal> both
19:04:24 <skvidal> I suspect
19:04:29 <skvidal> people chilling out for their summer
19:04:35 <skvidal> and prepping for school
19:04:38 <skvidal> I suspect in oct or so
19:04:49 <skvidal> when people are riding back comfortably in the reins of school
19:04:53 <skvidal> then we'll see them come back
19:05:03 <nirik> #info If you are an apprentice and want to get (re)involved, look at easyfix tickets and/or chime in on other topics in channel to get something to work on. ;)
19:05:09 <nirik> yeah, could well be.
19:05:14 <skvidal> bonus points if you get the literary reference there
19:05:44 * nirik doesn't off hand. ;(
19:06:23 <herlo> dead poet's society? and I didn't google it, especially since I'm probably wrong :)
19:06:54 <nirik> carpe diem!
19:06:58 <herlo> :P
19:07:02 <skvidal> herlo: ray bradbury - something wicked this way comes
19:07:10 <nirik> anyhow, if nothing else on apprentice tasks, moving on...
19:07:14 <herlo> not much of a literary buff
19:07:18 <nirik> #topic Moving SOP docs from wiki to git
19:07:25 <skvidal> w00t
19:07:35 <smooge> csi?
19:07:36 <skvidal> nirik: so - we just need a decision - put them with the csi docs or put them in their own repo
19:07:42 <nirik> so, there are some advantages and disadvantages here.
19:07:48 <StylusEater> skvidal: put them on github? :-)
19:07:56 <nirik> but I think the advantages outweigh the disadvantages.
19:07:57 <skvidal> StylusEater: <stab>
19:08:06 <skvidal> what are the disads?
19:08:08 * nirik looks at the csi docs repo.
19:08:18 <skvidal> nirik: is csi docs repo even a repo?
19:08:20 <skvidal> or is it just a dir?
19:08:24 <smooge> it is a repo
19:08:25 <nirik> less ability for $otherpeople to correct things/contribute.
19:08:34 <skvidal> smooge: where is it housed?
19:08:57 <smooge> Most of the stuff under it is git clone git://git.fedorahosted.org/csi.git
19:09:04 <skvidal> okay
19:09:04 <nirik> it's on hosted.
19:09:07 <skvidal> that I didn't understand
19:09:18 <skvidal> I'm not in favor of our sop docs being on hosted
19:09:29 <nirik> yeah, that drops one of the advantages.
19:09:31 <skvidal> right
19:09:41 <smooge> well it could be moved.
19:09:42 <skvidal> and I'd rather have to protect a single basket in the event of disaster
19:09:44 <skvidal> not multiple ones
19:09:45 <abadger1999> yeah
19:10:02 <nirik> it's also using publican I think...
19:10:20 <abadger1999> harder to point $otherpeople at how to do something.
19:10:32 <smooge> Correct. The CSI documents are supposed to be the policies and the SOPs are supposed to be the how to complete the policies
19:11:38 <skvidal> right
19:11:38 <skvidal> so
19:11:42 <nirik> so, if we had a repo on infrastructure, could we have it allow the same groups as the wiki does to edit? cla_done+1 or whatever.
19:11:42 <smooge> The main thing I wanted was to make sure that we keep both in sync
19:11:43 <abadger1999> think we're agreed.... git repo on lockbox separate from the other git repos males more sense.
19:11:51 <smooge> yes I agree on that
19:12:07 <abadger1999> *makes sheesh, the typos today.
19:12:10 <skvidal> in the event of a disaster
19:12:15 <skvidal> I don't need to see our policies
19:12:18 <nirik> we probably need to give the CSI docs a good lookover/edit/cleanup someday.
19:12:20 <skvidal> I will need to see our SOPs
19:12:24 <skvidal> cleaning up CSI makes sense
19:12:35 <skvidal> but I wouldn't want to clean it up to the detriment of SOPs being current
19:12:44 <herlo> yeah, that was my question nirik, what is the format going to be, publican? or something else?
19:12:49 <skvidal> herlo: txt
19:12:50 <skvidal> txt file
19:13:02 <nirik> yeah, text would be fine with me.
19:13:06 <skvidal> any system which involves effort to maintain == doom
19:13:09 <skvidal> b/c people will avoid it
19:13:15 <herlo> something like rst would be nice
19:13:17 <skvidal> and the SOPs don't have anything that is not text-able
19:13:23 <herlo> and it still is basically text
19:13:30 <nirik> it doesn't need to be pretty, just contain the data. Hopefully so you can cut and paste things.
19:13:35 <herlo> but I suppose we could make that later...
19:13:40 <herlo> yeah
19:13:53 <smooge> I would prefer the SOPs to be in text
19:14:03 <skvidal> nirik: paste++++
19:14:32 <nirik> so, any objections to that plan? If not, we need to plan the move, convert docs and redirect as we go... I could write up a plan for it.
19:15:03 <skvidal> nirik: I can take the action item to make the repo and get the pushing happening to a path in infra/web
19:15:10 <skvidal> err /srv/web/infra/
19:15:32 <nirik> skvidal: ok. Thanks. Perhaps we need a 'new infra git repo with hooks' SOP. ;)
19:15:54 <skvidal> nirik: I genericized the hook code
19:16:00 <skvidal> so I don't have to do it a billion times
19:16:04 <nirik> #action skvidal to make new repo
19:16:04 <skvidal> when I setup infra-hosts
19:16:11 <nirik> #action nirik to write up migration plan
19:16:23 <abadger1999> I'm still lamenting the lack of TOC and hyperlinks but.... getting it off the wiki is pretty important.
19:16:47 <skvidal> abadger1999: suggestions welcome - but not publican
19:16:52 <nirik> we could make an index.html? ;)
19:16:52 <skvidal> abadger1999: that's A LOT of infrastructure for links
19:16:52 <abadger1999> *shudders*
19:17:36 <nirik> #action nirik will investigate updating CSI docs, or see what we can do to update them.
19:17:41 <abadger1999> I wouldn't wish publican on this in a million years.
19:18:06 * CodeBlock is back, sorry, had to talk to my boss about a $dayjob project.
19:18:09 <smooge> having done this in the long ago past..
you use a minimal markup at the top to say things like: Title:, Reason:, Keywords:, and then use a quick txt2html wrapper which makes html files with just <pre></pre> and an index.html to link them
19:18:30 * nirik dealt with docbook when he did his howto, would really prefer to avoid that complexity (and publican is another layer on top of that)
19:18:59 <abadger1999> smooge: Yeah, something more like that is what I was envisioning.
19:19:05 <skvidal> nirik: docbook----
19:19:15 <skvidal> nirik: I've done it there, too - with the nfs-howto - it made me cranky
19:19:17 <abadger1999> smooge: Maybe parse some sort of internal headers as well.
19:19:20 <smooge> basically a checkin rebuilds the stuff and we go
19:19:25 <skvidal> abadger1999: I suspect someone has this
19:19:29 <skvidal> abadger1999: and it is lightweight
19:19:30 <skvidal> and simple
19:19:39 <skvidal> abadger1999: CodeBlock mentioned something
19:19:41 <skvidal> markdown?
19:19:46 <nirik> we can look at implementation details out of band?
19:19:57 <ianweller> markdown is an *okay* language, it's not great
19:20:03 <skvidal> nirik: nod
19:20:04 <smooge> skvidal, only in their 0.1 versions.. then people start asking for features and it becomes an XML/SGML/publican thing in the 0.2 version
19:20:20 <abadger1999> +1
19:20:25 <smooge> yeah offline after meeting
19:20:35 <nirik> I think possibly just adding a link/description to an index.html when you add a SOP could be enough...
19:20:50 <ianweller> nirik: directory listing?
19:21:10 <skvidal> post-meeting
19:21:12 <nirik> ie, "GomGabbar SOP - use this when you want to install a new Gom Gabbar device"
19:21:14 <skvidal> in fedora-admin
19:21:15 <nirik> anyhow, yeah.
19:21:29 * nirik cares not what colour the bike shed is.
19:21:45 <nirik> anything else on this? or shall we move on?
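[editor's note] The minimal-markup approach smooge describes could look roughly like the Python sketch below. This is hypothetical: the Title:/Reason:/Keywords: header names come from his description, but the function names and HTML layout are assumed, not the actual wrapper.

```python
import html

HEADER_KEYS = ("Title", "Reason", "Keywords")

def sop_to_html(text):
    """Split leading 'Key: value' headers from an SOP text file and wrap the
    rest of the body in <pre>, as smooge describes. Returns (title, html_page)."""
    headers, lines, body_start = {}, text.splitlines(), 0
    for i, line in enumerate(lines):
        if ":" in line and line.split(":", 1)[0] in HEADER_KEYS:
            key, value = line.split(":", 1)
            headers[key] = value.strip()
            body_start = i + 1
        else:
            break  # first non-header line starts the body
    body = "\n".join(lines[body_start:])
    title = headers.get("Title", "untitled SOP")
    return title, "<html><head><title>%s</title></head><body><pre>%s</pre></body></html>" % (
        html.escape(title), html.escape(body))

def build_index(entries):
    """entries: list of (title, filename). Returns a bare-bones index.html
    linking each SOP, per nirik's link/description idea."""
    links = "\n".join('<li><a href="%s">%s</a></li>' % (fname, html.escape(title))
                      for title, fname in entries)
    return "<html><body><ul>\n%s\n</ul></body></html>" % links
```

A post-commit hook could run this over every text file and regenerate index.html, matching smooge's "a checkin rebuilds the stuff and we go".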
19:21:45 * skvidal is partial to blue-gray
19:21:47 <CodeBlock> ianweller: I like Markdown because when it's not parsed, stuff still stands out/looks good/is distinguishable in plaintext. As in headers are underlined (with - or =) etc.
19:21:58 * CodeBlock shuts up so we can move on :P
19:22:19 <nirik> #topic QA network setup
19:22:34 <nirik> ok, so we talked about this some last week... here's the conclusion I came to:
19:22:56 <nirik> monitoring - use our nagios for monitoring, and have alerts go to sysadmin-qa folks.
19:23:30 <nirik> config management - try out bcfg2 there. This means removing virthost-comm01 and bastion-comm01 from our puppet and adding them into bcfg2 there.
19:24:06 <nirik> I'm undecided if they should have a separate lockbox-comm01 for bcfg2 or not... I guess so, to be careful.
19:24:17 <nirik> so, if anyone wants to help with that setup, please do. ;)
19:24:44 <skvidal> ok
19:24:55 <smooge> ok will do so
19:25:02 <nirik> once we get it set up, we can re-eval down the road... if bcfg2 isn't working we can switch them out.
19:25:16 <smooge> cfengine
19:25:16 <nirik> they will still need to use our setup for a few things... like repos I suspect.
19:25:48 <nirik> also, in the qa space a few more things down the road:
19:26:25 <nirik> it would be good if we could add some fedora ks files if we want them to use our kickstarts (which I think is probably easiest, since they have to use our repos anyhow)
19:27:11 <nirik> probably before too long we will be adding some secondary arch signing instances there... which will let us test out the new sigul setup
19:27:30 <skvidal> heh
19:27:32 <nirik> that's all I had off hand on qa network stuffs. Any questions/concerns/ideas?
19:27:36 * skvidal read that as 'sighing instances'
19:27:49 <nirik> yeah, basically. ;)
19:28:39 <nirik> ok, moving on.
19:29:02 <nirik> #topic Upcoming Tasks/Items
19:29:14 <nirik> So, monday morning we have some reboots...
19:29:29 <nirik> tuesday is the start of the f16alpha freeze.
19:30:11 <nirik> we have some new machines hopefully racked or being racked, so they will need installing and adding to monitoring.
19:30:46 <nirik> Any other upcoming items folks would like to note/schedule/ask about/plan for?
19:32:01 <nirik> ok. moving on then.
19:32:09 <nirik> #topic Meeting tagged tickets
19:32:24 <nirik> https://fedorahosted.org/fedora-infrastructure/query?status=new&status=assigned&status=reopened&group=milestone&keywords=~Meeting&order=priority
19:32:30 <nirik> any folks here want to discuss?
19:32:49 * skvidal looks
19:33:06 <smooge> not me
19:33:38 <smooge> 2501 will need us to get hardware up at ibiblio working I believe
19:33:44 <nirik> There's some old stuff there I might close or remove meeting from.
19:33:57 <skvidal> smooge: the hw is there now
19:34:02 <skvidal> when I hear from reuning
19:34:05 <skvidal> I'll go over and hitch it up
19:34:10 <smooge> ok cool
19:34:15 <nirik> well, hosted needs a plan. I might try and tackle that in my copious free time....
19:34:22 <skvidal> and you shall surely hear my cries and curses when the ipv6 stuff doesn't work
19:34:34 <skvidal> nirik: so I had a thought
19:34:35 <skvidal> on hosted
19:34:39 <skvidal> that you may well hate
19:34:43 <skvidal> but I wanted to bring it up
19:34:48 * nirik gets ready on the hate button.
19:35:06 <skvidal> what if we did it piece-meal?
19:35:17 <skvidal> ie: could we set up a new infrastructure that lets us do a project at a time
19:35:36 <smooge> I thought that was what we were going to do
19:35:37 <nirik> yeah, I did think of that too.
19:35:46 <skvidal> ah, hmm
19:35:47 <nirik> I don't hate it, but we still need a plan...
19:35:50 <smooge> put a proxy in front and then move through that
19:35:59 <nirik> ie, how many instances, separated/connected how, etc
19:36:22 <skvidal> nirik: would hosted be an example of a service that is well suited to having 'in the cloud'?
19:36:23 <nirik> and preferably a way to make it spread out more.
19:36:49 <nirik> skvidal: I'm not sure. Possibly... it does get hit pretty hard... so it would use a lot of resources.
19:37:03 <skvidal> nirik: which is sorta the point...
19:37:20 <skvidal> nirik: did we ever get an answer on cloud-money?
19:37:35 <nirik> skvidal: no. sadly.
19:37:45 <skvidal> nirik: okay, so I didn't just miss that meeting
19:37:46 <skvidal> okay
19:37:47 <skvidal> thx
19:38:07 <nirik> but any plan could look at how to split it out, and if we have cloud, we could use cloud for some or part of it if it makes sense.
19:38:48 <skvidal> okay
19:38:53 <skvidal> so really this needs some focus
19:38:56 <nirik> yes.
19:39:03 <skvidal> I suspect that's why it's not gotten very far ;)
19:39:08 <skvidal> or rather :-\
19:39:27 <nirik> I think we all agree it would be good to get it updated/upgraded, but we also want to try and make it less of a SPOF and such at the same time...
19:39:46 <nirik> anyhow, I can try and at least whip up some plan for people to be inspired to counterpropose. ;)
19:39:46 <skvidal> nod
19:40:02 <skvidal> nirik: not that we have the time right now
19:40:05 <skvidal> nor (probably) the money
19:40:18 <skvidal> but hosted migration sure seems like we'd benefit from a FAD
19:40:21 <nirik> it may be that we should be less far reaching... just plan for moving it the way it is now, and do a longer term thing later.
19:40:34 <nirik> yeah, that could be the case...
19:40:35 <skvidal> nirik: which will, likely, be kicked down the road forever :(
19:41:48 <skvidal> okay
19:41:50 <skvidal> what else?
19:41:51 <nirik> ok, moving on.
19:41:55 <nirik> #topic Open Floor
19:42:00 <nirik> anything for open floor?
19:42:27 <herlo> oh, I do
19:42:37 <nirik> herlo: fire away
19:42:39 <herlo> I had a nice little bug for fpaste-server
19:42:48 <herlo> with django-tracking, it needed to be updated
19:42:58 <herlo> and I'm going to be pushing that back some to accommodate
19:43:04 <herlo> but I needed to ask a couple questions regarding it
19:43:18 <nirik> sure, what are the questions?
19:43:19 <herlo> one is, do we plan to host it on its own vm? or will it live with other services?
19:44:03 <nirik> excellent question. :) I think this is exactly the sort of thing we should be figuring out when a new resource is in the 'dev' stage. ;)
19:44:08 <herlo> I've currently got the package deploying an fpaste.conf which I need to alter to better accommodate vhosts
19:44:51 <herlo> I think it could run with other vhosts as I don't think the load is going to be too high at first.
19:45:32 <nirik> so, there's a spectrum here: on one side is totally separate: its own instance with its own db and own webserver. On the other end is in our proxy/app mix: it's hit via proxy and uses varnish/haproxy, runs on the app servers and talks to a single backend db.
19:46:04 <nirik> there's also some middle ground where it could be using proxy/caching, but have its own instance and db
19:46:16 <herlo> which if we did the latter, would need to be similarly set up somewhere along the way, right?
19:46:25 <herlo> would that be in dev or rather in staging?
19:46:50 * herlo thinks most of this convo can go offline, but just wanted to bring up these thoughts
19:47:00 <abadger1999> Are our proxies a limited resource?
19:47:16 <herlo> especially since we're working on the SOPs for dev and staging rollout
19:47:28 <nirik> dev => no proxy or other setup, stg => set up like it would be in prod
19:47:33 <herlo> abadger1999: a good question, wish I knew
19:47:35 <nirik> abadger1999: I don't think so...
19:48:08 <herlo> nirik: k, I'll work with it that way then...
19:48:10 <herlo> thanks
19:48:18 <abadger1999> nirik: I get a good feeling about the middle way, then: treat the app as a separate resource but the infrastructure around it as shared.
19:48:49 <abadger1999> so that we can upgrade the host/db for one app independently of the host for a different app.
19:48:50 <nirik> yeah, adding more to a single db is something I wish to avoid... more eggs in one basket.
19:49:17 <abadger1999> I'm not sure what makes the most sense from a sysadmin/money perspective though.
19:49:46 <nirik> I guess if we had better db replication it might be less annoying.
19:51:12 <nirik> I guess it also depends on load/how popular something becomes.
19:51:44 <nirik> if it's really popular and we need more app resources, we could also move an app from its own instance out to the app* machines to spread that out...
19:51:49 <abadger1999> I'm not sure if separate dbs gain us as much as separate app servers... I think it would distribute load and allow tweaking individual dbs for different types of queries but I'm not sure we have those issues.
19:52:10 <smooge> I would prefer middle road proxies -> app-fpasteXX -> db-fpasteXX
19:52:14 <abadger1999> it's a the db is a SPOF for an app whether it's in a shared db or separate.
19:52:22 <abadger1999> s/it's a//
19:52:35 <smooge> yeah but instead of 20 apps going down for an hour.. we may have 1.
19:52:50 <abadger1999> smooge: All depends....
19:52:57 <smooge> right now when we reboot the db servers, most of everything Fedora is offline
19:53:02 <smooge> and if the db doesn't come back
19:53:30 <abadger1999> why did it go down? postgres update or hardware? Did it take out fas? etc etc.
19:53:40 <herlo> so I basically set up my app as normal and then we add it to the lb structure? That sound about right?
19:54:04 <nirik> herlo: yeah, that work takes place in the stg step... but do be thinking about it now, I think that's good.
19:54:07 <herlo> there isn't a lot to fpaste-server.
19:54:09 <smooge> usually it goes down because we need to reboot or the hardware itself needs to reboot and then it all waits til it comes back
19:54:11 <abadger1999> smooge: <nod>... otoh, if we have five different db servers, doesn't that mean we're five times as likely to have any single piece of hardware go bad?
19:54:39 <smooge> abadger1999, no.. murphy is kind to us there. The hardware will go bad sometime.
19:54:44 <herlo> cool, this sounds like we could do some cool stuff like sticky sessions and add an additional fpaste machine if ever needed.
19:54:45 <smooge> anywhere
19:54:47 * herlo likes
19:54:47 <abadger1999> :-)
19:55:26 <nirik> I guess memory on app servers is a limited resource...
19:55:29 <nirik> and cpu there.
19:55:29 <smooge> abadger1999, you can't go over 100% failure possibility
19:55:35 <abadger1999> hehe :-)
19:56:23 <nirik> if someone would like to add these questions into https://fedoraproject.org/wiki/Request_for_resources_SOP that would be great. ;)
19:56:28 <abadger1999> nirik: Hmm... memory is the limiting factor for dbs in my experience....
19:56:33 <smooge> I mean I figure if we could devote time and effort we could have failover and clustering of some sort in place.. but I think that has been the equivalent of bug#1 since 2005?
19:57:20 * nirik nods.
19:57:29 <nirik> so, anything more? or shall we call it a meeting?
19:57:30 <abadger1999> smooge: Well... I think the reason it's still bug #1 is that no one's laid down what we want to solve and what limitations we're willing to live with.
19:58:19 <abadger1999> ie: We could have warm backups of postgres and master/slave of mysql right now... but if we anticipate switching to the failover machines in case of db outage we're going to have to accept that we might lose data.
19:58:52 <smooge> abadger1999, I agree.
19:58:55 <abadger1999> I think we could accept that... but no one wants to actually commit to it.
19:58:58 * nirik nods again.
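[editor's note] A quick aside on abadger1999's five-db-servers question above: with independent failures the combined risk grows slightly slower than "five times as likely", and, as smooge says, it can never exceed 100%. A throwaway Python check (the 2% per-box rate is an assumed example, not a measured figure):

```python
def p_any_failure(p_single, n):
    """Probability that at least one of n boxes fails in a given period,
    each failing independently with probability p_single."""
    return 1 - (1 - p_single) ** n

# e.g. with an assumed 2% per-box failure rate over a year, five boxes come to
# about 9.6% -- just under five times the single-box risk, and capped below 1.0.
```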
19:59:06 <smooge> I commit to losing data
19:59:22 <smooge> that will make a great Compass goal
19:59:27 <abadger1999> hehe :-)
19:59:43 <skvidal> abadger1999: can we quantify the kind of loss?
20:00:06 <nirik> I had this random thought yesterday... I might start tossing out disaster scenarios to the mailing list. ;) "phx2 is gone. What do we have left? how would we recover?" "serverbeach is down, what do we have, how do we recover?" etc.
20:00:48 <abadger1999> skvidal: I'm thinking in many cases, it would be minimal -- postgres warm backups rsync the transaction logs at a period you define. So you'd lose the data from that period.
20:00:59 <skvidal> then +1
20:01:05 <abadger1999> skvidal: We'd probably set it somewhat low... maybe 5 minutes.
20:02:13 <abadger1999> mysql master-slave can get out of sync. if it does and the master fails in that time, we lose all the data that didn't get synced. If we fix any out-of-sync errors promptly and we don't fail over when out of sync, we'd only lose a few minutes of data there as well.
20:03:22 <abadger1999> nirik: Perhaps we should do that with one of the new apps? Set up a replicated db server for it.
20:03:26 <abadger1999> See how it works.
20:03:37 <nirik> yeah, that's a possibility...
20:03:46 <nirik> one in phx2 and one elsewhere?
20:03:46 <abadger1999> nirik: The one issue I'd see is... db servers take a lot of memory to be high performance.
20:04:12 <abadger1999> And you want the two boxes to have the same specs so that if you have to switch to the other box, it can take the load.
20:04:15 <abadger1999> well...
20:04:27 <nirik> yeah. true
20:04:32 <abadger1999> network *latency* can be an issue with app to db.
20:04:46 <nirik> although sometimes slower performance is still better than down.
20:05:03 <abadger1999> I think the app servers not at phx show that when they try to talk to the db there.
20:05:31 <abadger1999> so it may make sense to have the db servers not at phx if our intent is protecting data in case phx disappears.
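[editor's note] abadger1999's warm-standby scheme (ship the transaction logs on a fixed period, losing at most that window on failover) could be sketched like this. Everything here is hypothetical -- the paths, the standby hostname, and the interval are made up -- and a real postgres deployment would more likely use archive_command to ship each WAL segment as it completes; this just illustrates the loss-window tradeoff being discussed:

```python
import subprocess
import time

ARCHIVE_INTERVAL = 300  # seconds (abadger1999's suggested 5 minutes);
                        # worst-case data loss on failover == one interval
WAL_DIR = "/var/lib/pgsql/data/pg_xlog/"           # hypothetical source path
STANDBY_DEST = "db-standby.example.com:/srv/wal/"  # hypothetical standby target

def rsync_command(src, dest):
    """Build the rsync invocation that ships WAL segments to the warm standby."""
    return ["rsync", "-az", src, dest]

def ship_wal_forever():
    """Copy the transaction logs to the standby every ARCHIVE_INTERVAL seconds."""
    while True:
        subprocess.call(rsync_command(WAL_DIR, STANDBY_DEST))
        time.sleep(ARCHIVE_INTERVAL)
```

The loss window equals the interval, so halving ARCHIVE_INTERVAL halves the worst-case loss at the cost of more frequent rsync runs.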
20:05:51 <nirik> or if there's some way to do master/master. ;)
20:05:52 <abadger1999> but it may not if our intent is to have a db server to drop into place if the db server in phx goes kaput.
20:06:10 <abadger1999> master/master tends to slow everything down from what I can tell.
20:06:18 <nirik> yeah, I would imagine so...
20:06:28 <abadger1999> Since you have to wait for the slowest master to commit the data.
20:06:42 <nirik> ok, let's keep pondering on it, and discuss more out of meeting/next week. ;)
20:06:48 <abadger1999> <nod>
20:06:59 <nirik> Thanks for coming everyone!
20:07:04 <nirik> #endmeeting