20:03:14 #startmeeting infrastructure
20:03:14 Meeting started Thu Jan 27 20:03:14 2011 UTC. The chair is goozbach. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:03:14 Useful Commands: #action #agreed #halp #info #idea #link #topic.
20:03:23 #chair CodeBlock
20:03:23 Current chairs: CodeBlock goozbach
20:03:28 #chair CodeBlock skvidal
20:03:28 Current chairs: CodeBlock goozbach skvidal
20:03:47 * nirik is around.
20:03:53 #topic who's here?
20:03:58 * CodeBlock
20:04:05 * goozbach says hi
20:04:13 * mmcgrath is here
20:04:15 hi
20:04:41 isn't robyn....can't think of her last name... supposed to do something with the meeting this week?
20:04:50 bergeron
20:04:56 yeah, her. :P
20:05:02 and she is probably dealing with fudcon stuff
20:05:12 pinged her in fedora-admin
20:05:19 ah ok
20:05:20 well then
20:05:26 * CodeBlock pulls up the tickets page, I guess
20:05:26 let's go through tickets?
20:05:50 what's up with the search for new FI Lead?
20:05:59 mdomsch: it's in process
20:06:08 nothing new to report
20:06:11 hi
20:06:17 #topic tickets
20:06:30 .ticket 2519
20:06:31 CodeBlock: #2519 (kill cvs with fire) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2519
20:06:37 skvidal: this done?
20:06:44 CodeBlock: not to my knowledge
20:06:53 (it's assigned to you, so.. :P)
20:06:57 CodeBlock: it's partially done
20:07:02 some of the pieces are migrated
20:07:06 we set a sundown, though.
20:07:14 things this sunday/monday
20:07:18 threw a hink in our plans
20:07:48 alright
20:08:00 it's still a WIP
20:08:22 and it's not assigned to me, is it?
20:08:25 I filed it
20:08:37 yah
20:08:38 oh crap, you're right
20:08:39 I filed it
20:08:47 * CodeBlock is doing quite bad at this whole ticket -> meeting thing. ;D
20:09:35 ok
20:09:40 wha's next?
20:09:46 .ticket 2574
20:09:47 .ticket 2572
20:09:49 CodeBlock: #2574 (Perform regular inactive account prunings and possibly a password reset policy.) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2574
20:09:50 doh
20:09:52 not going in order here, but ...
20:09:53 goozbach: #2572 (Consider using Eucalyptus for test VM hosting) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2572
20:09:57 I'll let CodeBlock drive
20:10:00 :)
20:10:10 goozbach: I'm doing quite bad at this, but sure why not
20:10:25 one bad driver is better than two bad drivers
20:11:24 Alright, so after the whole... security issue (which, mind you, is a figment of everyone's imagination ;D) ... does anyone have any thoughts on that ticket (kill out access from accounts which have been inactive for $x amount of time, and require passwords be changed every $timespan)
20:11:40 okay
20:11:51 I think that will be discussed in copious detail at fudcon
20:11:56 in a highbandwidth cage-match
20:12:08 +1 for fudcon
20:12:12 alrighty
20:12:29 #item ticket 2574 to be discussed at fudcon
20:12:34 ianweller: you around by chance?
20:12:42 .ticket 2543
20:12:42 #action ticket 2574 to be discussed at fudcon
20:12:43 skvidal: #2543 (upgrade internetx01 to rhel6) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2543
20:12:46 that's done
20:12:55 not sure if ricky is content with it or not - but it is done
20:13:13 CodeBlock: kind of
20:13:18 .ticket 2563
20:13:19 CodeBlock: #2563 (upgrade MediaWiki to 1.16) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2563
20:13:22 ianweller: any update on that ^
20:13:32 no, smooge and i are gonna talk at fudcon
20:13:38 heh alright
20:13:39 nobody's been reviewing my extensions
20:13:47 i should add that in the ticket
20:13:49 that is all :)
20:13:54 alrighty
20:14:24 .ticket 2544
20:14:29 CodeBlock: #2544 (migrate autoqa01 elsewhere) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2544
20:14:40 skvidal: ?
20:14:41 that's not been done and it is waiting on somethings happening 'soon' in the colo
20:14:58 yeah, I think thats pending on a co-lo visit from smooge
20:15:05 alright
20:15:12 and good afternoon, nirik :)
20:15:18 morning. ;)
20:15:38 alrighty then
20:15:57 I think a lot of these can/should be discussed at fudcon, so
20:16:17 #topic anything else?
20:16:34 skvidal: any thoughts on an infra freeze over fudcon?
20:16:35 2546
20:16:51 CodeBlock: well, everyone who can do anything will be at fudcon
20:17:08 .ticket 2546
20:17:09 goozbach: #2546 (bnfs01) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2546
20:17:28 skvidal: Yeah, I can just imagine everyone having fun and then nagios go nuts because someone broke something though
20:17:38 * gholms is here
20:17:38 and will hopefully on things so a freeze appears to be a bad thing imho
20:17:48 VileGent: ?
20:17:56 a freeze
20:18:09 CodeBlock: the reality is you break something in a room w/all the other people involved and you get with heavy objects
20:18:12 VileGent: freezes are bad?
20:18:15 wouldnt alot of people be working on things a an freeze prevent that
20:18:33 2572
20:18:37 Gah, sorry
20:18:43 VileGent: ah - you left out a verb there
20:18:46 gholms: yes, please chime in on
20:18:51 .ticket 2572
20:18:52 goozbach: #2572 (Consider using Eucalyptus for test VM hosting) - Fedora Infrastructure - Trac - https://fedorahosted.org/fedora-infrastructure/ticket/2572
20:18:54 * nirik notes https://fedorahosted.org/fedora-infrastructure/report/10 has a report of all meeting items. ;)
20:18:55 skvidal, i am saying that a freeze over fudcon could be bad
20:19:19 VileGent: I think you're probably correct - but at the same time EVERYONE IS IN THE ROOM
20:19:30 VileGent: so i think we'll be okay, then.
20:19:32 nirik: Most of them are fudcon-discussable though
20:19:32 Do I have five to ten minutes or are you nearly out of meeting items?
20:19:39 I see the point of not having a freeze over fudcon
20:19:40 CodeBlock: yeah.
20:19:41 b/c a freeze can be overriden with +1's
20:19:51 let's call it a cold front then :)
20:19:53 * gholms is in an impromptu meeting
20:19:54 but let's not worry about a freeze today
20:20:01 not quite a freeze
20:20:05 nor a slush
20:20:16 just a "don't be stupid" reminder?
20:20:23 * nirik doesn't think we need a freeze in this case, IMHO.
20:20:42 alright
20:20:49 just didn't know what the plan was
20:20:54 nirik: I can think of one case
20:21:03 CodeBlock: also we're going to have smooge in the colo
20:21:07 messing with stuff
20:21:19 ah
20:21:22 so a freeze is practically impossible when someone is breaking^W changing hardware
20:21:55 alright
20:22:07 #action no freeze
20:22:28 or should that be
20:22:35 #agreed no freeze
20:22:39 indeed
20:22:54 Alrighty, does anyone have anything else to bring up, or are we closing early?
20:23:01 * gholms raises hand
20:23:08 CodeBlock: ticket for gholms
20:23:19 gholms: go for it
20:23:21 #topic Consider using Eucalyptus for test VM hosting
20:23:28 :)
20:23:58 So the idea here is to look at using Eucalyptus to give packagers and others a way to test stuff non-destructively.
20:24:23 The benefit of using that over the things we have is its VMs are a lot easier to create and destroy.
20:24:36 still the same problem we've always had
20:24:40 that we need the hw to host it
20:25:05 now - two things I can think of to do with that
20:25:15 but will require some effort
20:25:30 would this be where the testing boxes would go?
20:25:35 1. change the x86-## boxes from pure builders to euca backends.
20:25:48 2. have builds all go through the same mechanism - in or out of koji
20:26:34 what sort of instances are we talking about here? package maintainers/qa folks instances for testing things?
20:26:48 nirik: at the moment I think it is arbitrary instances
20:27:26 The first use case that came to mind was one where people can test stuff that involves installing packages. The sorts of things you can't do on fedorapeople because the average joe can't be given sudo on it.
20:28:33 This isn't really the greatest solution for VMs that need to last a long time, but for ephemeral stuff it ends up being a lot easier.
20:28:33 * nirik was just trying to guess what kind of demand there would be. I'm not sure there's too much for that case... not too many people use the test machine instances I have for example.
20:29:20 Is that because of lack of public knowledge, though?
20:29:29 dunno. could be.
20:29:43 or many maintainers have their own virt instances on their hardware, or ?
20:30:13 If no one wants it then no one wants it, that's fine.
20:30:13 how often to the test servers get used?
20:30:31 I can see how a "fire and forget" virt guest could be useful
20:30:43 My biggest problem with local VMs is that I have to set them up and tear them down all the time.
20:31:04 e.g. with cobbler
20:31:06 but at the same time how do we police the "this box has been sitting unused for $MONTHS"
20:31:35 With a script. The eucalyptus public cloud shuts down VMs after N hours of operation.
20:32:41 okay
20:32:47 hmm
20:32:50 so a couple of things that came up following the incident this week
20:32:59 1. our publictest boxes are a "problem"(tm)
20:33:08 2. we have a lot of idle time on a number of pieces of hw
20:33:08 * nirik wishes we had some way to poll our users and ask them what kinds of things would be helpfull... see if this is even on the radar. ;)
20:33:22 3. idle cpu is bad for everything
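
A minimal sketch of the "shut VMs down after N hours" policing mentioned above, assuming boto talking to an EC2-compatible endpoint such as Eucalyptus. The cutoff, credentials, and dry-run behaviour are illustrative assumptions only, not an agreed Fedora Infrastructure policy.

#!/usr/bin/env python
# Reap test instances that have been running longer than MAX_HOURS.
# Rough sketch: the endpoint, credentials, and cutoff are assumptions.

from datetime import datetime, timedelta

import boto

MAX_HOURS = 24  # hypothetical stand-in for the "N hours" discussed above

def reap_old_instances(conn, max_age=timedelta(hours=MAX_HOURS), dry_run=True):
    """Terminate running instances whose launch_time is older than max_age."""
    now = datetime.utcnow()
    for reservation in conn.get_all_instances():
        for instance in reservation.instances:
            if instance.state != 'running':
                continue
            # launch_time is an ISO 8601 string, e.g. 2011-01-27T20:03:14.000Z
            launched = datetime.strptime(instance.launch_time[:19],
                                         '%Y-%m-%dT%H:%M:%S')
            if now - launched > max_age:
                print "would terminate %s (up since %s)" % (instance.id, launched)
                if not dry_run:
                    instance.terminate()

if __name__ == '__main__':
    # boto reads AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY from the environment;
    # a Eucalyptus cloud controller would be passed as an explicit endpoint instead.
    reap_old_instances(boto.connect_ec2())

Run from cron, something like this could keep "fire and forget" guests from becoming the publictest problem described above; flipping dry_run would make it actually terminate instances.
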
20:34:02 if I have the time and if this is allowed I would like to pursue how many of our services can be deployed widely and lightly on cloud-based servers all over the place
20:34:21 disk space issues are one thing
20:34:41 but we've got a lot of infrastructure that's delicately balanced in phx2 and doesn't feel like we're getting the best use out of it
20:35:05 we have an opportunity right now b/c of the migration to rhel6
20:35:14 (and tg2) to reorient a lot of services
20:35:22 it sure feels like we should take that opportunity
20:35:24 it would be nice if things could be able to move around more easily. (for both HA and disaster recovery/prevention and such)
20:35:31 I think a cloud based or "grid" based system for hosting more than just testing is an AWESOME idea
20:35:31 nirik: I concur
20:35:33 skvidal: can you provide some examples of services you would like to migrate to cloud-based servers?
20:35:58 btw, I'm here
20:35:58 phuzion: if we can make it work from a bandwidth standpoint - - builds for non-os pkgs
20:36:10 phuzion: I'd like to see proxy## and app## moved out
20:36:21 phuzion: I would REALLY love to see our db servers: 1. replicated and 2. dispersed
20:36:28 so network outages don't make our world blow up
20:37:14 phuzion: those are a couple of examples right off
20:37:23 so let's list those as ideas
20:37:36 #idea cloud based servers for proxy## and app##
20:38:00 #idea db servers replacated and dispersed
20:38:05 db server replication is definitely something we need ...
20:38:09 I like the idea of getting the db into the cloud, but we will have to VERY carefully weigh the security aspects of moving it there.
20:38:18 # idea builds for non-os pkgs on cloud
20:38:31 #idea builds for non-os pkgs on cloud
20:38:31 phuzion: it doesn't have tobe 'in the cloud'
20:38:36 it just needs to be NOT IN PHX2
20:38:40 or better yet
20:38:42 NOT ONLY in phx2
20:38:47 in many places. ;)
20:38:48 I like the second idea
20:39:00 nirik: +1
20:39:03 here's a problem
20:39:08 and something we have to delay a bit on, I think
20:39:15 #idea db NOT ONLY in phx2
20:39:17 but I suspect this will be the subject of much discussion this weekend
20:39:29 we don't have a project lead right now
20:39:39 and while the process is 'going'
20:39:46 it doesn't seem to be going fast
20:39:57 Which project is this?
20:40:09 Fedora Infrastructure
20:40:10 yeah, db replication can be a pain...
20:41:02 so while we don't want to STOP doing things
20:41:15 we need to realize that some of those things may not be in line w/what the lead wants
20:41:21 and that will impact us all, of course
20:41:49 also, as we get later in the cycle we probibly want to change less junk so as not to bother f15 happening.
20:42:03 agreed
20:42:06 I propose that if we go ahead with the idea of replicating outside of PHX2, we should have a team, or at least one person in charge, who reports to the FI lead, given that our database is pretty damn important.
20:42:48 phuzion: one thing I want to do at fudcon is to prioritize and order our goals for the next 6 months
20:43:00 perhaps some folks would like to explore ways the db could be decentralized and report back to everyone the options?
20:43:01 skvidal: excellent.
20:43:11 nirik: the dbs :)
20:43:43 right
20:43:59 skvidal: I won't be attending fudcon, but if there are irc-meetings or teleconferences, I would be more than happy to attend and provide some input.
20:44:11 I'll see what I can do
20:44:15 I'm bringing my webcam with me
20:44:25 and if there is enough bandwidht I'd love to tie someone in
20:44:41 #fedora-fudcon should be active... and irc is usually pretty active all over at fudcon too.
20:45:02 nirik: ok, I'll keep an eye there.
20:45:47 WRT Eucalyptus I like it..conditionally
20:45:50 #action discuss database distribution and decentralization at Fudcon
20:45:56 For replicating dbs we need to decide two things:
20:46:05 abadger1999: which db to standardize on? :)
20:46:09 Are we okay with losing X seconds of data?
20:46:11 abadger1999: and how to protect the replication
20:46:22 If we're not, then are we okay with losing X amount of performance?
20:46:34 abadger1999: consistently or one time?
20:47:03 abadger1999: I'd like to ask more about where we're talking about losing data and how often
20:47:17 abadger1999: and on the performance question I'd like to know more of the impact scenarios there
20:47:31 The RH Cloud solution is RHEV-M, and it is not currently deployable on Fedora Infrastructure.
20:47:33 AFAIK our choices are basically synchronous replication in which case we have the performance hit of all replicating dbs having to commit the data before we can proceed.
20:47:34 also, how much is read-only, and how much read-write?
20:47:39 abadger1999: but more to the point - I think performance being worse is more OK than having shit just not work when phx2 vanishes
20:48:05 abadger1999: and how write-heavy are our dbs is another good question
20:48:22 Or asynchronous replication with a master-slave relationshoip -- so that if the db master goes down, we switch to a slave but... X seconds/minutes of data could be lost which the master had not synced to the slace before the switchover.
20:49:40 So anyhow, that's the choice we have to make if we want to replicate the db for HA.
20:49:58 abadger1999: I'm personally fine with either - I have no problem with a big discussion about those either
20:50:09 abadger1999: but, imo, we have to make a decision and do it
20:50:13 something to tackle at fudcon?
20:50:21 we cannot stall waiting on the perfect solution
20:50:29 make a choice and deal with the implications afterward
20:50:46 b/c we can't progress beyond where we are otherwise
20:50:49 I don't care either -- someone just spell out the parameters and then we'll figure out what the best way to implement that is.
20:51:05 abadger1999: it's on my list for fudcon for FI-discussion
20:51:10 we'll get a decision and pursue it
20:51:20 abadger1999: just to make sure I have it all correct
20:51:26 db03 == all koji
20:51:37 db02== mirroradmin, pkdb, fas and bodhi
20:52:06 db01 == smolt, wiki, wordpress, zarafa, transifex
20:52:08 right?
20:52:14 and perhaps it might make sense to split those out more if the seperate db's have different requirements?
20:52:27 nirik: I concur
20:52:33 I think transifesx is on db02
20:52:37 * abadger1999 checking that
20:52:44 abadger1999: mysqlshow says db01
20:52:44 ie, some are very heavy read only... etc
20:52:57 abadger1999: looks like a leftover
20:53:19 nirik: I'd like to see a number of our apps be deployed AWAY from here into their own unit with their own simple, replicated db
20:53:20 seven mins till cloud sig meeting starts
20:53:43 nirik: I think we can do these - once we make a replication decision I suspect we should be able to cookie cutter the hell out of it
20:53:45 db02: autoqa? bodhi, elections, fas2, fedoracommunity(cache?) pkgdb transifex, yubikey stuff
20:53:50 skvidal: individual app and db servers for certain apps?
20:53:56 goozbach: If we're still talking about cloud-y stuff then we can go for a bit longer.
20:54:04 gholms: :)
20:54:14 phuzion: deployable new apps and at least 2 db servers for any given db we require
20:54:19 'server' is loosely defined.
20:54:40 in our budget meeting this week
20:54:50 We're also talking about replicating over the internet, correct?
20:54:54 we talked about specific budget items for cloud hosted space
20:55:01 abadger1999: maybe - depends on the app
20:55:04 k
20:55:27 maybe define tiers of database
20:55:30 some things like MM might only need a chunk every hour or whatever it updates with
20:55:31 absolutely
20:55:46 tier 4, no replication, just backup
20:55:56 tier 3, minimial replication (same datacenter?)
20:56:07 tier 2, cross site replication
20:56:14 abadger1999: here's the way I figure it
20:56:20 abadger1999: please tell me if I'm being an asshat
20:56:25 db replication ain't new
20:56:30 and we're not hitting novel problems here
20:56:52 another factor here might be: is the thing end user/customer facing/using? or maintainer facing/using? or just good to have, but not a big deal to either users or maintainers?
20:56:53 tier 1, minimal data loss replication for core apps mission crit stuff.
20:57:02 define how to create one set of each tier
20:57:08 and do so for each new app deployed
20:57:12 we have a budget item for instances at ec2 and/or rackspace - so there's no reason to not be able to throw tests at this
20:57:37 and have them be completely throw-away test spaces
20:58:01 skvidal: db replication is not new but that's also not reassuring in that the problems that people document with replication are probably not going to be solved easily (by an update to the db software).
20:58:27 abadger1999: what are our other options?
20:58:28 skvidal: For the second... I'm not sure what problems we are hitting specifically.
20:58:39 abadger1999: phx2 is a giant SPOF
20:59:01 abadger1999: and the reboot process has proven over and over again that some of our layout is delicate
20:59:12 If it's SPOF, then some sort of replication sounds like the way to go, sure.
20:59:36 Performance issues (like the smolt issue we just had) -- replication might not help.
20:59:42 It might even hurt.
20:59:48 abadger1999: yeah - some of those are application problems
21:00:36 Cool. So we'll jsut examine this as -- how do we make the db not a SPOF.
21:01:25 The performance issues we are having (such as smolt), are those more of writes or reads that are causing the performance issues?
21:01:44 phuzion: smolt was a locked read if I read what ricky said correctly
21:02:23 * nirik looks at the time.
21:02:28 skvidal: because if it's reads, then slaves are a _relatively_ cheap way to add additional capacity to the application.
21:02:30 yah - we're passed
21:02:48 it's to be discussed at fudcon
21:02:54 sounds good.
21:02:55 phuzion: I'm not a mysql guy so I'm not sure -- we see a locked read but there was speculation that mysql might lock the table if there's an update pending on a read.
21:03:15 phuzion: And we do know that smolt itself is more write-heavy than a lot of other apps.
21:04:18 Alright, anything else before we get out of cloudsig's way?
21:04:21 abadger1999: and its writes are not terribly critical to be instantly available to any other system
21:04:24 :)
21:04:27 is cloudsig champing at the bit?
21:04:27 Well, I'm more familiar with mysql than pgsql, so I could probably help out a little bit with the mysql stuff, but like nirik said, we're past time, and I don't want to drag this on too long. We can discuss this while everyone is at fudcon, or even afterwards if we wanted.
21:04:35 nod
21:04:47 skvidal: And it isn't really critical if we lose a couple minutes of data from the writes.
21:05:05 So an ideal candidate :-)
21:06:02 Let's move talk about db replication strategies to #fedora-admin
21:06:10 sounds good
21:06:12 ok
21:06:29 and for people reading the meeting log, have a good FUDCon :)
21:06:31 #endmeeting
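
A rough sketch illustrating the asynchronous master/slave option described above, where the open question is "are we okay with losing X seconds of data": polling a slave's Seconds_Behind_Master gives a ballpark for how large that X currently is. This assumes MySQL with conventional replication and the MySQLdb bindings; the hostname and credentials are placeholders, not real Fedora Infrastructure hosts, and no replication choice had been made at the time of this meeting.

#!/usr/bin/env python
# Report how far a MySQL slave lags its master -- roughly the window of
# writes that could be lost if the master vanished right now.
# Placeholder host/credentials; a sketch, not a deployed check.

import MySQLdb
import MySQLdb.cursors

def slave_lag(host, user, passwd):
    """Return Seconds_Behind_Master for a replicating slave, or None if
    replication is not configured or not running."""
    conn = MySQLdb.connect(host=host, user=user, passwd=passwd,
                           cursorclass=MySQLdb.cursors.DictCursor)
    try:
        cur = conn.cursor()
        cur.execute('SHOW SLAVE STATUS')
        status = cur.fetchone()
        if not status:
            return None
        # Seconds_Behind_Master is itself None if the SQL/IO threads are stopped.
        return status['Seconds_Behind_Master']
    finally:
        conn.close()

if __name__ == '__main__':
    lag = slave_lag('db-slave.example.org', 'monitor', 'xxxxxxxx')
    if lag is None:
        print "replication is not configured or its threads are stopped"
    else:
        print "slave is %s seconds behind the master" % lag

Wrapped in a nagios check, something like this would also show whether a read-only slave (the "relatively cheap way to add read capacity" mentioned above) is keeping up well enough to serve traffic.
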