21:00:16 #startmeeting Cloud SIG
21:00:16 Meeting started Thu Aug 19 21:00:16 2010 UTC. The chair is rbergeron. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:16 Useful Commands: #action #agreed #halp #info #idea #link #topic.
21:00:25 yes, that's right ;)
21:00:25 Okee dokee
21:00:31 * rbergeron just has to get her hair done cuz she's sooooo girly :)
21:00:38 #meetingname cloud
21:00:38 The meeting name has been set to 'cloud'
21:00:46 thank you
21:00:49 #chair gholms
21:00:49 Current chairs: gholms rbergeron
21:01:08 * rbergeron looks around for jforbes
21:01:27 * gholms forgot about this meeting until about two minutes ago :-\
21:02:08 #chair jforbes
21:02:08 Current chairs: gholms jforbes rbergeron
21:02:19 * rbergeron waves to all - ready to start?
21:02:22 Roll call!
21:02:27 kinda
21:02:29 here
21:02:34 #topic Roll call
21:02:40 a/s/l? (don't taze me bro!)
21:02:49 * rbergeron grins
21:02:51 * rbergeron is here
21:03:26 Chirp, chirp
21:03:41 allllrighty.
21:03:50 #topic EC2 status
21:04:00 jforbes: take it away, sir :)
21:04:20 Sadly not much to update here; I need to spin new images this afternoon/tomorrow
21:04:47 Being in Boston last week didn't help
21:05:01 how was LinuxCon?
21:05:25 didn't really see it, just KVM Forum and team meetings :)
21:05:39 btw, I see you're coming to FUDCon - if anyone else is interested in going, now is the time to sign up: :) https://fedoraproject.org/wiki/FUDCon:Tempe_2011
21:06:33 alright - so last week we discussed a bit about doing some documentation and so forth for some of the EC2 stuff - so that we can have a repeatable process
21:07:03 jforbes: do you have any thoughts on what can be done there to help you out?
21:07:59 Not really, because we are still doing one-offs
21:08:25 I'm happy to help with the S3-backed images, btw.
21:08:52 brianlamere: as far as docs?
21:09:09 however and whatever is helpful
21:09:20 So there were some good docs on the list for the BoxGrinder builds, but we need the right tools for Fedora builds to document a repeatable process
21:09:26 that means euca2ools that work
21:09:45 so we need to finish that packaging.
21:09:50 what doesn't work in ooktools?
21:10:02 It's done, just not in yum because boto hasn't made another release yet.
21:10:10 smooge: right
21:10:17 Oh, is it available elsewhere?
21:10:18 So for now use this: http://repos.fedorapeople.org/repos/gholms/cloud/
21:10:33 well, I have boto tools I actually use for doing things to AWS; I am not quite finished cleaning the private stuff out of them, though
21:10:37 gholms: thanks :)
21:11:10 speaking of boto, I popped on a couple times and didn't catch him, haven't got an email back, and it looks like he's on vacation
21:11:14 so - are we going to be able to have euca2ools by the time F14 rolls around - or is that unknown with the boto stuff - do we have to have it?
21:11:17 hrmmm
21:11:25 http://bazaar.launchpad.net/~eucalyptus-maintainers/euca2ools/euca2ools-main/changes
21:11:33 It looks like development on euca2ools suddenly resumed.
21:11:38 not required, but really nice so it isn't a one-off
21:12:07 I see some commits that try to avoid stuff that isn't supported in boto 2.0, so if it is completely compatible with 1.9 again I can push it to updates-testing as-is.
21:12:20 nice
21:13:17 It's still a prerelease, but at least it isn't insufficient. I'll have a look at today's snapshot and see what I can find out.
21:13:41 So - if that works - .... ?
21:13:51 Then the update goes into the Fedora repos.
21:14:40 and we can apply that to the EC2 stuff we're working on here - and move forward with some repeatable process / documentation?
21:14:51 so the euca2ools Fedora package is being maintained at Launchpad?
21:15:24 brianlamere: I maintain Fedora's package since upstream bundles boto and m2crypto.
21:15:58 ah - just wondering why you don't get it from http://open.eucalyptus.com/downloads instead of Launchpad :)
21:16:08 Yep, that's why.
21:16:48 * rbergeron has to grab a power cable - brb sorry
21:17:04 ok, makes sense.
21:17:37 btw, just sent an email to the list about a long list of bucket names I grabbed; whenever the account info is figured out, one/some of those names may be useful
21:18:19 because yes, they can be given (sort of) to another account; you just delete the bucket and the other person creates it.
21:19:03 Does that mean we need a shared account to publish stuff to the "official" buckets, or can we have ACLs?
21:19:18 gholms: let us know what you find out re: euca2ools? :)
21:19:38 #info upstream euca2ools development has resumed
21:19:52 #action gholms to investigate deployability of latest euca2ools snapshot
21:19:55 rbergeron: ;)
21:20:06 well, something will end up being "official" because the Fedora Cloud says it's official; and yes, you can share via ACLs - whatever account owns it will just allow other accounts to write to that bucket
21:20:37 The "official" bucket name can just be published on the wiki or elsewhere on the site.
21:20:42 err... that was supposed to say "the Fedora Cloud website"
21:21:24 aye - so if any of the ones I grabbed and posted to the list look good, let me know and I'll give it to whomever
21:21:37 just didn't want them to disappear
21:21:55 Yeah, I have the official Fedora account
21:22:46 * gholms wonders what RHEL's bucket naming scheme is
21:22:54 oh, ok - didn't know there was an official account already :) tell me which you want (if any) and I'll hand them over
21:23:16 is the bucket name the same as the image name?
21:23:28 no
21:23:34 no, the image is held in the bucket
21:23:40 many images can be in the same bucket
21:23:43 #info List of possible bucket names: http://lists.fedoraproject.org/pipermail/cloud/2010-August/000269.html
21:23:49 that's what I thought. So the bucket name isn't so important, right?
21:24:03 Not incredibly important, but we want it logical
21:24:11 not extremely important, no, but it makes it easier to distinguish which AMIs are official
21:24:12 Not that getting a good bucket name isn't important.
21:24:19 right. thanks.
21:24:40 if the "official" ones are in "fedora-cloud" then... anything not in "fedora-cloud" isn't official ;)
21:25:40 I had an interesting conversation this morning, btw - my Amazon account person wanted to have a phone call with me, so we talked about the stuff he wanted to talk about... but then...
21:25:40 needs a class on what buckets, images, and the rest are.
21:25:47 Well, Amazon is going to help us there, in marking things official
21:26:12 do we need to have our contact there help us out?
21:26:30 kind of
21:26:39 smooge: Buckets are like directories. Images of VMs are files that reside in buckets. ;)
21:26:41 He is waiting for me to tell him which ones
21:26:57 I asked him about the other stuff. Is there a contact already working on getting an account for repo instances and S3 buckets for the RPM content?
21:26:59 ke4qqq: you about?
21:27:03 Do we need a separate bucket for each zone?
21:27:04 jforbes: who is the guy you're talking to?
21:27:11 * rbergeron wonders if we have a bunch of different names
21:27:38 rbergeron: for a few moments
21:27:43 Nathan, gafton, or msw. But Nathan is the one who is going to tag them
21:27:55 ke4qqq: who is the guy you've been talking to at Amazon? or attempting to, that doesn't write back
21:27:58 Nathan Thomas is who I emailed and called - no response to either.
21:28:02 ahhh
21:28:14 Nathan responds to me; I worked with him for 5 years
21:28:20 it seems that other groups have different ones for different regions, yes
21:28:45 Nathan from RH?
21:28:46 * rbergeron laughs - sorry, ke4qqq :)
21:29:01 (Didn't he leave RH some time ago?)
21:29:02 Yeah, RH then rPath
21:29:27 well, as long as it's not gafton
21:29:46 I owe him money
21:30:05 there's a "redhat-cloud" in us-east, but I don't see anything in other regions
21:30:10 * rbergeron wonders what bet smooge lost
21:31:53 okay - anything else here?
21:32:14 jforbes needs an action item. :)
21:32:49 * rbergeron throws gholms the bus keys
21:33:09 not from me
21:33:16 * gholms has no idea what to write
21:33:57 #action jforbes to plow ahead with being awesome, spinning new images.
21:34:01 spin new images
21:34:13 * gholms hands jforbes a plow
21:34:15 What's next?
21:34:17 :) that work?
21:34:25 so you all do have a person looking at getting a comp account for doing a few Fedora repositories at Amazon?
21:34:41 #topic openstack / swift status
21:35:00 what is a comp account?
21:35:05 brianlamere: ke4qqq has been trying to get in touch with Nathan about trying to figure out a way to do this - he hasn't heard back from him via phone / email.
21:35:16 No comp account, from what I heard
21:35:19 Thankfully we have yum-plugin-presto.
21:36:02 smooge: maybe not a comp account - but a way to at least be able to have people test without inflicting pain on their own bank accounts, but without just doling out blanket "have fun on EC2 storing whatever you want" privileges to the whole planet.
21:36:35 We will have a mirror there, just not comped
21:36:37 #undo
21:36:37 Removing item from minutes:
21:37:05 oh, compensation versus computation
21:37:28 i.e., "free" ;)
21:38:29 Do we have a budget for mirror instances?
21:39:07 We have money available. If we have an idea of how much $ we are talking about, we can likely get that okayed, so long as it's not a bazillion dollars.
21:39:13 Umm, yeah, that was sorted from what I heard
21:39:28 Right now, what's available is basically "be prudent and don't go nuts plz."
21:39:55 It's just the cost of running the instances, plus the cost of the S3 buckets that host the RPMs. I think it's one instance per availability zone, and one mirror bucket per region.
21:39:58 I heard it was $20.00 and a six-pack of Cheerwine
21:40:42 max knows where I live and I don't need him coming to take it out of my allowance :)
21:41:48 well, since inbound is free, and outbound within the same region is free, ACLs could be put in place that would make it really cheap... assuming we could know what blocks of IPs are used in particular regions :)
21:42:03 #action rbergeron figure out what the budget actually is.
21:42:57 do we think we could find that information out?
21:43:26 * gholms wonders if they have a sales/pricing phone number or email address
21:43:43 Yeah, mdomsch should have some info already
21:44:44 #action rbergeron ping mdomsch to see what he knows about pricing, other info.
21:44:52 http://calculator.s3.amazonaws.com/calc5.html
21:45:22 the pricing info looks sort of complex until you see your bills
21:45:38 I mean mdomsch was already involved in mirror discussions, so he could know
21:45:49 * gholms needs to get some funding together to test this
21:47:06 bah
21:47:16 * rbergeron waves to mdomsch
21:47:24 * mdomsch has been delinquent in getting at that
21:47:25 gholms: to test...
21:47:38 originally, I thought Amazon was going to give us the instances & storage for free
21:47:45 but max didn't seem to think so; and I haven't followed back up
21:48:03 rbergeron: Which types of intra-region data transfers are free
21:48:11 all
21:48:21 anything to S3 intra-region is free
21:48:53 anything from EC2 to EC2 intra-region is not. It doesn't cost as much as inter-region, but it isn't free
21:49:27 mdomsch: can you check into it and let us know :)
21:49:47 at the risk of not getting it done, sure :-)
21:49:58 * mdomsch needs more weeks in a week
21:50:13 http://aws.amazon.com/ec2/#pricing - search the page for "regional data transfer"
21:50:21 $0.01 per GB in/out – all data transferred between instances in different Availability Zones in the same region
21:50:38 Amazon _wants_ us to have our images up there - so ignore the standard pricing pages
21:50:43 mdomsch: thanks :)
21:51:37 Can we have all mirror instances in one region mount the same bucket of packages without being charged extra?
21:51:53 s/charged extra/charged for data transfer/
21:51:59 well yeah, certainly. Thought of a different way: they don't (currently) charge for inbound data, so whenever someone does a yum update right now, that's bandwidth Amazon is eating. They're much better off having the repos in their own network
21:52:24 data transfer within the same region to /s3/ is free
21:52:43 S3 doesn't even have zones, just regions.
21:53:06 so if an instance was serving the content via S3, the only charge would be for the stuff you couldn't offload to S3
21:53:54 If we can do that, then that will save an awful lot of money on storage.
21:54:07 my plan was to have one mirror per region
21:54:35 The problem with that is then everyone gets hit with "regional transfer" charges for the EC2-to-EC2 traffic.
21:54:57 like I said, the pricing looks really complicated until you start getting bills ;) yeah, having (at least one) instance per region that serves the content from the S3 bucket for that region makes the most sense. Then your bill will barely be larger than the cost of the instances
21:55:32 well, the traffic shouldn't be EC2-to-EC2; the instances should just initiate the traffic and then hand off to the S3 buckets
21:56:01 one mirror per region; one bucket per region for the mirror content; and the netblocks for each region so each gets served from their own region's mirror
21:56:03 Ahh, the mirrors point their traffic directly to the S3 buckets?
21:56:28 S3 isn't something you "mount," it is a URL that you do HTTP requests against. Not only can you do HTTP requests against it, you can *only* do HTTP requests against it
21:56:39 ok, so not S3 then
21:56:57 a medium instance, with an attached 1TB storage I can put a file system on
21:58:10 Can multiple instances mount the same one?
21:58:26 no
21:58:34 why wouldn't S3 work? You'll know the URL; it will be the bucket name plus "s3.amazonaws.com" - example: with a bucket in us-west-1 called fedora-us-west-1, the URL would be fedora-us-west-1.s3.amazonaws.com
21:58:54 EBS can only be mounted once
21:59:50 Oh, right!
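To make the point above concrete: a bucket behaves like a plain web server at <bucket>.s3.amazonaws.com, so a client (or yum) can fetch repo files with ordinary HTTP requests. A minimal sketch follows; the bucket name continues the fedora-us-west-1 example from the discussion, and the object path is hypothetical.

    import urllib2

    # Hypothetical bucket and object path; S3 serves publicly readable objects
    # over plain HTTP, no web server instance required.
    url = ("http://fedora-us-west-1.s3.amazonaws.com"
           "/releases/14/Everything/x86_64/os/repodata/repomd.xml")
    repomd = urllib2.urlopen(url).read()
    print "%d bytes of repomd.xml fetched straight from S3" % len(repomd)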
Can't you set S3 to be web-browseable? We can just have mirrormanager point to that.
22:00:05 I was presuming I would use rsync to download content from the master mirrors, and drop it into Amazon storage (somehow, somewhere) that looks like a locally mounted file system
22:00:15 Then we would just need one instance per region to sync it from the master mirrors.
22:00:19 if you want an example of a page with it working, just go to www.dcshoes.com and look at the page source. We don't have any of the content on the instances - we only have the python/django code on the instances. All the content is on S3
22:00:42 yes - S3 is completely browseable via the web. You can set ACLs of various complexities, or leave it wide open
22:01:17 There's a FUSE module for S3.
22:01:19 how would a simple web server offer up content found there? One that expects a file system hierarchy?
22:02:45 mdomsch: You can set S3 buckets to be browseable directly via HTTP; you don't need a web server to do it.
22:02:56 ah ha
22:02:59 it's the cloud, stop thinking like the 90s ;) S3 isn't mounted, it's a web server
22:03:09 it's just a dumb web server
22:03:21 * jsmith is late to the meeting
22:03:33 The question is just how one would sync that bucket with the master mirror.
22:03:35 yeah, but I need to at least copy content into it, with one side being rsync
22:03:38 it can only do HTTP get/post/copy/etc
22:04:02 maybe FUSE is the right way then
22:04:06 Can rsync do it with --no-append (or whatever) and the S3 FUSE module?
22:04:16 pipe it into boto to copy it in, or use any of the FUSE tools for mounting S3 as a filesystem
22:04:34 Ooh, how can boto help us here?
22:04:35 problem is most of those tools work very poorly
22:04:46 the FUSE tools, that is
22:05:07 they'll dump out just a few hundred megs in, sometimes even sooner
22:05:39 it's very, very easy to copy a file to S3 with boto. I could even take stdin to a python script and put it to S3.
22:05:52 jsmith: we're past the end time, but it's all good :)
22:06:11 rbergeron: I know, I know... it's been a crazy day
22:06:22 jsmith: no worries :)
22:06:22 brianlamere: Would it be easy to script that to essentially "cp -R" stuff?
22:06:28 point is, as someone who has large, busy sites on Amazon with almost no data transfer costs on the instances themselves, it's not hard ;)
22:07:45 sort of - yeah, it's just a few lines (fewer than 10) to have a copy utility to S3. And remember, your ephemeral instance will have ephemeral storage that you could sync to, and then sync from that to S3
22:07:49 Does the master mirror only have rsync access?
22:07:56 the m1.small comes with 160GB, and it goes way up from there
22:08:32 so, rsync to your ephemeral storage, and then s3copy from the ephemeral storage to S3
22:09:05 that would be using the standard tools plus literally just a few lines of python :)
22:09:38 we're approaching 1TB of data to sync in
22:09:43 We would either have to clean out the ephemeral storage after syncing every release-arch combination, which would generate an awful lot of traffic on the master mirror.
22:09:56 what size instances were you thinking of using?
22:09:57 and to keep synced
22:10:09 The alternative would be to use a large EBS volume and just sync that with S3.
22:10:13 I was thinking small to medium, with an extra EBS of 1TB
22:10:56 One of those in each region, then?
22:11:09 medium only comes with 350GB storage. But you could get a 1TB EBS, copy to that, and then copy to S3.
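A minimal sketch of the "rsync to local storage, then copy to S3" flow just described, assuming the 2010-era boto S3 API (S3Connection/Key). The master mirror URL, local path, and bucket name are placeholders, not real values.

    import os
    import subprocess

    from boto.s3.connection import S3Connection
    from boto.s3.key import Key

    MASTER = "rsync://master.example.org/fedora-linux/"  # placeholder master mirror
    LOCAL = "/mnt/ephemeral/mirror"                      # ephemeral or EBS storage
    BUCKET = "fedora-us-west-1"                          # hypothetical bucket name

    # Step 1: pull updates from the master mirror onto local storage.
    subprocess.check_call(["rsync", "-a", "--delete", MASTER, LOCAL])

    # Step 2: push the tree into S3; key names mirror the relative paths.
    conn = S3Connection()  # boto reads AWS credentials from its config/environment
    bucket = conn.get_bucket(BUCKET)
    for dirpath, dirnames, filenames in os.walk(LOCAL):
        for name in filenames:
            path = os.path.join(dirpath, name)
            key = Key(bucket)
            key.key = os.path.relpath(path, LOCAL)
            # 'public-read' is what lets the bucket be browsed over plain HTTP.
            key.set_contents_from_filename(path, policy="public-read")

As written this re-uploads the whole tree every run; pairing it with some change detection (see the sketch a bit further down) keeps the transfers small.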
What you don't want to do is serve from the EBS to everyone, because then both you and they will have the EC2-to-EC2 transfer costs
22:12:51 the calc from Amazon I pasted above is helpful, though; good for getting ballpark ideas
22:12:59 * rbergeron wonders if we are coming to conclusions here and, if we are, if we should be noting these conclusions? :)
22:13:10 rbergeron: Working on one here...
22:13:10 * rbergeron isn't trying to push - just want to make sure we don't lose anything important
22:13:15 gholms: you rock
22:13:59 Sorry to run early, but my kids have meet-the-teacher night; I gotta go
22:14:46 remember the RRS option, too - with easily replicated data, getting it 1/3 cheaper per GB of storage is $$
22:15:11 jforbes: see ya :) have a good one
22:16:00 Sorry, still typing...
22:16:25 * rbergeron grins
22:17:07 Proposal: In each region, run one small instance attached to a high-capacity EBS volume containing a mirror. After rsyncing its EBS copy of the mirror with the master mirror, each instance will copy it to S3 buckets that instances use as mirrors.
22:17:20 (The wording can be improved later, of course.)
22:17:55 brianlamere, mdomsch: does that sound reasonable?
22:18:52 sounds good
22:19:07 sorry, stepped away
22:19:44 what is RRS?
22:19:49 though note that the EBS will cost some $$ at that size, so if you'd like I can give you more than just a few lines of code and things could be copied directly to S3 without the middle step. Depends on what mdomsch likes
22:19:53 Reduced redundancy storage
22:20:12 RRS is "reduced redundancy storage" - S3 at $0.10/GB versus $0.15
22:20:14 hmm
22:20:40 brianlamere: The problem with not holding onto an EBS copy is that we have to transfer the entire mirror every time we sync it.
22:20:46 if the ultimate goal is to serve the content out of S3's HTTP server
22:20:51 brianlamere: Unless, of course, we can rsync the S3 copy directly.
22:20:52 it only has several copies instead of lots and lots of them. They guarantee 99.99%, and you can get reports of any files that disappear and replace them from the other mirrors
22:21:00 then it makes sense to try to skip the copy into EBS and then to S3
22:21:16 yeah - RRS would be fine then
22:21:33 we've got lots and lots of copies available already to serve from
22:22:21 http://aws.amazon.com/sns/ - "simple notification service" - 100,000 notices free per month. So, unless 100,001 items disappear, free ;)
22:22:51 and $0.06 per 100,000 after that so... err... still free, really
22:23:08 it's one of the services that end up being a tiny dot on the bill
22:23:10 huh
22:23:15 mdomsch: Do we have HTTP or FTP access to the master mirror?
22:23:20 gholms: yes
22:24:22 mdomsch: My concern is that not holding onto an intermediate copy will force us to transfer a lot more from the master mirror.
22:24:58 Or can we sync only changes from an HTTP or rsync mirror with S3 directly?
22:27:18 sort of - you'll have to have something manage it; otherwise it will happily let you copy a file there that already exists
22:27:53 but that's a problem that's already been solved a million times over. Can we take it off-list with mdomsch maybe?
22:28:05 It is? That's good to hear.
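One rough way to "have something manage it", as brianlamere suggests, is to compare local checksums against the ETags S3 already records (for simple, non-multipart PUTs the ETag is the object's MD5) and only upload what differs. A sketch under those assumptions, again with placeholder paths and a hypothetical bucket name:

    import hashlib
    import os

    from boto.s3.connection import S3Connection
    from boto.s3.key import Key

    LOCAL = "/mnt/ephemeral/mirror"   # placeholder local copy of the mirror
    BUCKET = "fedora-us-west-1"       # hypothetical bucket name

    conn = S3Connection()
    bucket = conn.get_bucket(BUCKET)
    # Existing key name -> MD5 (the ETag, minus its surrounding quotes).
    remote = dict((k.name, k.etag.strip('"')) for k in bucket.list())

    for dirpath, dirnames, filenames in os.walk(LOCAL):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, LOCAL)
            md5 = hashlib.md5()
            with open(path, "rb") as f:          # hash in chunks; ISOs are large
                for chunk in iter(lambda: f.read(1 << 20), ""):
                    md5.update(chunk)
            if remote.get(rel) != md5.hexdigest():   # new or changed file only
                key = Key(bucket)
                key.key = rel
                key.set_contents_from_filename(path, policy="public-read")

This knows nothing about deltarpms or repo metadata specifically; it just avoids re-transferring unchanged files, which is the bulk of the ~1TB being discussed.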
22:28:20 please
22:28:20 or on the cloud list
22:28:31 :)
22:28:37 brianlamere: as you can tell, I've not worked with Amazon much, and definitely not S3
22:28:39 sounds like you have
22:28:41 yeah, it's not like we resync our S3 files every time we go from dev to live on our customer sites ;)
22:29:14 what I need is to have a directory hierarchy of about 1TB of data in lots and lots of files
22:29:21 that's served via HTTP
22:29:22 we only sync up those that are needed. I can't recall, but isn't part of the filelist at a repo a list of the checksums anyway? So they've already computed the list for us
22:29:43 and that we can somehow compare against the master server and update accordingly (traditionally rsync)
22:30:00 we've got more than just yum repos
22:30:08 though arguably we could not host the ISOs there
22:30:52 yeah, filelists.xml.gz - isn't that a list of the files and the checksums? So we'd immediately know the names of new files, just by comparing
22:30:55 There are still deltas and whatnot as well.
22:31:57 so s3fs isn't any good?
22:32:11 brianlamere: What do you use to sync S3 with local directory trees? I assume you don't transfer the whole thing.
22:32:43 no, none of the S3 FUSE tools work well; they barf really fast. They copy between 20-100M in, and then they die. They can't handle more than just small transfers
22:33:18 we save the files directly to S3 - content is loaded directly there, and we keep a database of the files there.
22:34:54 How do you recommend we sync S3 with the master mirror, given that the master mirror does not compute checksums for all files?
22:35:07 (Sorry to put you on the spot; you're just experienced with this sort of stuff)
22:35:07 and when we're merging dev and live sites, we only move the files that are different, per the database of the files. Think of it this way: a normal filesystem will have a table of the files and directories, but it will be transparent to you (btree, etc). For S3 it's just something more tangible
22:35:43 well, I'd have to refamiliarize myself with repos; I used to have several Spacewalk servers back in the day, but it's been a while
22:36:08 Can we mark that down as an action item, then?
22:36:20 but I'm pretty sure that if filelists.xml.gz gets updated, you can compare it to the previous version of that file and see what files changed. Then you have your list of files to copy
22:36:42 That doesn't include deltarpms or the metadata files themselves.
22:38:16 well, I'll have to look at how repos work again; I'm certain there's a way to use S3/RRS to make it much cheaper.
22:38:40 I'll just have to look at it, can't tell you offhand - sorry :/
22:38:52 #action brianlamere to look into syncing S3 with master yum mirrors
22:39:12 Anything else S3-related?
22:40:34 gotta go
22:41:11 #topic Openstack / swift
22:41:15 Anything here?
22:43:28 Ohhkay
22:43:33 #topic Open floor
22:44:18 [you hear the sound of crickets in the distance]
22:44:59 awww... reminds me of home. We don't get too many crickets here in San Diego ;)
22:45:22 I was worried I was the only one still here.
22:45:40 If we don't have anything we can probably relinquish the channel.
22:45:47 Closing in 30 seconds...
22:46:14 Thanks for coming, everyone!
22:46:17 #endmeeting