16:59:44 <rbergeron> #startmeeting Big Data SIG
16:59:44 <zodbot> Meeting started Thu Mar  7 16:59:44 2013 UTC.  The chair is rbergeron. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:59:44 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
16:59:50 <rbergeron> #meetingname Big Data SIG
16:59:50 <zodbot> The meeting name has been set to 'big_data_sig'
17:00:05 <rbergeron> #topic Who's around for fun?
17:00:18 * tflink is preparing himself for the fun
17:01:00 <rbergeron> awesome.
17:01:15 <rbergeron> #info present: rbergeron, tflink
17:01:24 <rbergeron> threebean: are you here for the party as well :)
17:01:30 * rbergeron guesses witlessb is
17:01:30 <witlessb> howdy
17:01:34 <rbergeron> heya.
17:01:45 <rbergeron> #info present: witlessb
17:01:47 <threebean> rbergeron: yup :)
17:01:47 * witlessb has a party-hat on
17:01:47 * zoglesby is here
17:01:59 * rbergeron will hold another moment ... while she pulls up the magical agenda
17:02:00 * samkottler is here
17:02:03 * jsmith lurks
17:02:18 <rbergeron> #info present: threebean, zoglesby, samkottler, jsmith
17:02:43 <rbergeron> Would any of you lovely folks like to have a chair as well :)
17:03:02 <rbergeron> #chair tflink witlessb threebean zodbot samkottler jsmith
17:03:02 <zodbot> Current chairs: jsmith rbergeron samkottler tflink threebean witlessb zodbot
17:03:08 <rbergeron> Yes. that's a yes.
17:03:30 <rbergeron> okay. sooooo:
17:03:49 <rbergeron> #topic Agenda for today's first meeting :D
17:04:00 * ctyler joins in from Hong Kong
17:04:02 <rbergeron> I posted a few things to the mailing list
17:04:07 <rbergeron> #chair ctyler
17:04:07 <zodbot> Current chairs: ctyler jsmith rbergeron samkottler tflink threebean witlessb zodbot
17:04:18 <rbergeron> hey, live from linaro-land, it's ctyler
17:04:32 <rbergeron> #link http://lists.fedoraproject.org/pipermail/bigdata/2013-March/000003.html
17:04:32 <ctyler> yep
17:05:09 <rbergeron> #info Agenda looks like: What this is all about, what do we have, what don't we have, what is anyone here interested in doing :)
17:05:18 <rbergeron> (not necessarily having to be in that order, but...)
17:05:31 <rbergeron> Feel free to poke along the way or yell or whatever if you want to add another topic and we'll figure it out.
17:05:34 <rbergeron> First meetings are fun.
17:05:57 <rbergeron> #topic What's the Big Data SIG all about?
17:06:35 <rbergeron> So I don't have an amazing answer here. Other than: Heyyy, we should do something. Because not having anything is probably not the best answer.
17:07:13 <rbergeron> It's sort of a broad field, so I figure we'll have to sort out what it is we want the group to be about, whether it's practical implementation of stuff, packaging of things, or $somethingelse.
17:07:20 <rbergeron> Don't y'all talk at once now :)
17:07:25 <rbergeron> Thoughts? Additions?
17:07:29 <ctyler> I like the Big Data definition from that O'Reilly report a couple years back, which to paraphrase was: If the size of your data is part of the problem, it's Big Data.
17:07:37 <rbergeron> Anyone here just curious about wtf big data is?
17:07:39 <rbergeron> Ahhh.
17:07:57 <rbergeron> #info loosely quoting from o'reilly: "If the size of your data is part of the problem, it's Big Data."
17:08:56 <rbergeron> I think people are struggling with the need to save a variety of things, knowing how to store it, knowing how to do things with it, whether it's analyze, or find things quickly, or hook it up to some amazing infrastructure set-up.
17:09:50 <rbergeron> So i'm going to presume that we probably have a mix of people who want to use it or play with it here, along with perhaps some people who want to help fix that, perhaps it's a good blend of both.
17:09:53 <tflink> yeah, one part of it is getting a decent setup
17:10:04 <tflink> the other part is understanding the tools and approaches needed
17:10:24 <rbergeron> Which is helpful, since it's hard to get people to magically do things if they're not actually interested in using them :)
17:10:54 <rbergeron> tflink: agreed - and there is a lot - and i suspect people get a lot of pointy-haired-boss action saying "WE NEED ALL THE BIG DATA THINGS"
17:11:14 <ctyler> The two pieces you hear about the most in Big Data seem to be massive storage, parallel computing (Hadoop, column databases, etc).
17:11:20 <rbergeron> which is .. hopefully slightly different than me saying we should do the big data things :)
17:11:42 <rbergeron> #idea one part of it is getting a decent setup; other part is understanding tools and approaches needed
17:11:55 <tflink> yeah, I've heard plenty of people talk about "big data" or "hadoop" like some people talk about "the cloud" - something vague and cool so we should be using it
17:12:16 <rbergeron> #idea two pieces you hear most about in Big Data seem to be massive storage, & parallel computing (hadoop, column databases, etc)
17:12:33 <ctyler> There's a pretty distinct set of problems suited to hadoop and friends, though.
17:12:54 <threebean> another component is online processing or online analysis.  i.e. predicting what is trending before it's had time to hit disk.
17:12:55 <rbergeron> yeah, and I think some of those folks aren't even to the point where they are thinking "maybe i should be saving this possibly useful in the future information"
17:13:29 <rbergeron> #idea another component is online processing or online analysis - predicting what is trending before it's had time to hit disk
17:13:34 <rbergeron> threebean: want to expound on that a bit?
17:14:46 <ctyler> e.g., they say google can predict flu outbreaks faster than public health agencies by watching search terms
17:14:56 * threebean nods
17:15:07 <threebean> financial tools, too.
17:15:11 <rbergeron> ah - that's a good example
17:15:49 <rbergeron> #idea for ex. - idea that google can predict flu outbreaks faster than public health agencies by watching search terms; financial tools as well apply to concept
17:15:57 <rbergeron> thanks, that makes it much clearer.
17:16:30 <tflink> another example of stream processing is twitter analytics - watching for emerging topics in twitter streams
17:16:36 <rbergeron> okay, shall we move onwards? I think we have the discussion of "what are the buckets of things" that kind of bridges this and the "what do we actually have right now... if anything" discussion
17:16:52 <rbergeron> #idea another ex. of stream processing is twitter analytics - looking for emerging topics in twitter streams
17:17:33 <witlessb> re: google predicting flu: http://www.nature.com/news/when-google-got-flu-wrong-1.12413
17:17:59 <rbergeron> witlessb: cool, thanks for the link - background fun always helps
17:18:21 <rbergeron> #topic What are the buckets or categories, and what do we have?
17:19:03 <rbergeron> sooooo: I think we can probably stick nosql in a category without subdividing it - if we think it belongs here at all. :)
17:20:02 <tflink> orchestration, batch processing and stream processing are the other ones that come to mind
17:20:35 <tflink> examples: orchestration (zookeeper), batch (hadoop, disco), stream (storm)
17:20:36 <ctyler> A full Hadoop stack seems to be considered one of the useful foundation layers for some types of work, but the hadoop filesystem is getting a lot of attention as a weak spot, with nosql dbs and gluster being used as alternatives
17:20:37 <rbergeron> tflink: that seems reasonable -
17:20:43 <rbergeron> oh, you're totally ahead of me. :)
17:21:23 <rbergeron> #idea orchestration, batch processing, stream processing are categories that come to mind - orch (zookeeper), batch (hadoop, disco), stream (storm)
17:21:47 <rbergeron> welcome newcomers, feel free to pipe in if you'd like - we're just talking about different categories of tools
17:21:48 <tflink> rbergeron: storage is another one, I just left that out because you had already mentioned it
17:22:02 <ctyler> I think there are some open source column-parallel SQL databases too, haven't checked into their status
17:22:10 <tflink> I think that ctyler got the examples (HDFS, gluster, nosql)
17:22:25 <rbergeron> #idea storage is another category
17:23:00 <rbergeron> #idea full hadoop stack seems to be thought of as useful foundation layer for some types of work, but HDFS is getting attention as weak spot, with nosql dbs and gluster being used as alternatives
17:23:33 <tflink> I thought that they all had their strengths and weaknesses, though
17:23:37 <rbergeron> so does "storage" seem like a reasonable label to apply to hadoop, hdfs, nosql, gluster as a bucket?
17:23:46 <tflink> but I've spent more time reading about possible filesystems for a disco cluster than hadoop
17:24:12 <rbergeron> tflink: yeah, i think - at least with gluster - some of it is just things like what the data is - small bits or larger chunks when written
17:24:18 <ctyler> In addition to plain storage, one thing we've been struggling with here is how to adequately back up terabyte-to-petabyte data sets.
17:24:35 <tflink> rbergeron: hadoop doesn't seem like it belongs in the storage bucket to me - hdfs would, though
17:24:46 <rbergeron> and i suspect that everything is probably really good at one thing, and not so great at others - no superstar jack of all trades
17:25:02 <ctyler> tflink: hdfs is arguably part of the hadoop family, no?
17:25:02 <rbergeron> tflink: so hadoop would bucketize where? :)
17:25:03 <rbergeron> oh, batch
17:25:08 <rbergeron> never mind
17:25:33 <rbergeron> ctyler: I think that at some point LOTS of stuff falls into "the hadoop family"
17:25:44 <tflink> ctyler: true, you can't have hdfs without hadoop but I was thinking that hadoop would belong more with the "batch processing" bucket
17:26:13 <rbergeron> I guess the easy thing here is that we don't have a lot of these things. Most of them, actually.
17:26:26 <rbergeron> identifying doesn't take long. :)
17:26:34 <ctyler> rbergeron: you have a weird definition of 'easy'!
17:26:54 <ctyler> ah, identifying :-)
17:27:02 <bmahe> hdfs is part of the hadoop project
17:27:12 <bmahe> and one could install hdfs without using the mapreduce part
17:27:48 <rbergeron> ctyler: yeah, identifying
17:28:05 <rbergeron> #info hdfs is part of the hadoop project; one can install hdfs without using the mapreduce part
17:28:10 <tflink> bmahe: are there many projects/people who do that? I thought that hdfs was mostly used to support MR jobs in hadoop
17:28:12 <rbergeron> bmahe: thanks for piping in :)
17:28:39 <bmahe> tflink, true, but you can do it nonetheless :)
17:29:42 <rbergeron> So what we do have is: Riak - mongo - .........
17:29:57 <rbergeron> Gluster, i think we said we could add into here as well.
17:30:09 <bmahe> rbergeron, are you listing datastore?
17:30:12 <tflink> cassandra, hbase (if we're separating out the hadoop components)
17:30:19 <rbergeron> I'm not sure what state gluster is in right now as far as being up to date (or any of the others for that matter)
17:30:52 <rbergeron> bmahe: I am listing out nosql things off the top of my head, and gluster for good measure since we mentioned it earlier. Just thinking of *what do we have that's loosely related* in general.
17:31:15 <rbergeron> though i'm happy to pass the wheel to smarter folks who might be more organized about the what we have discussion :)
17:31:16 <tflink> oh, what do we have? I'm not sure if either cassandra or hbase is packaged
17:31:27 <rbergeron> tflink: last i looked cassandra was not
17:31:59 <bmahe> rbergeron, I just joined the room, so I am just trying to catch up
17:32:01 <rbergeron> not sure on hbase but preliminary zodbot asking says no
17:32:23 <bmahe> tflink, you may want to look at Apache Bigtop, which already packages a bunch of the Apache Hadoop ecosystem
17:33:15 <tflink> bmahe: I'll take a look, thanks
17:33:15 <bmahe> actually, I am a committer on Apache Bigtop and am interested in this sig so I can help out both at the same time
17:33:24 <rbergeron> bmahe: so the quick catchup is that we basically talked about "what is this big data thing" - and then came up with a few buckets as ideas to just categorize things - storage, orchestration, batch processing, stream processing
17:33:40 <rbergeron> not necessarily perfect or final but just ... brainstorming.
17:33:47 <bmahe> fair enough
17:33:49 <bmahe> thanks a lot!
17:33:50 <rbergeron> bmahe: ah, so bigtop is like, the superpackage of hadoop-y things, is it not?
17:33:54 * ctyler dropped off due to dead battery, back now
17:33:55 <rbergeron> sorry, apache bigtop :)
17:34:01 * rbergeron slaps herself a bit
17:34:14 <bmahe> rbergeron, yeah, plus integration test and deployment recipes (we even have a VM recipe for boxgrinder )
17:35:00 <bmahe> these are actually the upstream of CDH (the hadoop distribution from Cloudera)
17:35:08 <rbergeron> bmahe: ah - iirc boxgrinder is sadly going the way of the dodo - but we can discuss that  ... in a bit?
17:35:19 <bmahe> sure
17:35:50 <bmahe> we also have kickstart recipes to create live usb (based on the fedora ones)
17:36:02 <rbergeron> but - I guess perhaps it would be good to hear what your thoughts are, indeed - there are a lot of moving parts there.
17:36:03 <bmahe> but sorry to distract the discussion
17:36:18 <rbergeron> bmahe: wow, awesome - no, by all means. we loooove hearing that people use fedora -
17:36:48 <bmahe> a lot of the bigtop work is done on fedora :)
17:37:08 <ctyler> \o/
17:37:08 <rbergeron> i think another useful thing the SIG can function as is sort of a - is Fedora continuing to be useful for you in however you use it with your big data aspirations
17:37:44 <rbergeron> ie: are we doing good things to make it better once up and running - or are there things getting in your way, that kind of thing.
17:38:02 <rbergeron> bmahe: but i am tickled pink to hear all of this. :D
17:38:03 <bmahe> wouldn't fedora be more useful for end users/developers rather than deploying a full size cluster (people would rather use centos/rh/debian in this case)?
17:38:28 <ctyler> bmahe: because of lts or .. ?
17:38:33 <bmahe> yes
17:38:40 <rbergeron> bmahe: well, i think it's useful for people who want to play with things, just check it out, etc.
17:38:42 <bmahe> stability, lts and so forth
17:39:15 <bmahe> rbergeron, agreed, but then the focus is slightly different
17:39:17 <witlessb> even if fedora is used as a base for jeos (assuming cloud installation)?
17:39:20 <ctyler> Depends on the nature of your BigData. Some big data projects are short-lived (e.g., some research projects fit within a Fedora support lifespan)
17:39:28 <rbergeron> bmahe: I think for some people it may be a "want better underlying technology faster" - i'm not sure how much things like KVM, systemd, etc. matter in this particular space, but I know that sometimes it's useful.
17:40:05 <rbergeron> or perhaps - can fedora access it well even if it's running on something more -EL-ish
17:40:16 <bmahe> from what I have seen, production deployments would not consider fedora. But dev clusters and others would be fine.
17:40:27 <rbergeron> though i will add - people are always happy to see things going into Fedora and EPEL
17:40:43 <bmahe> true, it does not hurt
17:41:13 <bmahe> but shouldn't we finish listing the datastores before going further?
17:41:16 <rbergeron> i think from a project-level perspective - maybe more narrow than bigtop, but - it's always useful to sort out "does it work? aaaaagh" issues in Fedora, before it's an actual problem on -EL later on.
17:41:19 <ctyler> Yeah, there is that persistent rumour that Fedora helps inform the way that at least one significant enterprise linux shapes up. Can't remember the name offhand...
17:41:31 <rbergeron> bmahe: sure, i just got sidetracked by the shiny, this happens :)
17:42:10 <rbergeron> So, I think we were mostly saying "what do we even remotely have" - and where do those things get bucketized - just so we have a list of ... something that we can point people at, and perhaps come out of it with ....
17:42:16 <rbergeron> what do we need/want to do next :)
17:42:40 <rbergeron> anyone have any additional knowledge of hidden gems in Fedora? besides the handful I listed (if they apply)?
17:43:08 <bmahe> we have a lot of java libraries and servers like tomcat (used by solr, oozie)
17:43:27 <rbergeron> bmahe: good to know
17:43:29 <tflink> most of the big-data-ish stuff that I've been using isn't packaged
17:43:43 <bmahe> all the Apache Hadoop related projects have a lot of dependencies
17:44:01 <rbergeron> #info lots of java libraries, servers like tomcat (used by solr, oozie) are already in
17:44:09 <rbergeron> bmahe: yeah, and mostly java, correct?
17:44:14 <bmahe> also
17:44:36 <bmahe> we also have pandas. This is not really big data, but still very useful for data analysis
17:45:15 <tflink> yeah, pandas is really useful
17:45:42 <bmahe> another issue fedora could help also is, reporting bugs with the openjdk. Most of these projects go straight to the oracle jdk and do not really test against openjdk. They are not against openjdk and would welcome help though
17:46:53 * rbergeron assumes this is not ... this type of panda: http://fedoraproject.org/static/images/panda-wave.png
17:47:14 <ctyler> May I propose we start a wiki page, populate it with a list of relevant packages, and note (a) what shape each is in in Fedora, then (b) vote on what we care about, to use that as the basis for some planning (i.e., for next meeting)?
17:47:26 <bmahe> rbergeron, http://pandas.pydata.org/
17:48:34 <rbergeron> ctyler: of course :) - I started a big data sig wiki page - do we just want to stick it right there on that page?
17:48:57 <rbergeron> I feel like i'm definitely in foreign waters a bit (much like when i sort of stumbled into the cloud sig)
17:49:00 <rbergeron> :D
17:49:07 <rbergeron> (but glad to see that everyone else knows what's up, yay)
17:49:17 <ctyler> Or maybe a subpage because we won't want this on the SIG front page in six months?
17:49:40 <rbergeron> #idea we have pandas (not the animal, http://pandas.pydata.org) - useful for data analysis, not really big data
17:49:57 <rbergeron> ctyler: you know the saying, it's a wiki, be bold? :)
17:50:18 <rbergeron> #action rbergeron to add a sub-page of packages we have (unless someone beats me to it)
17:51:06 * ctyler thought that was: When on the road, let someone with a decent internet connection edit the wiki :-)
17:51:30 <rbergeron> bmahe: i think your thought on the openjdk stuff might make for an interesting mail to the mailing list, if you wanted to do that.
17:51:51 <ctyler> Fits nicely with today's RH announcement about OpenJDK6
17:51:53 * rbergeron just notes we're coming up on the hour ... rapidly
17:52:01 <bmahe> rbergeron, sure
17:52:11 <bmahe> ctyler, which announcement?
17:52:14 <rbergeron> ctyler: yeah, i saw daddy shadowman said something, but i haven't actually opened that up yet.
17:52:23 <rbergeron> clicked on that twitter link.
17:52:56 <rbergeron> bmahe: so wrt bigtop - have you guys had aspirations for actually getting it packaged proper in a distro?
17:52:59 <bmahe> rbergeron, I will send an email tonight when I come back from work
17:53:06 <bmahe> rbergeron, we do
17:53:20 <rbergeron> it being "all the things" :D
17:53:21 <bmahe> rbergeron, actually, the ubuntu cloud is basing their packages on bigtop
17:53:35 <bmahe> it -> the email regarding openjdk
17:53:53 <rbergeron> bmahe: gotcha
17:53:59 <ctyler> bmahe: http://www.redhat.com/about/news/press-archive/2013/3/red-hat-reinforces-java-commitment
17:54:35 <bmahe> rbergeron, so right now, in bigtop we are packaging a bunch of projects (jsvc tomcat  bigtop-utils  crunch  datafu  flume  giraph  hadoop  hbase  hive  hue  mahout  oozie  pig  solr  sqoop  whirr  zookeeper) for sles/fedora/ubuntu/debian/centos [,]
17:54:41 <bmahe> centos [5,6]*
17:55:03 <rbergeron> bmahe: nod
17:55:09 <bmahe> so that's a lot to target and we had to take some shortcuts such as pulling the dependencies through maven and not packaging them independently
17:55:24 <bmahe> but it would be awesome if we could straighten up this part
17:56:08 <bmahe> and ideally, my goal (and some of the other people there) would be to become an upstream of distributions. Because there is no reason to duplicate the same efforts and we should all share a common base if possible so we can focus on higher level tasks
17:57:51 <rbergeron> bmahe: nod - esp. with java - there's a lot of effort in packaging that much stuff - and maintaining
17:57:51 <bmahe> ctyler, thanks
17:57:54 <ctyler> So my angle on this, in addition to the fact that my college is looking at big data in the curriculum, is that ARM hyperscale looks like it will eventually be a good way to do some of this, yet the story is weak on a few fronts (e.g., OpenJDK on ARM).
17:58:22 <bmahe> ctyler, ubuntu was working on hadoop on arm. not sure where they are at
17:58:50 <ctyler> The OpenJDK piece is being worked on, it would be good to ensure that the rest of the pieces are in good shape on Fedora ARM.
17:59:28 <rbergeron> ctyler: when you say "do some of this" - you mean "some of the big data things" or something more specific
18:00:05 <ctyler> rbergeron: I mean that ARM hyperscale is well suited to some big data tasks (but not others, yet).
18:00:15 <rbergeron> bmahe: i (sadly) don't have major advice/thoughts on the whole maven/dependencies/becoming an upstream of distributions - my specialty is ... well, typing fast and cheerleading and not ... packaging
18:00:21 <rbergeron> ctyler: ahhh, yes, totally
18:00:26 <bmahe> rbergeron, maybe we could try to dogfood the use cases by applying some tools to the fedora projects? Do we have access to any data from fedora? (webservers, package builds...)
18:00:45 <rbergeron> bmahe: threebean is your guy.
18:00:48 <ctyler> (i.e., 10K cores in one rack is great, but 4GB per process max is a ceiling for some things)
18:01:04 <bmahe> rbergeron, no worry. I am pretty sure something interesting will come out either way
18:01:29 * threebean waves
18:01:39 <bmahe> threebean, hi!
18:01:43 <tflink> bug data could be a decent candidate
18:01:47 <rbergeron> bmahe: he's working on http://www.fedmsg.com/en/latest/ - maybe that seems like a good intersection
18:02:30 <threebean> bmahe: hi.. we've been looking at throwing our infrastructure logs at logstash, but not much else as far as analysis goes yet.
18:02:32 <tflink> but bug data isn't all that big on the scale of some projects - IIRC, F15 bugs are ~ 3-4G of text
18:02:39 * threebean nods
18:02:59 <threebean> yeah, the fedmsg data isn't that big either.  It's big, but not Big.
18:03:05 <bmahe> tflink, that would probably feed some ideas
18:03:14 <rbergeron> bmahe: so I may be asking you to, well, write multiple emails - not an emergency or anything, we're not on fire here - but I think you've got a few cool things to discuss, esp. the maven/dependencies thing -
18:03:33 <tflink> bmahe: I have code for grabbing data from bugzilla and all the F15 bugs if you're ever interested
18:03:38 <bmahe> unless fedora is willing to support this sig with a cluster and terabytes of HDD, I am not sure we want to go *that* big either
18:03:44 <tflink> in both XML and extracted txt form
18:03:46 <rbergeron> i know that the cloudstack folks have some of the same stuff going on, and even the folks who did jboss as had ... welll... fun.
18:04:15 <bmahe> rbergeron, let me write down the email subjects: openjdk support and packaging of dependencies?
18:05:25 <rbergeron> yeah, i think that's it.
18:05:51 <rbergeron> mind if i action you on that, so we don't say next week... who was that awesome person, what were we hoping to learn? :)
18:06:00 <tflink> I'm not sure if anyone else is, but I'm interested in bigdata stuff outside the hadoop ecosystem
18:06:15 <rbergeron> bmahe: I think just in general with packaging - anything regarding your thoughts/aspirations/potential problems
18:06:18 <bmahe> rbergeron, note also the tendency to use the latest and greatest only (servers in clojure, having strict dependencies on a beta version of yesterday's build of a dependency...). So a lot of fun ahead
18:06:18 * ctyler notes that the pillow over there is pointing to the clock, which reads 2 am. I'm out, will read the minutes to see how the story ends.
18:06:23 <rbergeron> tflink: I TOTALLY AM.
18:06:31 <rbergeron> ctyler: have fun there :)
18:06:46 <bmahe> tflink, so do i :)
18:06:49 <tflink> disco in particular since it seems to be more python-friendly than hadoop
18:07:19 <bmahe> tflink, not to come back to hadoop, but there are a few python wrappers for hadoop
18:07:19 <tflink> yes, I know you can use python with hadoop - I've done it before
18:07:46 <bmahe> also most projects use thrift or avro or protocol buffers, which are language agnostic
18:08:03 <rbergeron> #idea interest in disco - seems to be more python-friendly than hadoop (though we are aware that there are python wrappers for hadoop)
18:08:31 <tflink> I was using mrjob with EMR
18:10:21 <bmahe> I see spring is packaged. we could also package spring-hadoop
18:10:25 <tflink> there are talks on disco at pycon US and pydata 2013 this year
18:11:24 <rbergeron> #info spring is packaged - could package spring-hadoop
18:11:41 <rbergeron> tflink: i know we have folks at pycon - not sure on pydata
18:12:04 <tflink> rbergeron: pydata is during the sprints following pycon - I'm not sure if anyone is going either
18:12:17 <bmahe> there are also statsmodels and patsy which could be nice for pandas
18:12:21 <tflink> I've been thinking about it but haven't decided if it's worth the admission cost yet
18:12:51 <rbergeron> #action bmahe to expound on openjdk/bug filing, as well as the wide world of bigtop packaging, as time permits :)
18:13:04 <bmahe> sure, will do
18:13:14 <rbergeron> tflink: ticket cost or travel cost?
18:13:20 <rbergeron> tflink: or some combo thereof :)
18:14:01 <tflink> rbergeron: I'm already going to pycon and staying for the sprints. it's the ticket cost and whether it would be better to spend that time @ the sprints
18:14:25 <rbergeron> ahhhh
18:15:03 <rbergeron> tflink: you might talk to lh - I think she is helping to organize some of that, perhaps she can shed light on it
18:15:50 <rbergeron> #topic Operation Agenda: Yeah...
18:15:53 <rbergeron> Well...
18:15:53 * rbergeron throws a rock at the bot
18:15:54 <rbergeron> This bodes well.
18:15:58 <rbergeron> There we go.
18:16:19 <rbergeron> I think we veered a bit but still came up with some interesting things.
18:16:52 <rbergeron> I'm not sure if we're meeting'd out - tflink, did we cover your additional bases?
18:17:13 <tflink> rbergeron: additional bases?
18:17:56 <bmahe> rbergeron, what about deployment, orchestration and the cloud?
18:18:18 <rbergeron> you alluded to "other than hadoop" - and mentioned a few things - wasn't sure if you wanted to go on :)
18:18:48 <tflink> no, I think we touched on what I had in mind
18:18:49 <rbergeron> bmahe: if you're willing to go on, i'm willing to continue taking notes - I'm not sure if we've lost the others yet :)
18:19:10 <bmahe> rbergeron, I will add it to my email :)
18:19:53 <rbergeron> bmahe: that would be delightful.
18:20:35 <rbergeron> bmahe: we have a cloud sig - of course the orchestration stuf plays all over.
18:20:39 <rbergeron> my, i can't type today.
18:20:52 <rbergeron> Anyone else have anything they'd like to discuss?
18:21:17 <bmahe> rbergeron, am already subscribed to the cloud sig :)
18:21:26 <rbergeron> I think we might have to sit on "what would you like to do" in a more organized fashion until next week :)
18:22:01 <rbergeron> bmahe: excellent, apologies if i've been blind to any mails you've sent there :)
18:22:15 <tflink> yeah, it looks like we've lost most people
18:22:32 <tflink> and some of the "what would we like to do" might work better on the list anyways
18:22:35 <rbergeron> yeah.
18:22:43 <bmahe> rbergeron, I did not send any
18:23:02 <rbergeron> it's just nice to have a discussion and stuff to get a general feel for things. :)
18:23:54 <rbergeron> #action rbergeron to prod in meeting notes to get people to talk re: what would we like to do (we==they)
18:24:24 * rbergeron thinks she's got most things accounted for - and thus i shall start the timer to countdown
18:24:33 <rbergeron> unless anyone objects :)
18:24:46 <rbergeron> 87, 29, 14,....
18:24:48 <rbergeron> 10...
18:24:55 <rbergeron> 3, 2, 1.
18:25:01 <rbergeron> Thanks for coming, everyone.
18:25:14 <tflink> rbergeron: thanks for leading
18:25:17 <rbergeron> This was highly informative and actually exciting :D which is awesome
18:25:31 <bmahe> thanks a lot!
18:25:31 <rbergeron> tflink: surely :)
18:26:10 <rbergeron> bmahe: thanks for joining today! looking forward to hearing from you.
18:26:13 <rbergeron> #endmeeting