#fedora-meeting-1: Big Data SIG

Meeting started by rbergeron at 16:59:44 UTC (full logs).

Meeting summary

Who's around for fun? (rbergeron, 17:00:05)
1. present: rbergeron, tflink (rbergeron, 17:01:15)
2. present: witlessb (rbergeron, 17:01:45)
3. present: threebean, zoglesby, samkottler, jsmith (rbergeron, 17:02:18)
Agenda for today's first meeting :D (rbergeron, 17:03:49)
1. http://lists.fedoraproject.org/pipermail/bigdata/2013-March/000003.html (rbergeron, 17:04:32)
2. Agenda looks like: What this is all about, what do we have, what don't we have, what is anyone here interested in doing :) (rbergeron, 17:05:09)
What's the Big Data SIG all about? (rbergeron, 17:05:57)
1. loosely quoting from O'Reilly: "If the size of your data is part of the problem, it's Big Data." (rbergeron, 17:07:57)
2. IDEA: one part of it is getting a decent setup; other part is understanding tools and approaches needed (rbergeron, 17:11:42)
3. IDEA: two pieces you hear most about in Big Data seem to be massive storage, & parallel computing (hadoop, column databases, etc) (rbergeron, 17:12:16)
4. IDEA: another component is online processing or online analysis - predicting what is trending before it has had time to hit disk (rbergeron, 17:13:29)
5. IDEA: for ex. - the idea that Google can predict flu outbreaks faster than public health agencies by watching search terms; the same concept applies to financial tools (rbergeron, 17:15:49)
6. IDEA: another ex. of stream processing is twitter analytics - looking for emerging topics in twitter streams (rbergeron, 17:16:52)
What are the buckets or categories, and what do we have? (rbergeron, 17:18:21)
1. IDEA: orchestration, batch processing, stream processing are categories that come to mind - orch (zookeeper), batch (hadoop, disco), stream (storm) (rbergeron, 17:21:23)
2. IDEA: storage is another category (rbergeron, 17:22:25)
3. IDEA: the full Hadoop stack seems to be thought of as a useful foundation layer for some types of work, but HDFS is getting attention as a weak spot, with NoSQL databases and Gluster being used as alternatives (rbergeron, 17:23:00)
4. HDFS is part of the Hadoop project; one can install HDFS without using the MapReduce part (rbergeron, 17:28:05)
5. lots of java libraries, servers like tomcat (used by solr, oozie) are already in (rbergeron, 17:44:01)
6. IDEA: we have pandas (not the animal, http://pandas.pydata.org) - useful for data analysis, not really big data (rbergeron, 17:49:40)
7. ACTION: rbergeron to add a sub-page of packages we have (unless someone beats me to it) (rbergeron, 17:50:18)
8. IDEA: interest in disco - seems to be more python-friendly than hadoop (though we are aware that there are python wrappers for hadoop) (rbergeron, 18:08:03)
9. spring is packaged - could package spring-hadoop (rbergeron, 18:11:24)
10. ACTION: bmahe to expound on openjdk/bug filing, as well as the wide world of bigtop packaging, as time permits :) (rbergeron, 18:12:51)
Operation Agenda: Yeah... (rbergeron, 18:15:50)
1. ACTION: rbergeron to prod in meeting notes to get people to talk re: what would we like to do (we==they) (rbergeron, 18:23:54)

Meeting ended at 18:26:13 UTC (full logs).

Action items

rbergeron to add a sub-page of packages we have (unless someone beats me to it)
bmahe to expound on openjdk/bug filing, as well as the wide world of bigtop packaging, as time permits :)
rbergeron to prod in meeting notes to get people to talk re: what would we like to do (we==they)

Action items, by person

bmahe
1. bmahe to expound on openjdk/bug filing, as well as the wide world of bigtop packaging, as time permits :)
rbergeron
1. rbergeron to add a sub-page of packages we have (unless someone beats me to it)
2. rbergeron to prod in meeting notes to get people to talk re: what would we like to do (we==they)

People present (lines said)

rbergeron (163)
bmahe (59)
tflink (36)
ctyler (27)
threebean (8)
zodbot (5)
witlessb (4)
samkottler (1)
zoglesby (1)
jsmith (1)

Generated by MeetBot 0.1.4.