#fedora-meeting-1: Big Data SIG
Meeting started by rbergeron at 16:59:44 UTC
(full logs).
Meeting summary
- Who's around for fun? (rbergeron, 17:00:05)
  - present: rbergeron, tflink (rbergeron, 17:01:15)
  - present: witlessb (rbergeron, 17:01:45)
  - present: threebean, zoglesby, samkottler, jsmith (rbergeron, 17:02:18)
- Agenda for today's first meeting :D (rbergeron, 17:03:49)
  - http://lists.fedoraproject.org/pipermail/bigdata/2013-March/000003.html (rbergeron, 17:04:32)
  - Agenda looks like: what this is all about, what do we have, what don't we have, what is anyone here interested in doing :) (rbergeron, 17:05:09)
- What's the Big Data SIG all about? (rbergeron, 17:05:57)
  - loosely quoting from O'Reilly: "If the size of your data is part of the problem, it's Big Data." (rbergeron, 17:07:57)
  - IDEA: one part of it is getting a decent setup; the other part is understanding the tools and approaches needed (rbergeron, 17:11:42)
  - IDEA: the two pieces you hear most about in Big Data seem to be massive storage and parallel computing (Hadoop, column databases, etc.) (rbergeron, 17:12:16)
  - IDEA: another component is online processing or online analysis: predicting what is trending before it has had time to hit disk (rbergeron, 17:13:29)
  - IDEA: for example, the idea that Google can predict flu outbreaks faster than public health agencies by watching search terms; financial tools apply the same concept (rbergeron, 17:15:49)
  - IDEA: another example of stream processing is Twitter analytics: looking for emerging topics in Twitter streams (rbergeron, 17:16:52)
- What are the buckets or categories, and what do we have? (rbergeron, 17:18:21)
  - IDEA: orchestration, batch processing, and stream processing are the categories that come to mind: orchestration (ZooKeeper), batch (Hadoop, Disco), stream (Storm) (rbergeron, 17:21:23)
  - IDEA: storage is another category (rbergeron, 17:22:25)
  - IDEA: the full Hadoop stack seems to be thought of as a useful foundation layer for some types of work, but HDFS is getting attention as a weak spot, with NoSQL databases and Gluster being used as alternatives (rbergeron, 17:23:00)
  - HDFS is part of the Hadoop project; one can install HDFS without using the MapReduce part (rbergeron, 17:28:05)
  - lots of Java libraries and servers like Tomcat (used by Solr and Oozie) are already packaged (rbergeron, 17:44:01)
  - IDEA: we have pandas (not the animal, http://pandas.pydata.org): useful for data analysis, but not really Big Data (rbergeron, 17:49:40)
  - ACTION: rbergeron to add a sub-page of packages we have (unless someone beats me to it) (rbergeron, 17:50:18)
  - IDEA: interest in Disco, which seems to be more Python-friendly than Hadoop (though we are aware that there are Python wrappers for Hadoop) (rbergeron, 18:08:03)
  - Spring is packaged, so we could package spring-hadoop (rbergeron, 18:11:24)
  - ACTION: bmahe to expound on OpenJDK/bug filing, as well as the wide world of Bigtop packaging, as time permits :) (rbergeron, 18:12:51)
- Operation Agenda: Yeah... (rbergeron, 18:15:50)
  - ACTION: rbergeron to prod in meeting notes to get people to talk re: what would we like to do (we==they) (rbergeron, 18:23:54)
Meeting ended at 18:26:13 UTC
(full logs).
Action items
- rbergeron to add a sub-page of packages we have (unless someone beats me to it)
- bmahe to expound on openjdk/bug filing, as well as the wide world of bigtop packaging, as time permits :)
- rbergeron to prod in meeting notes to get people to talk re: what would we like to do (we==they)
Action items, by person
- bmahe
  - bmahe to expound on openjdk/bug filing, as well as the wide world of bigtop packaging, as time permits :)
- rbergeron
  - rbergeron to add a sub-page of packages we have (unless someone beats me to it)
  - rbergeron to prod in meeting notes to get people to talk re: what would we like to do (we==they)
People present (lines said)
- rbergeron (163)
- bmahe (59)
- tflink (36)
- ctyler (27)
- threebean (8)
- zodbot (5)
- witlessb (4)
- samkottler (1)
- zoglesby (1)
- jsmith (1)
Generated by MeetBot 0.1.4.