15:32:29 #startmeeting Measuring the Fedora community with Census
15:32:29 Meeting started Sun Aug 11 15:32:29 2013 UTC. The chair is flock-ectr112. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:32:29 Useful Commands: #action #agreed #halp #info #idea #link #topic.
15:33:06 #chair mizmo
15:33:06 Current chairs: flock-ectr112 mizmo
15:34:20 Good
15:34:23 - large DB
15:34:29 - adopted outside Fedora
15:34:31 Bad
15:34:36 - opt-in
15:34:44 - flaws in the design
15:34:48 -- scalability
15:34:54 -- complicated collections of plugins
15:35:12 -- custom / one-off queries difficult
15:35:23 -- custom UI code required
15:35:44 need to find who is maintaining it
15:35:54 need to figure out the db design
15:36:02 need to make the query
15:36:15 by the time you have all this, you have given up
15:36:40 less useful than hoped
15:36:54 less useful -> less maintained -> retired
15:37:00 new idea: Census
15:37:23 - Opt-out for basic anonymous data
15:37:42 (on by default, anonymous data, hardware info, crash info, package info...)
15:37:47 - scalability
15:37:51 Better design
15:37:58 - flexible collector framework
15:38:05 the client should be very simple, so collecting the data is easy
15:38:09 - reusable query API
15:38:23 allows running queries without having to talk to the devs themselves
15:38:32 embed that in your application
15:38:41 - simple query prototyping tool
15:39:00 sandbox to build your query before integrating it in your own app
15:39:09 we provide a service and an API
15:39:21 you integrate that in your app/stats
15:39:40 you can submit data about anything
15:40:13 (# of downloads, # of updates, not just when people install)
15:40:25 Prototype on OpenShift
15:40:55 receiver: a collection of plugins; receives the submitted data and puts it in the database
15:41:27 you'll need a receiver plugin for each type of data you want to store
15:41:40 The query API is read-only and returns JSON output
15:41:53 in the same way that JSON is used to upload data
15:42:14 the query prototyper just submits the info to the query API
15:42:42 scalable at the DB level or the query API level, but the API itself will remain consistent
15:43:37 plugins should be very easy to write
15:43:51 plugins just print JSON to stdout
15:44:07 receiver has two tasks
15:44:09 - indexing
15:44:14 - inserting the data
15:44:35 more might come in the future according to needs
15:45:06 every plugin has access to the whole dataset submitted
15:45:32 inter-plugin compatibility -> data passing / ordering
15:46:41 the index method of the plugin is only run once
15:46:52 and defines the structure required
15:47:14 the process method processes the submitted input and returns the JSON blob to insert in the database
15:47:37 actually the process method directly inserts in the db
15:47:45 the query API
15:47:54 - HTTP POST with 2 parameters
15:48:06 -- a JavaScript function (func)
15:48:17 -- the argument to pass to the function (args)
15:48:36 the js is run in a read-only JavaScript sandbox
15:48:44 the returned value is JSON-encoded
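A minimal sketch of a client for the query API described above: one HTTP POST carrying `func` (a JavaScript function body) and `args`. The log only names the two parameters and says the result is JSON-encoded, so the endpoint URL, the form encoding of the body, and sending `args` as JSON text are all assumptions made for illustration.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint; the log does not give the query API's URL.
QUERY_URL = "http://census.example.org/query"


def build_query(func, args=None, url=QUERY_URL):
    """Build (but do not send) a POST request for the query API.

    Assumptions: the body is form-encoded and `args` is sent as JSON text.
    """
    body = urllib.parse.urlencode({
        "func": func,
        "args": json.dumps(args if args is not None else []),
    }).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def run_query(func, args=None, url=QUERY_URL):
    """Send the query and decode the JSON-encoded return value."""
    with urllib.request.urlopen(build_query(func, args, url)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request needs no network access; run_query() would send it.
req = build_query("return 1 + 1;")
```

Keeping request construction separate from sending makes it possible to inspect a query before it reaches the server, which is roughly the role the query prototyper plays in the browser.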
15:48:52 Query prototyper
15:48:59 - static HTML page w/ JavaScript
15:49:18 helps to build the query and submit it to the query API
15:49:41 useful for one-off queries
15:49:54 returned data is displayed dynamically (using js)
15:50:18 using these tools one can directly browse the db schema live
15:50:29 demo is at:
15:51:13 first example: return "Hello world!";
15:51:35 second example: return {"Title" : "Hello world!"}
15:51:50 second example: return {"Subject" : "Hello world!", "foo" : "bar"}
15:52:02 third example: querying the db itself
15:52:08 return db.getCollectionNames();
15:52:31 return ["collections"].concat(db.getCollectionNames());
15:52:38 Names the table ^
15:52:52 return db.hardware.pci.findOne();
15:53:02 returns information about one PCI device
15:53:11 return db.hardware.usb.findOne();
15:53:16 same query for a USB device
15:53:37 return db.hardware.usb.find(); -> returns a cursor rather than a valid JSON object
15:53:46 return db.hardware.usb.find().toArray();
15:53:56 which returns the whole collection as JSON
15:54:08 return db.hardware.usb.find({vendor:3599}).toArray();
15:54:20 returns USB info for a specific vendor
15:54:44 /!\ Watch out: DB schema subject to change!
15:54:56 return db.hardware.profile.findOne();
15:55:03 returns a hardware profile
15:55:13 the state of the current hardware on the device
15:55:32 return db.checkin.find().toArray();
15:55:37 list of all the checkins
15:55:51 this will get bigger as there is a checkin for each insert
15:56:09 lots of possibilities
15:56:20 can be integrated into more applications
15:57:07 ids are unique
15:57:16 profiles will not be stored redundantly
15:58:04 one object for each hardware device in the hardware.pci document
15:58:35 return db.hardware.pci.find().toArray();
15:59:17 can be used for anything, not just kernel information
15:59:32 hits on urls can be stored
16:00:34 checkin is used to quantify the number of profiles submitted
16:00:50 so to get the top 10 video cards you will go from checkin to profile to hardware
16:01:09 profiles will give the number of times a specific piece of hardware exists
16:01:24 checkin will provide the number of times each profile has been submitted
16:01:33 future considerations
16:01:36 - replication
16:01:38 -- sharding
16:01:45 -- master/slave construction
16:01:54 would allow scaling up while preserving the API
16:02:06 - Opt-in/Opt-out data policy
16:02:27 opt-out by default for anonymous data, some data might require an opt-in
16:02:41 - Expand the collection framework
16:02:51 Based on Smolt but can be expanded
16:03:33 last stop in the path to get data from its origin to the user
16:03:38 queries should be fast
16:03:59 might require a level of translation between the data and Census in some cases
16:04:10 client/server is ready, will be uploaded to fedorahosted
16:04:39 current collectors: uuid, hardware.pci, hardware.usb, software.os and software.rpm
16:05:15 linking the uuid from Census to the retrace server (and darkserver?)
16:05:43 using the gnu_build_id?
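The checkin -> profile -> hardware path mentioned at 16:00:50 for the top 10 video cards can be sketched in plain Python over in-memory stand-ins for the three collections. This is only an illustration of the traversal: the schema is stated to be subject to change, and every field name below (`profile`, `devices`, `class`, `name`) is a hypothetical placeholder rather than the real Census schema.

```python
from collections import Counter

# Hypothetical in-memory stand-ins for the checkin, profile and hardware.pci
# collections; every field name here is an assumption, not the real schema.
checkins = [{"profile": "p1"}, {"profile": "p1"}, {"profile": "p2"}]
profiles = {
    "p1": {"devices": ["gpu-a", "nic-x"]},
    "p2": {"devices": ["gpu-b", "nic-x"]},
}
hardware_pci = {
    "gpu-a": {"class": "video", "name": "Card A"},
    "gpu-b": {"class": "video", "name": "Card B"},
    "nic-x": {"class": "network", "name": "NIC X"},
}


def top_devices(device_class, n=10):
    """Rank devices of one class by the number of checkins that include them."""
    # checkin: how many times each profile was submitted
    submissions = Counter(c["profile"] for c in checkins)
    totals = Counter()
    for profile_id, count in submissions.items():
        # profile: which devices that profile contains
        for device_id in profiles[profile_id]["devices"]:
            # hardware: what each device actually is
            device = hardware_pci[device_id]
            if device["class"] == device_class:
                totals[device["name"]] += count
    return totals.most_common(n)


top_video = top_devices("video")
```

Because profiles are not stored redundantly, the per-profile submission count from checkin is what turns "which hardware exists" into "how common is it".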
16:06:10 pci slot might be nice to store as well
16:06:20 software.os -> cpe info
16:06:29 software.rpm -> output of rpm -qa
16:06:36 TODO:
16:06:43 - define collection requirements
16:06:47 - Nail down schema
16:06:54 - Opt-in/Opt-out policy
16:07:02 - Anaconda / Firstboot integration?
16:07:12 (checkbox to say don't submit my data)
16:07:18 or more complex
16:07:35 plugins can be enabled/disabled from the command line
16:07:39 - release
16:07:42 - package into Fedora
16:07:52 Hackfest at 2:00pm today!
16:08:05 more info on http://fedorahosted.org/census
16:08:26 live-demo: http://census-npmccallumfedora.rhcloud.com
16:08:49 service easy to deploy
16:09:03 Fedora 21 to get it more integrated into Fedora
16:09:18 questions?
16:09:28 thanks
16:09:30 #endmeeting