13:29:54 <mizmo> #startmeeting hubs-devel
13:29:54 <zodbot> Meeting started Thu Jan  5 13:29:54 2017 UTC.  The chair is mizmo. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:29:54 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
13:29:54 <zodbot> The meeting name has been set to 'hubs-devel'
13:30:31 <mizmo> so what kinds of things are we stuck on, and can we outline the steps for approaching them for someone new wanting to help?
13:30:48 <mizmo> these are the milestone issues, is there anything anyone could help us with here: https://pagure.io/fedora-hubs/issues?status=Open&tags=milestone
13:31:00 <mizmo> the major chunks of work are
13:31:02 <mizmo> 1) zanata
13:31:07 <mizmo> 2) waartaa
13:31:14 <mizmo> 3) badges
13:31:41 <mizmo> 4) release cycle widget (i think the backend for that needs more work / integration stuff?)
13:31:44 <mizmo> 5) bookmarks bar
13:32:01 <mizmo> 6) help widget / err bot
13:32:43 <sayan> yes, these are the major things we are working on
13:33:25 <sayan> 1. Zanata - we need to deploy zanata2fedmsg first to prod
13:33:48 <sayan> right now it's in staging (deployed by raplh)
13:33:56 <sayan> s/raplh/ralph
13:34:06 <mizmo> okay is there anything there someone could help with? any kind of atomic chunk of work?
13:34:56 <sayan> a2batic already pitched to help but this is a big task
13:35:01 <sayan> and there are multiple issues
13:35:32 <sayan> so people can easily work on different issues.
13:35:50 <mizmo> can we make a list of the issues involved?
13:36:05 <sayan> but without the sample data from zanata it might be tough to work on
13:36:24 <sayan> I had dropped a mail to aeng before the vacation but did not get a reply
13:36:35 <sayan> so I need to ask on that thread again
13:36:48 <mizmo> i had thought we got the sample data already
13:37:47 <sayan> mizmo: yes, so this is what we have
13:37:49 <sayan> #link https://zanata.atlassian.net/browse/ZNTA-1166
13:37:51 <mizmo> #action sayan to ask aeng for sample data from zanata
13:38:29 <mizmo> sayan: the example payload? but you need something more?
13:39:43 <sayan> mizmo: we can start working based on the example payload
13:40:03 <mizmo> sayan: oh okay so this isn't blocked?
13:40:07 <sayan> and then change it once zanata2fedmsg is in prod, if there are diffs
13:40:38 <sayan> mizmo: not really blocked, but it's good to have example payload from multiple scenarios
13:41:16 <sayan> mizmo: ... and the issues, we already have the widgets from translations right?
13:41:35 <mizmo> there's mockups for them, but i dont know if there's anything beyond that?
13:42:02 <mizmo> do we have everything needed for someone to write each widget?
13:42:46 <sayan> mizmo: that's what I could not make out from the example payload
13:43:58 <sayan> mizmo: what we can do is have someone start working on it. if changes are needed we can ping aeng
13:44:07 <sayan> or somebody else from the zanata team
13:44:24 <mizmo> so how would we direct someone to work on it? where would they start?
13:44:57 <sayan> mizmo: so they start to work based on the sample payload we have
13:45:20 <sayan> since we are pushing the same payload to fedmsg, the structure of the message would be same
13:45:31 <sayan> so they can start writing the widget
13:45:37 <mizmo> sayan: can we walk through it in terms of steps for a new contributor?
13:45:48 <sayan> mizmo: yes, we can do that
13:45:49 <mizmo> eg step 1 is to set up the hubs dev env, which is documented well
13:46:04 <mizmo> step 2 - do they have to have a fedora zanata account?
13:46:48 <sayan> mizmo: they don't need zanata account
13:47:09 <sayan> the data comes from fedmsg, so it's like just another widget based on fedmsg
13:47:10 <mizmo> they dont need any kind of zanata account or env, they just need to look at the sample payload?
13:47:15 <mizmo> oh okay
13:47:55 <mizmo> so if they're trying to do stuff based on statistics but the payload is only for a single doc update, how are they getting aggregate data?
13:49:35 <sayan> the data would be cached, and recomputed on receiving new data
13:50:05 <sayan> also, the graphs that I have seen so far show percentage completion
13:50:19 <sayan> that can be computed from the payload, can't it?
13:50:45 <mizmo> where is the data cached?
13:50:59 <mizmo> sample payload:
13:51:00 <mizmo> {"username":"aeng","project":"Zanata","version":"master","docId":"doc1id","locale":"zh-CN","wordDeltasByState":{"New":-16,"Translated":16},"type":"DocumentStatsEvent"}
13:51:19 <mizmo> so it gives you a word delta, but i'm assuming that's for the last event, not the entire doc
13:51:35 <mizmo> how would you get something like percentage completion from that?
13:53:01 <sayan> correct me if I am wrong
13:53:21 <sayan> I was thinking New + Translated = Total
13:54:42 * sayan is reading again
13:54:48 <mizmo> new + translated in that payload would be = 0
13:55:26 <mizmo> my interpretation of this (which could be wrong) is that this is a message emitted as a result of a submission to a document
13:56:05 <sayan> no, I could not interpret the negative symbol there, so I was thinking 16+16=32
13:56:16 <mizmo> this specific document submission was aeng submitting 16 lines of translation. i'm not sure why "new" is -16, but i think it's because 16 lines had the 'new' state removed from them: since they are translated now, they are no longer new
13:56:23 <mizmo> i think it's negative on purpose
13:56:42 <mizmo> i think it means 16 lines were removed from "new" state and moved to "translated" state
13:57:03 <mizmo> it's a delta, so it's not a total
13:57:07 <mizmo> a delta is only partial right
13:57:20 <mizmo> it only shows changes - so i think it's just state changes
13:59:42 <sayan> mizmo: what is "lines of translation"?
13:59:53 <mizmo> sayan: line == string
14:00:04 <mizmo> (does that make sense?)
14:00:42 <sayan> yes
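The delta reading discussed above can be checked against the sample payload. This is a minimal sketch, assuming mizmo's interpretation is right: each event moves units from one state to another, so the deltas sum to zero and only make sense when applied to totals obtained elsewhere. The field name wordDeltasByState suggests the unit may be words rather than strings. The helper `apply_deltas` and the starting totals are hypothetical; the message itself carries no totals.

```python
import json

# Sample payload quoted earlier in the meeting.
SAMPLE = (
    '{"username":"aeng","project":"Zanata","version":"master",'
    '"docId":"doc1id","locale":"zh-CN",'
    '"wordDeltasByState":{"New":-16,"Translated":16},'
    '"type":"DocumentStatsEvent"}'
)


def apply_deltas(totals, deltas):
    """Return a new per-state totals dict with the deltas applied."""
    updated = dict(totals)
    for state, change in deltas.items():
        updated[state] = updated.get(state, 0) + change
    return updated


event = json.loads(SAMPLE)
deltas = event["wordDeltasByState"]

# The deltas cancel out: the event moves units between states and adds none.
net_change = sum(deltas.values())

# Hypothetical starting totals; they are NOT part of the message.
before = {"New": 100, "Translated": 300}
after = apply_deltas(before, deltas)
```

With these made-up totals, `after` holds 84 "New" and 316 "Translated", and the overall total stays at 400. That is why a single event cannot yield a completion percentage by itself.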
14:01:51 <mizmo> so i dont see how this example payload gets us aggregate stats.... unless fedmsg can aggregate for us??
14:03:44 <sayan> we need to do that at the hubs level
14:05:13 <mizmo> where would we cache / store it? we dont normally get into storing things in hubs right ?
14:05:33 <sayan> mizmo: no we don't store things in hubs
14:05:35 <mizmo> and if a document already exists before we turn on the zanata stuff - how do we get the stats from before it was emitting messages so we have a picture of the full doc?
14:06:01 <mizmo> so would hubs do a query on fedmsg and calculate the aggregate on the fly? so it would have to calculate it again every time the page is loaded?
14:06:18 <sayan> that would be really heavy
14:07:19 <sayan> in that case if zanata has api
14:07:51 <sayan> the initial pulling of the data can be done via API and consecutive updates via fedmsg
14:09:03 <mizmo> okay so theres two major parts to this feature then -
14:09:06 <mizmo> implementing the initial data suckage from zanata via the API
14:09:14 <mizmo> and then getting the consecutive updates to work
14:09:33 <mizmo> but once the data is sucked in via api, where do we put it? do we have a cache we can use? how do we use it?
14:09:49 <sayan> we aggregate it and cache it
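The two parts named above (one initial pull via the API, then consecutive updates via fedmsg) can be sketched as follows. This is only an illustration under stated assumptions: `fetch_initial_stats` is a hypothetical stand-in that returns made-up numbers, not a real Zanata API call, and a plain dict stands in for the hubs cache, whose real interface is not shown in this log.

```python
# A plain dict stands in for the hubs cache (assumption for illustration).
cache = {}


def fetch_initial_stats(project, version):
    """Hypothetical stand-in for a one-time call to a Zanata stats API.

    Returns made-up totals per locale and state.
    """
    return {
        "ja": {"New": 190, "Translated": 810},
        "zh-CN": {"New": 500, "Translated": 500},
    }


def seed_cache(project, version):
    """Part 1: the initial pull, done once per project/version."""
    key = (project, version)
    if key not in cache:
        cache[key] = fetch_initial_stats(project, version)
    return cache[key]


def handle_message(msg):
    """Part 2: fold one DocumentStatsEvent into the cached totals."""
    stats = seed_cache(msg["project"], msg["version"])
    locale_totals = stats.setdefault(msg["locale"], {})
    for state, change in msg["wordDeltasByState"].items():
        locale_totals[state] = locale_totals.get(state, 0) + change


# The sample payload from the meeting, reduced to the fields used here.
handle_message({
    "project": "Zanata",
    "version": "master",
    "locale": "zh-CN",
    "wordDeltasByState": {"New": -16, "Translated": 16},
})
```

After the message is handled, the zh-CN entry reads 484 "New" and 516 "Translated", while the ja entry is untouched. Whether a suitable aggregate endpoint exists on the Zanata side is still an open question in the discussion.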
14:10:28 <sayan> mizmo: pingou: btw, why don't we have a database for hubs?
14:11:20 <mizmo> my understanding is we dont want hubs to be a data store or to replace any of the apps it grabs data from, it's meant to be a front end
14:12:02 <mizmo> would someone working on this know how to aggregate and cache in hubs? is there a way in flask that is hooked up already to do that?
14:12:42 <sayan> yes, we already have a cache in place
14:13:40 <pingou> sayan: it's all cached indeed
14:13:58 <sayan> pingou: yes, so the cache is basically our database, right?
14:14:07 <pingou> yup
14:14:25 <mizmo> okay so then
14:14:50 <sayan> mizmo: so the thing is when we aggregate we pull data from cache, compute new data and push to cache again
14:15:38 <mizmo> sayan: so does there need to be some kind of watcher process watching for the new fedmsgs to come in from zanata and to update the cache?
14:16:05 <sayan> mizmo: things are already in place afaik
14:17:00 <sayan> mizmo: https://pagure.io/fedora-hubs/blob/develop/f/hubs/widgets/badges.py#_31
14:18:01 <sayan> here when it receives a message for a topic it invalidates
14:20:09 <sayan> but we need to look at the zanata API
14:20:19 <sayan> to see what it provides
14:23:35 <mizmo> sayan: well we only need the API for the initial data dump right? or do we need it when updating the cache with fresh msgs too?
14:24:22 <sayan> mizmo: yes, only for the first pull
14:25:00 <mizmo> would you use the same process for updating the cache as doing the initial data dump?
14:25:33 <sayan> mizmo: no, it will be different
14:26:13 <mizmo> so the badges.py example is an example someone could follow to understand how to update the cache from fedmsg? do we have any examples someone could follow to understand how to do the initial data dump?
14:26:41 <sayan> mizmo: yes
14:29:33 <mizmo> sayan: have any pointers?
14:29:42 <sayan> mizmo: https://pagure.io/fedora-hubs/blob/develop/f/hubs/widgets/badges.py#_21
14:29:50 <sayan> we can handle here itself
14:29:51 <mizmo> (i ask because i want to try to write these up)
14:30:24 <mizmo> so this is grabbing the entire JSON of badge data, so it's like an initial data dump
14:30:27 <mizmo> right?
14:31:13 <sayan> the workflow is when the page is first loaded the badge data is fetched and cached
14:32:08 <sayan> and whenever there is a new badge awarded, we invalidate the cache and during the next page load the `data` method is hit again
14:32:20 <sayan> s/method/function/
14:32:31 <sayan> and the new data is pulled
14:32:59 <sayan> in our case we don't invalidate the cache, rather keep updating the cache
14:33:08 <sayan> makes sense?
14:33:26 <mizmo> ah okay it does!
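The difference between the two cache strategies sayan describes can be sketched side by side. This is not the code in badges.py; all names here (`fetch_badges`, `data`, `on_badge_message`, `on_stats_message`) are hypothetical, and plain dicts stand in for the hubs cache. The first handler invalidates, so the next page load refetches everything. The second updates the cached totals in place, which is what the zanata widgets would need.

```python
badge_cache = {}
stats_cache = {}
fetch_count = 0


def fetch_badges(username):
    """Hypothetical stand-in for the expensive upstream fetch."""
    global fetch_count
    fetch_count += 1
    return {"username": username, "badges": ["example-badge"]}


def data(username):
    """Return cached badge data, fetching it on a cache miss."""
    if username not in badge_cache:
        badge_cache[username] = fetch_badges(username)
    return badge_cache[username]


def on_badge_message(username):
    """Invalidate style: drop the entry; the next page load refetches."""
    badge_cache.pop(username, None)


def on_stats_message(locale, deltas):
    """Update style: adjust the cached totals in place, no refetch."""
    totals = stats_cache.setdefault(locale, {})
    for state, change in deltas.items():
        totals[state] = totals.get(state, 0) + change


data("mizmo")
data("mizmo")               # served from cache, no second fetch
on_badge_message("mizmo")
data("mizmo")               # cache miss after invalidation, fetches again

stats_cache["ja"] = {"New": 190, "Translated": 810}   # made-up totals
on_stats_message("ja", {"New": -90, "Translated": 90})
```

Invalidation costs one full refetch per change, which is cheap for one user's badges. Updating in place avoids the refetch but depends on the cached totals having been seeded correctly first.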
14:33:58 <mizmo> this is specifically for one user though so it's relatively light weight isnt it?
14:34:10 <mizmo> the zanata stuff - the amount of data is going to be massive isnt it
14:34:36 <sayan> mizmo: won't this be team specific?
14:34:46 <sayan> the data that is pulled?
14:34:57 <mizmo> let me check the mockup but i think its more general
14:35:05 <mizmo> for the main translation hub anyway
14:35:06 <mizmo> there are individual lang team hubs too tho
14:35:27 <mizmo> https://pagure.io/fedora-hubs/issue/261
14:35:38 <sayan> mizmo: language team hub is okay
14:35:58 <sayan> if we are having common data, we can intelligently share
14:36:10 <mizmo> this one is across languages, per release, per projects (and each project has lots of docs i think)
14:36:23 <mizmo> this one is the same - https://pagure.io/fedora-hubs/issue/262
14:36:39 <mizmo> this one is a listing of all lang teams https://pagure.io/fedora-hubs/issue/263
14:36:50 <mizmo> although i think we dont need zanata for that last one
14:37:07 <sayan> mizmo: yes
14:37:20 <mizmo> so its a huge amount of data i think, compared to the per user badge json
14:38:04 <sayan> mizmo: if they aggregate and send via api then it should not be much of a problem
14:38:58 <sayan> anyways, if data is huge we notify the user that the data is huge and will be available in some time
14:39:09 <sayan> and then do the process in background
14:40:03 <sayan> mizmo: btw, I need to see off somebody and finish off dinner, can we continue the meeting at 1900UTC?
14:43:02 <mizmo__> sigh i think riot went down
14:43:22 <mizmo__> sayan, can we use the API for updates? i thought we try to do everything with fedmsg. is it ok to not use fedmsg?
14:43:32 <mizmo__> sayan, also i think it would be a really bad UX to have the user open the page and have to sit and wait for a long time
14:43:42 <mizmo__> is there a way to precache it instead?
14:43:52 <sayan> mizmo__: yes, we can do that too
14:43:55 <mizmo__> so its not the page load event kicking off the cache, but some automated process?
14:44:03 <mizmo__> its ok if the cache is slightly out of date i think
14:44:32 <sayan> mizmo__: API won't be realtime
14:44:35 <mizmo__> maybe theres an automated process that regularly updates the cache, and when it gets a page hit it does a refresh behind the scenes and updates the cache again (but user doesn't see anything)
14:45:41 <mizmo__> sayan, i guess the big missing piece for me is how, once we already have the initial data dump, do we keep the data updated if the fedmsgs we get from zanata dont have any aggregate data and to aggregate the data ourselves means processing potentially thousands or more msgs
14:46:33 <sayan> mizmo__: every time there is a message, it hits the invalidate cache method
14:46:43 <sayan> where we pull data from cache and update the cache again
14:46:56 <mizmo__> sayan, but the messages for zanata don't map cleanly to the stats we have?
14:47:14 <mizmo__> well let me walk thru it
14:47:27 <mizmo__> so say my widget says based on the big data dump that japanese translation is 81% complete
14:47:45 <mizmo__> then a translator submits a new, let's say large chunk of translated strings, so it should be 90% complete now
14:48:08 <sayan> mizmo__: yes, that's something we need to figure out and connect the API and the messages
14:48:11 <mizmo__> lets say there is one fedmsg for that, lets say it's a few hundred strings
14:48:24 <mizmo__> so the msg comes in, it hits the invalidate cache method
14:48:25 <mizmo__> ....
14:48:28 <mizmo__> then what happens?
14:48:43 <mizmo__> it only affects the japanese stat. do the other langs in the table get recomputed even tho the msg doesn't affect them?
14:49:06 <sayan> you get the payload in the invalidate cache method, right?
14:49:16 <mizmo__> thats what im assuming i think
14:49:48 <sayan> you already have stats 81% in cache, you pull it, add the current payload to it and update the cache with the new data
14:50:36 <mizmo__> okay so something in your update cache method should probably process the payload to figure out what lang it is for and only update that piece in the cache
14:51:03 <sayan> mizmo__: I need to leave for a few hours now to see off someone and also finish off dinner, can we continue at 1900UTC
14:51:05 <sayan> mizmo__: yes
14:51:17 <sayan> you update only what's needed
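The walk-through above (Japanese at 81%, one message arrives, only that language changes) can be sketched like this. The totals are made up to match the 81% and 90% figures from the example, the message keeps only the fields used here, and `percent_complete` and `update_locale` are hypothetical helper names. It also assumes completion is the "Translated" share of all counted units, which the meeting does not confirm.

```python
def percent_complete(totals):
    """Translated share of all units, as a percentage (0.0 when empty)."""
    total = sum(totals.values())
    if total == 0:
        return 0.0
    return 100.0 * totals.get("Translated", 0) / total


def update_locale(stats, msg):
    """Apply one message to its own locale only; return that locale."""
    locale = msg["locale"]
    totals = stats.setdefault(locale, {})
    for state, change in msg["wordDeltasByState"].items():
        totals[state] = totals.get(state, 0) + change
    return locale


# Made-up cached totals chosen so that Japanese starts at 81%.
stats = {
    "ja": {"New": 190, "Translated": 810},
    "fr": {"New": 400, "Translated": 600},
}
ja_before = percent_complete(stats["ja"])
fr_before = dict(stats["fr"])

# One incoming message that only concerns Japanese.
msg = {"locale": "ja", "wordDeltasByState": {"New": -90, "Translated": 90}}
touched = update_locale(stats, msg)
ja_after = percent_complete(stats["ja"])
```

Only the entry for the locale named in the message is read and rewritten. The other rows of the table keep their cached values and need no recomputation.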
14:51:41 <mizmo__> sayan, yes i can do 1900 utc! thank you! this has been so helpful
14:52:05 <mizmo__> #endmeeting
14:52:09 <sayan> mizmo__: thanks! see you at 1900UTC
14:52:14 <mizmo__> see you!
15:01:11 <mizmo> #endmeeting