13:29:54 #startmeeting hubs-devel
13:29:54 Meeting started Thu Jan 5 13:29:54 2017 UTC. The chair is mizmo. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:29:54 Useful Commands: #action #agreed #halp #info #idea #link #topic.
13:29:54 The meeting name has been set to 'hubs-devel'
13:30:31 so what kinds of things are we stuck on, and can we outline the steps to approaching them for someone new wanting to help?
13:30:48 these are the milestone issues, is there anything anyone could help us with here: https://pagure.io/fedora-hubs/issues?status=Open&tags=milestone
13:31:00 the major chunks of work are
13:31:02 1) zanata
13:31:07 2) waartaa
13:31:14 3) badges
13:31:41 4) release cycle widget (i think the backend for that needs more work / integration stuff?)
13:31:44 5) bookmarks bar
13:32:01 6) help widget / err bot
13:32:43 yes, these are the major things we are working on
13:33:25 1. Zanata - we need to deploy zanata2fedmsg first to prod
13:33:48 right now it's in staging (deployed by raplh)
13:33:56 s/raplh/ralph
13:34:06 okay is there anything there someone could help with? any kind of atomic chunk of work?
13:34:56 a2batic already pitched to help but this is a big task
13:35:01 and there are multiple issues
13:35:32 so people can easily work on different issues.
13:35:50 can we make a list of the issues involved?
13:36:05 but without the sample data from the zanata it might be tough to work on
13:36:24 I had dropped a mail to aeng before the vacation but did not get reply
13:36:35 so I need to ask on that thread again
13:36:48 i had thought we got the sample data already
13:37:47 mizmo: yes, so this is what we have
13:37:49 #link https://zanata.atlassian.net/browse/ZNTA-1166
13:37:51 #action sayan to ask aeng for sample data from zanata
13:38:29 sayan: the example payload? but you need something more?
13:39:43 mizmo: we can start working based on the example payload
13:40:03 sayan: oh okay so this isn't blocked?
13:40:07 and then change to once zanata2fedmsg is in prod and there are diffs
13:40:38 mizmo: not really blocked, but it's good to have example payload from multiple scenarios
13:41:16 mizmo: ... and the issues, we already have the widgets from translations right?
13:41:35 there's mockups for them, but i dont know if there's anything beyond that?
13:42:02 do we have everything needed for someone to write each widget?
13:42:46 mizmo: that's what I could not make out from the example payload
13:43:58 mizmo: what we can do is someone start working on it. if changes are needed we can ping aeng
13:44:07 or somebody else from the zanata team
13:44:24 so how would we direct someone to work on it? where would they start?
13:44:57 mizmo: so they start to work based on the sample payload we have
13:45:20 since we are pushing the same payload to fedmsg, the structure of the message would be same
13:45:31 so they can start writing the widget
13:45:37 sayan: can we walk through it in terms of steps for a new contributor?
13:45:48 mizmo: yes, we can do that
13:45:49 eg step 1 is to set up the hubs dev env, which is documented well
13:46:04 step 2 - do they have to have a fedora zanata account?
13:46:48 mizmo: they don't need zanata account
13:47:09 the data comes from fedmsg, so it's like just another widget based on fedmsg
13:47:10 they dont need any kind of zanata account or env, they just need to look at the sample payload?
13:47:15 oh okay
13:47:55 so if they're trying to do stuff based on statistics but the payload is only for a single doc update, how are they getting aggregate data?
13:49:35 the data would be cached, on receiving new data would be computed
13:50:05 also till now the graphs that I have seen is percentage completion
13:50:19 they can be computed from the payload, isnt it?
13:50:45 where is the data cached?
13:50:59 sample payload:
13:51:00 {"username":"aeng","project":"Zanata","version":"master","docId":"doc1id","locale":"zh-CN","wordDeltasByState":{"New":-16,"Translated":16},"type":"DocumentStatsEvent"}
13:51:19 so it gives you a word delta, but i'm assuming that's for the last event, not the entire doc
13:51:35 how would you get something like percentage completion from that?
13:53:01 correct me if I am wrong
13:53:21 I was thinking New + Translated = Total
13:54:42 * sayan is reading again
13:54:48 new + translated in that payload would be = 0
13:55:26 my interpretation of this (which could be wrong) is that this is a message emitted as a result of a submission to a document
13:56:05 no, I could not interpret the negative symbol there, so I was thinking 16+16=32
13:56:16 this specific document submission was aeng submitting 16 lines of translation. i'm not sure why "new" is -16: but i think it's because 16 lines had the 'new' state removed from them since they are translated now they are no longer new
13:56:23 i think it's negative on purpose
13:56:42 i think it means 16 lines were removed from "new" state and moved to "translated" state
13:57:03 it's a delta, so it's not a total
13:57:07 a delta is only partial right
13:57:20 it only shows changes - so i think it's just state changes
13:59:42 mizmo: what is "lines of translation"?
13:59:53 sayan: line == string
14:00:04 (does that make sense?)
14:00:42 yes
14:01:51 so i dont see how this example payload gets us aggregate stats.... unless fedmsg can aggregate for us??
14:03:44 we need to do on hubs level
14:05:13 where would we cache / store it? we dont normally get into storing things in hubs right ?
14:05:33 mizmo: no we don't store things in hubs
14:05:35 and if a document already exists before we turn on the zanata stuff - how do we get the stats from before it was emitting messages so we have a picture of the full doc?
14:06:01 so would hubs do a query on fedmsg and calculate the aggregate on the fly?
so it would have to calculate it again every time the page is loaded?
14:06:18 that would be really heavy
14:07:19 in that case if zanata has api
14:07:51 the initial pulling of the data can be done via API and consecutive updates via fedmsg
14:09:03 okay so theres two major parts to this feature then -
14:09:06 implementing the initial data suckage from zanata via the API
14:09:14 and then getting the consecutive updates to work
14:09:33 but once the data is sucked in via api, where do we put it? do we have a cache we can use? how do we use it?
14:09:49 we aggregate it and cache it
14:10:28 mizmo: pingou: btw, why don't we have a database for hubs?
14:11:20 my understanding is we dont want hubs to be a data store or to replace any of the apps it grabs data from, it's meant to be a front end
14:12:02 would someone working on this know how to aggregate and cache in hubs? is there a way in flask that is hooked up already to do that?
14:12:42 yes, we already have a cache in place
14:13:40 sayan: it's all cached indeed
14:13:58 pingou: yes, so the cache is basically our database right?
14:14:07 yup
14:14:25 okay so then
14:14:50 mizmo: so the thing is when we aggregate we pull data from cache, compute new data and push to cache again
14:15:38 sayan: so does there need to be some kind of watcher process watching for the new fedmsgs to come in from zanata and to update the cache?
14:16:05 mizmo: things are already in place afaik
14:17:00 mizmo: https://pagure.io/fedora-hubs/blob/develop/f/hubs/widgets/badges.py#_31
14:18:01 here when it receives a message for a topic it invalidates
14:20:09 but we need to look for the zanata API
14:20:19 on what all things does it provides
14:23:35 sayan: well we only need the API for the initial data dump right? or do we need it when updating the cache with fresh msgs too?
14:24:22 mizmo: yes, only for the first pull
14:25:00 would you use the same process for updating the cache as doing the initial data dump?
14:25:33 mizmo: no, it will be different
14:26:13 so the badges.py example is an example someone could follow to understand how to update the cache from fedmsg? do we have any examples someone could follow to understand how to do the initial data dump?
14:26:41 mizmo: yes
14:29:33 sayan: have any pointers?
14:29:42 mizmo: https://pagure.io/fedora-hubs/blob/develop/f/hubs/widgets/badges.py#_21
14:29:50 we can handle here itself
14:29:51 (i ask because i want to try to write these up)
14:30:24 so this is grabbing the entire JSON of badge data, so it's like an initial data dump
14:30:27 right?
14:31:13 the workflow is when the page is first loaded the badge data is fetched and cached
14:32:08 and whenever there is a new badge awarded, we invalidate the cache and during the next page load the `data` method is hit again
14:32:20 s/method/function/
14:32:31 and the new data is pulled
14:32:59 in our case we don't invalidate the cache, rather keep updating the cache
14:33:08 makes sense?
14:33:26 ah okay it does!
14:33:58 this is specifically for one user though so it's relatively light weight isnt it?
14:34:10 the zanata stuff - the amount of data is going to be massive isnt it
14:34:36 mizmo: won't this be team specific?
14:34:46 the data that is pulled?
14:34:57 let me check the mockup but i think its more general
14:35:05 for the main translation hub anyway
14:35:06 there are individual lang team hubs too tho
14:35:27 https://pagure.io/fedora-hubs/issue/261
14:35:38 mizmo: language team hub is okay
14:35:58 if we are having common data, we can intelligently share
14:36:10 this one is across languages, per release, per projects (and each project has lots of docs i think)
14:36:23 this one is the same - https://pagure.io/fedora-hubs/issue/262
14:36:39 this one is a listing of all lang teams https://pagure.io/fedora-hubs/issue/263
14:36:50 although i think we dont need zanata for that last one
14:37:07 mizmo: yes
14:37:20 so its a huge amount of data i think, compared to the per user badge json
14:38:04 mizmo: if they aggregate and send via api then it should not be much of a problem
14:38:58 anyways, if data is huge we notify the user that the data is huge and will be available in sometime
14:39:09 and then do the process in background
14:40:03 mizmo: btw, I need to see off somebody and finish off dinner, can we continue the meeting 1900UTC?
14:43:02 sigh i think riot went down
14:43:22 sayan, can we use the API for updates? i thought we try to do everything with fedmsg. is it ok to not use fedmsg?
14:43:32 sayan, also i think it would be a really bad UX to have the user open the page and have to sit and wait for a long time
14:43:42 is there a way to precache it instead?
14:43:52 mizmo__: yes, we can do that too
14:43:55 so its not the page load event kicking off the cache, but some automated process?
14:44:03 its ok if the cache is slightly out of date i think
14:44:32 mizmo__: API won't be realtime
14:44:35 maybe theres an automated process that regularly updates the cache, and when it gets a page hit it does a refresh behind the scenes and updates the cache again (but user doesn't see anything)
14:45:41 sayan, i guess the big missing piece for me is how, once we already have the initial data dump, do we keep the data updated if the fedmsgs we get from zanata dont have any aggregate data and to aggregate the data ourselves means processing potentially thousands or more msgs
14:46:33 mizmo__: everytime there is a message, it hits the invalidate cache method
14:46:43 where we pull data from cache and update the cache again
14:46:56 sayan, but the messages for zanata don't map cleanly to the stats we have?
14:47:14 well let me walk thru it
14:47:27 so say my widget says based on the big data dump that japanese translation is 81% complete
14:47:45 then a translator submits a new, let's say large chunk of translated strings, so it should be 90% complete now
14:48:08 mizmo__: yes, that's something we need to figure out and connect the API and the messages
14:48:11 lets say there is one fedmsg for that, lets say it's a few hundred strings
14:48:24 so the msg comes in, it hits the invalidate cache method
14:48:25 ....
14:48:28 then what happens?
14:48:43 it only affects the japanese stat. do the other langs in the table get recomputed even tho the msg doesn't affect them?
14:49:06 you get the payload in the invalidate cache method, right?
14:49:16 thats what im assuming i think
14:49:48 you already have stats 89% in cache, you pull it, add it with the current payload and update the cache with the new data
14:50:36 okay so something in your update cache method should probably process the payload to figure out what lang it is for and only update that piece in the cache
14:51:03 mizmo__: I need to leave for a few hours now to see off someone and also finish off dinner, can we continue at 1900UTC
14:51:05 mizmo__: yes
14:51:17 you update only what needed
14:51:41 sayan, yes i can do 1900 utc! thank you! this has been so helpful
14:52:05 #endmeeting
14:52:09 mizmo__: thanks! see you at 1900UTC
14:52:14 see you!
15:01:11 #endmeeting
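
The cache update walked through in the log (keep per-language totals in the cache, and when a zanata message arrives, fold its delta into the affected language only) might look roughly like the sketch below. This is not the hubs implementation: the `cache` dict stands in for the real hubs cache, and `seed_locale`, `apply_stats_event` and `percent_translated` are hypothetical names made up for illustration. The only thing taken from the log is the shape of the sample `DocumentStatsEvent` payload. The seed numbers are invented, and the one-time snapshot is assumed to come from some initial pull such as the Zanata API.

```python
import json

# Hypothetical in-memory stand-in for the hubs cache (an assumption, not the
# real hubs cache API). Keys are (project, version, locale) tuples; values
# map a translation state name to a running word count.
cache = {}


def seed_locale(project, version, locale, counts):
    """Store an initial snapshot of per-state totals, e.g. from a one-time pull."""
    cache[(project, version, locale)] = dict(counts)


def apply_stats_event(message):
    """Fold one DocumentStatsEvent delta into the totals for its locale only.

    Returns the cache key that was updated, or None if the message is not a
    DocumentStatsEvent. Other locales in the cache are left untouched.
    """
    if message.get("type") != "DocumentStatsEvent":
        return None
    key = (message["project"], message["version"], message["locale"])
    totals = cache.setdefault(key, {})
    for state, delta in message["wordDeltasByState"].items():
        # Deltas may be negative: words leaving "New" show up as New: -16.
        totals[state] = totals.get(state, 0) + delta
    return key


def percent_translated(project, version, locale):
    """Percentage of words in the Translated state for one locale."""
    totals = cache.get((project, version, locale), {})
    total = sum(totals.values())
    if total <= 0:
        return 0.0
    return 100.0 * totals.get("Translated", 0) / total


if __name__ == "__main__":
    # Invented starting totals; in practice these would come from the initial pull.
    seed_locale("Zanata", "master", "zh-CN", {"New": 100, "Translated": 300})
    sample = json.loads(
        '{"username":"aeng","project":"Zanata","version":"master",'
        '"docId":"doc1id","locale":"zh-CN",'
        '"wordDeltasByState":{"New":-16,"Translated":16},'
        '"type":"DocumentStatsEvent"}'
    )
    apply_stats_event(sample)
    print(percent_translated("Zanata", "master", "zh-CN"))
```

Because the delta only moves words between states, the overall total for the language stays the same and only the affected entry is rewritten, which matches the "update only what needed" point at 14:51:17. Without the initial snapshot the deltas alone cannot give a percentage, which is the gap raised at 14:01:51.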