15:00:56 #startmeeting fedora-qadevel
15:00:56 Meeting started Mon Jan 16 15:00:56 2017 UTC. The chair is tflink. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:56 Useful Commands: #action #agreed #halp #info #idea #link #topic.
15:00:56 The meeting name has been set to 'fedora-qadevel'
15:00:56 #meetingname fedora-qadevel
15:00:56 The meeting name has been set to 'fedora-qadevel'
15:00:56 #topic Roll Call
15:01:05 * mkrizek is here
15:01:09 * garretraziel is here
15:01:54 * jskladan lurks
15:02:22 * roshi is here
15:02:53 * kparal is here
15:03:06 .fas linuxmodder
15:03:07 linuxmodder: linuxmodder 'Corey W Sheldon'
15:03:19 finishing up two other mtgs, might be delayed to respond
15:03:43 #chair mkrizek garretraziel jskladan robyduck kparal linuxmodder
15:03:43 Current chairs: garretraziel jskladan kparal linuxmodder mkrizek robyduck tflink
15:03:56 let's get this party started
15:04:05 #topic Announcements and Information
15:04:14 #info taskotron stg rebuild complete - tflink, mkrizek
15:04:14 #info libtaskotron 0.4.18 released and deployed to dev and stg - mkrizek
15:04:14 #info fix for taskotron-trigger CLI posted for review - mkrizek
15:04:14 #link https://phab.qa.fedoraproject.org/D1081
15:04:32 any additional announcements?
15:05:47 #info started working on the dashboards - lbrabec, jskladan
15:07:19 * tflink assumes that there are no additional things
15:07:41 so, moving on to ...
15:07:44 #topic Rebuilding Taskotron Production
15:08:32 we were able to get stg working last week but from the things we found, there are a few kinks to work out before rebuilding prod
15:08:56 1. even with truncating the db, migration still takes a long time
15:09:21 2. file migration also takes a long time and it turns out that moving buildmaster files to nfs isn't really an option for now
15:10:22 re: 2., I'd go with stg setup - buildmaster on local storage, artifacts on nfs
15:10:23 for 1.
I'm thinking the following: increase the memory on the db host when the outage starts, and look into a "log" function for resultsdb that can produce a csv-like file which can be replayed later
15:11:29 my thought being that we could migrate a snapshot of the resultsdb db to the new schema offline and, once we start the outage and get the new instance set up (with new db), we could "play back" the csv to get newer results in the db
15:11:40 mkrizek: that sounds good to me
15:12:04 jskladan: any thoughts on that plan for resultsdb db migration?
15:12:15 ad 1) The other thing I thought of was doing the migration "in background" - on a db copy, which would leave us with day(s) worth of data at most to migrate during the outage
15:12:42 👍
15:12:48 but log-to-file + replay for the downtime could be just as good
15:12:49 jskladan: is there an easier way to do that than what I mentioned with the csv file?
15:13:16 * tflink wasn't sure if migrating data from old to new would be feasible
15:15:07 the thing with csv-backup + replay is that we'd have to make sure everything is done "properly" on the csv layer - IDs, stuff like that
15:16:08 I'll have to look at what specific way we currently use resultsdb - i.e. how much back-and-forth there is between the client and server
15:16:43 yeah, it'd require work, but I figure that storing the incoming data and writing a quick script that uses resultsdb_api to submit results from csv would work well enough, and if we make the assumption that only results will be submitted, it's easier to code both the csv bits and the "replayer"
15:16:44 and how complicated mocking that up for the csv-store would be
15:17:31 I assert that we need to get the redeployment done in the next couple of days
15:17:56 production is still running F23 and I'd really like to be able to show off dist-git task storage at devconf
15:19:15 jskladan: can you think of a way to get it done more quickly without it being a really bad idea?
15:20:51 tflink: /me is thinking...
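[Editor's note: the log-to-file + replay idea discussed above could look roughly like this. This is a minimal sketch under assumptions: the field names are made up, and the `submit` callback stands in for the real submission call (in production the replayer would use resultsdb_api instead, with the caveat the participants raise about getting IDs right on the csv layer).]

```python
import csv
import io
import json

# Hypothetical sketch of the "log + replay" plan: append each incoming
# result to a csv-like file during the outage, then play the file back
# into the new resultsdb instance afterwards.
# Field names are assumptions, not the real resultsdb schema.
FIELDS = ["testcase", "outcome", "item", "data"]

def log_result(fileobj, result):
    """Append one incoming result to the csv-like log."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    row = dict(result)
    # flatten the nested extra-data dict so it survives the csv round trip
    row["data"] = json.dumps(row.get("data", {}))
    writer.writerow(row)

def replay(fileobj, submit):
    """Read the log back and re-submit each result via `submit`.

    `submit` is a stand-in; a real replayer would call resultsdb_api here.
    Returns the number of results replayed.
    """
    reader = csv.DictReader(fileobj, fieldnames=FIELDS)
    count = 0
    for row in reader:
        row["data"] = json.loads(row["data"])
        submit(row)
        count += 1
    return count
```

This matches the simplifying assumption from the discussion: if only result submissions (no reads, no back-and-forth) hit the log, both the csv writer and the replayer stay trivial.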
15:20:51 maybe a bit different question then: would it be a _huge_ problem if, say for a day or two, the resultsdb appeared (almost) empty?
15:21:01 we don't really do gating yet
15:21:16 i don't think so
15:21:20 and the only real consumer is bodhi, as far as I know
15:21:42 as long as we have a week or so of results
15:22:44 so, how about (before the migration of data) I make a backup of the DB, and then prune the "live" data to the minimum
15:22:53 that would get migrated during the down-time
15:23:04 and then I'd feed the "older" data to it later on
15:23:12 how long would it take to get the older data fed into resultsdb?
15:23:43 wild guess is two/three days tops
15:24:03 that works for me
15:24:19 I might be missing something, but why don't we have the queue at the trigger level, and replay the incoming fedmsgs after the upgrade?
15:24:22 less if we decide to scrap stuff that's "old"
15:24:32 we should probably check with bowlofeggs to make sure that bodhi isn't going to have horrible problems if we do that
15:24:36 but that IMO is not needed, really
15:24:46 kparal: IIRC, the db took a day to migrate for stg
15:24:53 and that's after it was trimmed
15:25:09 mkrizek: do you recall how long the migration took for stg?
15:25:16 8 hours or so
15:25:20 we would have a full day of jobs to be performed.
but we need to do depcheck and upgradepath only once, actually
15:25:34 tflink: ad bodhi - from what I know about what they do, no data in resultsdb would be equivalent to "this was not tested (yet)"
15:25:38 so it might not be that many jobs
15:26:07 i suspect that packagers expect faster feedback on their packages in bodhi
15:26:24 * kparal shrugs
15:26:28 it's a one time event
15:26:34 kparal: tflink: mkrizek: that is one of the things I'd like to sort out some time in the future (depcheck + upgradepath reporting gazillions of results)
15:26:38 I also have a suspicion that this isn't going to be the last time we have a migration issue like that
15:27:11 jskladan: do you think I'm worrying too much?
15:27:22 wrt something like this happening again
15:27:36 tflink: there is no such thing as worrying too much when we talk PROD, is there? :)
15:27:53 there's a balance :)
15:28:01 I think that a 24 hour delay once a year is not a big deal
15:28:09 but I think that Bodhi is actually the least of our problems :)
15:28:12 there are so many other things that could be done with those 2-3 dev days of yours
15:28:13 we push to repos once a day anyway
15:28:46 tflink: honestly, I'm fairly sure that the way the migration is done is almost the worst thing performance-wise
15:29:31 but I don't understand alembic + postgres well enough to make the right adjustments/force the right SQL calls
15:30:11 next time we do something like this (mass-migrating data) I'd love to have somebody more skilled to consult
15:30:28 jskladan: if you do spend 2-3 days on getting the older data back into resultsdb, do you have any thoughts on whether that work would be re-usable, or if it would be needed again often enough to justify making it reusable
15:31:17 tflink: no idea at the moment.
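[Editor's note: the prune-and-backfill idea from the discussion above, sketched very roughly. The `results` table and its columns are invented for illustration; production resultsdb runs on postgres and the real data moves would use pg_dump, as jskladan describes below - sqlite here is only a stand-in to show the shape of the operation: keep a week of "live" results so the outage migration stays small, park the rest in a backup table, and feed it back in batches afterwards.]

```python
import sqlite3
from datetime import datetime, timedelta

KEEP_DAYS = 7  # "as long as we have a week or so of results"

def prune_old_results(conn, now):
    """Move results older than KEEP_DAYS into a backup table.

    Returns the number of rows pruned from the live table.
    """
    cutoff = (now - timedelta(days=KEEP_DAYS)).isoformat()
    cur = conn.cursor()
    # empty copy of the live table's shape (hypothetical schema)
    cur.execute("CREATE TABLE IF NOT EXISTS results_backup AS "
                "SELECT * FROM results WHERE 0")
    cur.execute("INSERT INTO results_backup "
                "SELECT * FROM results WHERE submitted < ?", (cutoff,))
    cur.execute("DELETE FROM results WHERE submitted < ?", (cutoff,))
    conn.commit()
    return cur.rowcount

def backfill(conn, batch=1000):
    """After the outage, feed the older data back in small batches."""
    cur = conn.cursor()
    moved = 0
    while True:
        cur.execute("SELECT rowid FROM results_backup LIMIT ?", (batch,))
        ids = [r[0] for r in cur.fetchall()]
        if not ids:
            break
        marks = ",".join("?" * len(ids))
        cur.execute("INSERT INTO results SELECT * FROM results_backup "
                    "WHERE rowid IN (%s)" % marks, ids)
        cur.execute("DELETE FROM results_backup "
                    "WHERE rowid IN (%s)" % marks, ids)
        conn.commit()
        moved += len(ids)
    return moved
```

Batching the backfill is what makes the "two/three days tops" guess tolerable: the live instance keeps serving fresh results while older data trickles back in.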
But - my plan is to migrate the "backup" db (where it's not down-time, and we don't care how long it takes)
15:31:54 and then use pg_dump to move the migrated data from one db to another
15:32:18 so, the thing should be fairly easy
15:32:29 (yes, the "s" word)
15:32:30 how much of a disruption would it be if we were to take prod down for 8-10 hours starting at, say ... 21:00 UTC?
15:32:57 other than the loss of sleep for those involved with the migration
15:33:31 I really have no idea here - I could try and dig some data from resultsdb
15:33:43 to make a graph of "when were results submitted"
15:34:00 jskladan: would that take more than 20 minutes?
15:34:06 guess not
15:34:30 could you get it done today?
15:35:12 absolutely
15:35:27 you think a month's worth of data is enough for that judgement call?
15:35:49 then I propose that we plan for an extended outage either tonight or tomorrow and skip the "reloading" stuff for now
15:35:59 jskladan: a month would be fine, I think
15:36:35 use the data that jskladan gets to inform us of a time when it would be least disruptive to Fedora
15:36:42 OK, I'll get it done
15:37:22 the more I think about it, the more I'd rather see us spend time on new features than spend days getting a once-in-years migration to go faster
15:37:56 +/- 1s?
15:38:05 +1
15:38:14 sure
15:38:55 +1
15:39:03 kparal: don't be too enthusiastic putting a +1 on what I think is your own proposal :-P
15:39:10 +1
15:39:20 I always +1 my own proposals
15:39:27 * tflink doesn't see any disagreement, so
15:40:08 modesty is overrated
15:40:20 #agreed for taskotron production redeployment, schedule a long outage for resultsdb migration and try to minimize disruption.
this is a once-in-a-long-while migration and isn't worth spending days to make it faster
15:40:35 * tflink can undo if there are any objections
15:40:48 just figured that it would be faster than proposing something that seemed to have agreement :)
15:40:58 ack
15:41:07 post-ack
15:41:51 ok, moving on
15:41:53 #topic Task Dashboards
15:42:00 #link https://lists.fedoraproject.org/archives/list/qa-devel@lists.fedoraproject.org/message/CV2W5Q5VOTZTZSMLQPG5IYMU3MMUBZGG/
15:42:18 hrm, we have no lbrabec today
15:42:45 the only thing I'm concerned about is having yet another project
15:42:58 that doesn't mean it's not a good idea
15:43:13 I'm not trying to downplay your concern but I'm not sure we have another option in this case
15:43:24 alternate suggestions welcome :)
15:44:05 I also want to emphasize that for the immediate future, I'm only looking for a prototype - not a finished product
15:44:20 it depends on how important it is (for us or some other team)
15:44:23 I talked with lbrabec, and it seems that we should be able to have a "thing to show" this week
15:44:26 devconf is next week and I want to have something visual to show off for folks who aren't familiar with taskotron
15:45:00 most of the "hows" are figured out now, so we'll focus on a mockup tomorrow
15:45:26 hopefully we'll be able to fill the mockup with real data on wednesday, but that's if we don't hit any unpredicted problems
15:46:08 with any luck, the prototype can be simple enough to avoid most of the conceptual problems - even if that means hard-coding stuff and/or making some assumptions that won't work in the long run
15:47:34 any other comments/concerns/suggestions?
15:47:45 nope
15:49:28 I think that's it for the topics for today - I think the other two I sent out are not needed atm
15:49:38 moving on to
15:49:41 #topic tasking
15:49:47 is anyone in need of things to do?
15:50:22 * tflink doesn't think so but it's better to ask than assume :)
15:50:48 I'm good, I think
15:50:53 you wanted some help with docs building
15:51:20 I haven't seen any info about it yet, but I'm buried in email, so if I missed something, poke me
15:51:28 kparal: yeah, I was hoping that the doit.py changes in libtaskotron would be enough of an example
15:51:49 tflink: could you send me the diff number? thanks
15:52:25 https://phab.qa.fedoraproject.org/D1012
15:52:31 thanks. also, what's the current plan with the git move?
15:52:51 * kparal adjusted some old repo definitions in phab today, for the few projects that moved already
15:53:23 kparal: still pending, unfortunately. i got distracted last week
15:53:42 ok, no problem
15:54:17 kparal: thanks
15:54:28 for working on the docs stuff
15:54:43 any objections to me moving the repos today?
15:55:04 nope
15:55:35 * tflink will get that done
15:55:50 also, blockerbugs - fedorahosted is going to be going away before too much longer
15:56:07 but with 5 minutes left, it's time for ...
15:56:11 #topic Open Floor
15:56:40 Anything else that folks would like to see covered?
15:57:21 nothing here
15:58:36 * tflink sets the magical, non-deterministic fuse
15:59:14 thanks for coming, everyone
15:59:19 * tflink will send out minutes shortly
15:59:22 #endmeeting