15:00:56 <tflink> #startmeeting fedora-qadevel
15:00:56 <zodbot> Meeting started Mon Jan 16 15:00:56 2017 UTC.  The chair is tflink. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:56 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
15:00:56 <zodbot> The meeting name has been set to 'fedora-qadevel'
15:00:56 <tflink> #meetingname fedora-qadevel
15:00:56 <zodbot> The meeting name has been set to 'fedora-qadevel'
15:00:56 <tflink> #topic Roll Call
15:01:05 * mkrizek is here
15:01:09 * garretraziel is here
15:01:54 * jskladan lurks
15:02:22 * roshi is here
15:02:53 * kparal is here
15:03:06 <linuxmodder> .fas linuxmodder
15:03:07 <zodbot> linuxmodder: linuxmodder 'Corey W Sheldon' <sheldon.corey@openmailbox.org>
15:03:19 <linuxmodder> finishing up two other mtgs might be delayed to respond
15:03:43 <tflink> #chair mkrizek garretraziel jskladan robyduck kparal linuxmodder
15:03:43 <zodbot> Current chairs: garretraziel jskladan kparal linuxmodder mkrizek robyduck tflink
15:03:56 <tflink> let's get this party started
15:04:05 <tflink> #topic Announcements and Information
15:04:14 <tflink> #info taskotron stg rebuild complete - tflink, mkrizek
15:04:14 <tflink> #info libtaskotron 0.4.18 released and deployed to dev and stg - mkrizek
15:04:14 <tflink> #info fix for taskotron-trigger CLI posted for review - mkrizek
15:04:14 <tflink> #link https://phab.qa.fedoraproject.org/D1081
15:04:32 <tflink> any additional announcements?
15:05:47 <jskladan> #info started working on the dashboards - lbrabec, jskladan
15:07:19 * tflink assumes that there are no additional things
15:07:41 <tflink> so, moving on to ...
15:07:44 <tflink> #topic Rebuilding Taskotron Production
15:08:32 <tflink> we were able to get stg working last week but from the things we found, there are a few kinks to work out before rebuilding prod
15:08:56 <tflink> 1. even with truncating the db, migration still takes a long time
15:09:21 <tflink> 2. file migration also takes a long time and it turns out that moving buildmaster files to nfs isn't really an option for now
15:10:22 <mkrizek> re: 2., I'd go with stg setup - buildmaster on local storage, artifacts on nfs
15:10:23 <tflink> for 1. I'm thinking the following: increase the memory on the db host when the outage starts and look into a "log" function for resultsdb that can produce a csv-like file which can be replayed later
15:11:29 <tflink> my thought being that we could migrate a snapshot of the resultsdb db to the new schema offline and once we start the outage and get the new instance set up (with new db), we could "play back" the csv to get newer results in the db
15:11:40 <tflink> mkrizek: that sounds good to me
15:12:04 <tflink> jskladan: any thoughts on that plan for resultsdb db migration?
15:12:15 <jskladan> ad 1) The other thing I thought of was doing the migration "in background" - on a db copy, which would leave us with day(s) worth of data at most to migrate during the outage
15:12:42 <kparal> 👍
15:12:48 <jskladan> but log-to-file + replay for the downtime could be just as good
15:12:49 <tflink> jskladan: is there an easier way to do that than what I mentioned with the csv file?
15:13:16 * tflink wasn't sure if migrating data from old to new would be feasible
15:15:07 <jskladan> the thing with csv-backup + replay is that we'd have to make sure everything is done "properly" on the csv layer - IDs and stuff like that
15:16:08 <jskladan> I'll have to look at what specific way we currently use resultsdb - i.e. how much back-and-forth there is between the client and server
15:16:43 <tflink> yeah, it'd require work, but I figure that storing the incoming data and writing a quick script that uses resultsdb_api to submit results from csv would work well enough. if we assume that only results will be submitted, it's easier to code both the csv bits and the "replayer"
15:16:44 <jskladan> and how complicated would mocking that up for the csv-store be
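The log-to-file + replay idea discussed above could look roughly like the following. This is a minimal sketch under stated assumptions, not how resultsdb works: the column names in `FIELDS` are hypothetical, and `submit` is a placeholder for whatever call would actually post a result (for example something built on resultsdb_api, whose exact signature is not shown in this discussion).

```python
import csv
import io

# Hypothetical column layout for a logged result; the real resultsdb
# payload may carry different or additional fields.
FIELDS = ["testcase", "outcome", "item", "ref_url"]


def log_result(stream, result):
    """Append one incoming result (a dict keyed by FIELDS) to the CSV log."""
    csv.DictWriter(stream, fieldnames=FIELDS).writerow(result)


def replay(stream, submit):
    """Read the CSV log in order and hand every row to `submit`.

    `submit` is an assumption: any callable that takes one result dict.
    Returns the number of replayed results.
    """
    count = 0
    for row in csv.DictReader(stream, fieldnames=FIELDS):
        submit(row)
        count += 1
    return count


if __name__ == "__main__":
    log = io.StringIO(newline="")
    log_result(log, {"testcase": "rpmlint", "outcome": "PASSED",
                     "item": "foo-1.0-1.fc25", "ref_url": ""})
    log.seek(0)
    print(replay(log, print))
```

Because only results are logged and no IDs are stored, the sketch sidesteps the ID question raised above, at the cost of the replayed results getting new IDs in the new database.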
15:17:31 <tflink> I assert that we need to get the redeployment done in the next couple of days
15:17:56 <tflink> production is still running F23 and I'd really like to be able to show off dist-git task storage at devconf
15:19:15 <tflink> jskladan: can you think of a way to get it done more quickly without being a really bad idea?
15:20:51 <jskladan> tflink: /me is thinking...
15:20:51 <jskladan> maybe a bit different question then: would it be a _huge_ problem if, say for a day or two, the resultsdb appeared (almost) empty?
15:21:01 <jskladan> we don't really do gating yet
15:21:16 <tflink> i don't think so
15:21:20 <jskladan> and the only real consumer is bodhi, as far as I know
15:21:42 <tflink> as long as we have a week or so of results
15:22:44 <jskladan> so, how about (before the migration of data) I make a backup of the DB, and then prune the "live" data to the minimum
15:22:53 <jskladan> that would get migrated during the down-time
15:23:04 <jskladan> and then I'd feed the "older" data to it later on
15:23:12 <tflink> how long would it take to get the older data fed into resultsdb?
15:23:43 <jskladan> wild guess is two/three days tops
15:24:03 <tflink> that works for me
15:24:19 <kparal> I might be missing something, but why don't we have the queue at the trigger level, and replay the incoming fedmsgs after the upgrade?
15:24:22 <jskladan> less if we decide to scrap stuff that's "old"
15:24:32 <tflink> we should probably check with bowlofeggs to make sure that bodhi isn't going to have horrible problems if we do that
15:24:36 <jskladan> but that IMO is not needed, really
15:24:46 <tflink> kparal: IIRC, the db took a day to migrate for stg
15:24:53 <tflink> and that's after it was trimmed
15:25:09 <tflink> mkrizek: do you recall how long the migration took for stg?
15:25:16 <mkrizek> 8 hours or so
15:25:20 <kparal> we would have a full day of jobs to be performed. but we need to do depcheck and upgradepath only once, actually
15:25:34 <jskladan> tflink: ad bodhi - from what I know about what they do, no data in resultsdb would be "this was not tested (yet)" equivalent
15:25:38 <kparal> so it might not be that many jobs
15:26:07 <tflink> i suspect that packagers expect faster feedback on their packages in bodhi
15:26:24 * kparal shrugs
15:26:28 <kparal> it's a one time event
15:26:34 <jskladan> kparal: tflink: mkrizek: that is one of the things I'd like to sort out some time in the future (depcheck + upgradepath reporting gazillions of results)
15:26:38 <tflink> I also have a suspicion that this isn't going to be the last time we have a migration issue like that
15:27:11 <tflink> jskladan: do you think I'm worrying too much?
15:27:22 <tflink> wrt something like this happening again
15:27:36 <jskladan> tflink: there is nothing like worrying too much, when we talk PROD, is there? :)
15:27:53 <tflink> there's a balance :)
15:28:01 <kparal> I think that 24 hour delay once a year is not a big deal
15:28:09 <jskladan> but I think that Bodhi is actually the least of our problems :)
15:28:12 <tflink> there are so many other things that could be done with those 2-3 dev days of yours
15:28:13 <kparal> we push to repos once a day anyway
15:28:46 <jskladan> tflink: honestly, I'm fairly sure that the way the migration is done is almost the worst thing performance-wise
15:29:31 <jskladan> but I don't understand alembic + postgres well enough to make the right adjustments/force the right SQL calls
15:30:11 <jskladan> next time we do something like this (mass-migrating data) I'd love to have somebody more skilled to consult
15:30:28 <tflink> jskladan: if you do spend 2-3 days on getting the older data back into resultsdb, do you have any thoughts on whether that work would be re-usable or if it would be needed again enough to justify making it reusable
15:31:17 <jskladan> tflink: no idea at the moment. But - my plan is to migrate the "backup" db (where it doesn't need down-time, and we don't care how long it takes)
15:31:54 <jskladan> and then using pg_dump to move the migrated data from one db to another
15:32:18 <jskladan> so, the thing should be fairly easy
15:32:29 <jskladan> (yes, the "s" word)
15:32:30 <tflink> how much of a disruption would it be if we were to take prod down for 8-10 hours starting at say ... 21:00 UTC?
15:32:57 <tflink> other than the loss of sleep for those involved with the migration
15:33:31 <jskladan> I really have no idea here - I could try and dig some data from resultsdb
15:33:43 <jskladan> to make a graph of "when were results submitted"
15:34:00 <tflink> jskladan: would that take more than 20 minutes?
15:34:06 <jskladan> guess not
15:34:30 <tflink> could you get it done today?
15:35:12 <jskladan> absolutely
15:35:27 <jskladan> you think a month's worth of data is enough for that judgement call?
15:35:49 <tflink> then I propose that we plan for an extended outage either tonight or tomorrow and skip the "reloading" stuff for now
15:35:59 <tflink> jskladan: a month would be fine, I think
15:36:35 <tflink> use the data that jskladan gets to inform us of a time when it would be least disruptive to Fedora
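The "when were results submitted" data mentioned above can be reduced to a per-hour count to look for a quiet outage window. The following is a minimal sketch, assuming the submission times are already exported as UTC strings in `YYYY-MM-DD HH:MM:SS` form; that format and both function names are illustrative assumptions, not part of resultsdb.

```python
from collections import Counter
from datetime import datetime


def submissions_per_hour(timestamps):
    """Count submissions for each UTC hour of the day (index 0..23)."""
    counts = Counter(
        datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour for ts in timestamps
    )
    return [counts.get(hour, 0) for hour in range(24)]


def quietest_window(per_hour, length):
    """Return the start hour of the `length`-hour window with the fewest
    submissions; windows may wrap past midnight."""
    def load(start):
        return sum(per_hour[(start + i) % 24] for i in range(length))
    return min(range(24), key=load)


if __name__ == "__main__":
    sample = ["2017-01-10 10:15:00", "2017-01-10 10:40:00", "2017-01-11 23:05:00"]
    per_hour = submissions_per_hour(sample)
    print(per_hour, quietest_window(per_hour, 8))
```

With a month of data, the start hour returned for an 8 to 10 hour window gives a first guess at the least disruptive time; ties resolve to the earliest start hour.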
15:36:42 <jskladan> OK, I'll get it done
15:37:22 <tflink> the more I think about it, the more I'd rather see us spend time on new features than spending days to get a once-in-years migration to go faster
15:37:56 <tflink> +/- 1s?
15:38:05 <mkrizek> +1
15:38:14 <kparal> sure
15:38:55 <garretraziel> +1
15:39:03 <tflink> kparal: don't be too enthusiastic putting a +1 on what I think is your own proposal :-P
15:39:10 <jskladan> +1
15:39:20 <kparal> I always +1 my own proposals
15:39:27 * tflink doesn't see any disagreement, so
15:40:08 <kparal> modesty is overrated
15:40:20 <tflink> #agreed for taskotron production redeployment, schedule a long outage for resultsdb migration and try to minimize disruption. this is a once-in-a-long-while migration and isn't worth spending days to make it faster
15:40:35 * tflink can undo if there are any objections
15:40:48 <tflink> just figured that it would be faster than proposing something that seemed to have agreement :)
15:40:58 <mkrizek> ack
15:41:07 <kparal> post-ack
15:41:51 <tflink> ok, moving on
15:41:53 <tflink> #topic Task Dashboards
15:42:00 <tflink> #link https://lists.fedoraproject.org/archives/list/qa-devel@lists.fedoraproject.org/message/CV2W5Q5VOTZTZSMLQPG5IYMU3MMUBZGG/
15:42:18 <tflink> hrm, we have no lbrabec today
15:42:45 <kparal> the only thing I'm concerned about is having yet another project
15:42:58 <kparal> that doesn't mean it's not a good idea
15:43:13 <tflink> I'm not trying to downplay your concern but I'm not sure we have another option in this case
15:43:24 <tflink> alternate suggestions welcome :)
15:44:05 <tflink> I also want to emphasize that for the immediate future, I'm only looking for a prototype - not a finished product
15:44:20 <kparal> it depends how important it is (for us or some other team)
15:44:23 <jskladan> I talked with lbrabec, and it seems that we should be able to have a "thing to show" this week
15:44:26 <tflink> devconf is next week and I want to have something visual to show off for folks who aren't familiar with taskotron
15:45:00 <jskladan> most of the "hows" are figured out now, so we'll focus on a mockup tomorrow
15:45:26 <jskladan> hopefully be able to fill the mockup with real data on wednesday, but that's if we don't hit any unpredicted problems
15:46:08 <tflink> with any luck, the prototype can be simple enough to avoid most of the conceptual problems - even if that means hard coding stuff and/or making some assumptions that won't work in the long run
15:47:34 <tflink> any other comments/concerns/suggestions?
15:47:45 <jskladan> nope
15:49:28 <tflink> I think that's it for the topics for today - I think the other two I sent out are not needed atm
15:49:38 <tflink> moving on to
15:49:41 <tflink> #topic tasking
15:49:47 <tflink> is anyone in need of things to do?
15:50:22 * tflink doesn't think so but it's better to ask than assume :)
15:50:48 <jskladan> I'm good, I think
15:50:53 <kparal> you wanted some help with docs building
15:51:20 <kparal> I haven't seen any info about it yet, but I'm buried in email, so if I missed something, poke me
15:51:28 <tflink> kparal: yeah, I was hoping that the doit.py changes in libtaskotron would be enough of an example
15:51:49 <kparal> tflink: could you send me diff number? thanks
15:52:25 <tflink> https://phab.qa.fedoraproject.org/D1012
15:52:31 <kparal> thanks. also, what's the current plan with git move?
15:52:51 * kparal adjusted some old repo definitions in phab today, for the few projects that moved already
15:53:23 <tflink> kparal: still pending, unfortunately. i got distracted last week
15:53:42 <kparal> ok, no problem
15:54:17 <tflink> kparal: thanks
15:54:28 <tflink> for working on the docs stuff
15:54:43 <tflink> any objections to me moving the repos today?
15:55:04 <kparal> nope
15:55:35 * tflink will get that done
15:55:50 <tflink> also, blockerbugs - fedorahosted is going to be going away before too much longer
15:56:07 <tflink> but with 5 minutes left, it's time for ...
15:56:11 <tflink> #topic Open Floor
15:56:40 <tflink> Anything else that folks would like to see covered?
15:57:21 <kparal> nothing here
15:58:36 * tflink sets the magical, non-deterministic fuse
15:59:14 <tflink> thanks for coming, everyone
15:59:19 * tflink will send out minutes shortly
15:59:22 <tflink> #endmeeting