17:05:32 #startmeeting Discussion of Koji/Rawhide automated testing plans
17:05:32 Meeting started Mon Jul 1 17:05:32 2013 UTC. The chair is sgallagh. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:05:32 Useful Commands: #action #agreed #halp #info #idea #link #topic.
17:05:46 #meetingname Discussion of Koji/Rawhide automated testing plans
17:05:46 The meeting name has been set to 'discussion_of_koji/rawhide_automated_testing_plans'
17:05:59 sgallagh: you might want to do a #chair?
17:06:17 puiterwijk: As soon as the mikes join, I will
17:06:23 I se
17:06:26 see even
17:06:37 * nirik is here as well, but also doing other stuff.
17:07:56 #chair mbonnet mikem23 puiterwijk sgallagh
17:07:56 Current chairs: mbonnet mikem23 puiterwijk sgallagh
17:08:15 #topic Overview/Goals
17:08:47 Ok, so first a little context.
17:09:05 After FUDCon Lawrence, we came to Fedora with some ideas for making Rawhide more usable.
17:09:50 Specifically, we had some ideas about producing tests that could "gate" package drops into the public Rawhide repo, so if they impacted any functionality, the repocreate wouldn't happen
17:09:51 ah, I was wondering what meet on freenode met :)
17:10:21 #chair mbonnet mikem23 puiterwijk sgallagh tflink
17:10:21 Current chairs: mbonnet mikem23 puiterwijk sgallagh tflink
17:10:22 * nirik likes the idea in principle, but really hopes we can do things so it doesn't slow rawhide down.
17:10:28 #chair mbonnet mikem23 puiterwijk sgallagh tflink nirik
17:10:28 Current chairs: mbonnet mikem23 nirik puiterwijk sgallagh tflink
17:10:54 nirik: Well, I'm not sure we'll define that as a goal.
17:11:03 Certainly, I'd like not to *cripple* Rawhide
17:11:13 But right now, it has a habit of running very fast and tripping
17:11:16 I think it would slow rawhide down by definition, no?
17:11:17 well, I think it's something to keep in mind.
17:11:29 I don't think we should say 'no slowdown'
17:11:53 Well, it depends on the tests
17:11:56 * nirik doesn't think rawhide trips all that much, but it could be better for sure. ;)
17:12:26 nirik: Well, the long-term goal here would be that we should reasonably be able to expect that people working on building stuff for Fedora should be running Rawhide.
17:12:34 Which is far from the truth today.
17:13:18 Anyway, the idea of gating Rawhide on a set of minimally-available functionality was more or less agreed to be desirable, but we didn't have any resources to spend on it at the time.
17:13:19 sure, but I think some of that is just misperception that rawhide is broken all the time, which isn't really the case as much as it used to be
17:13:55 nirik: That may be partially true, but one of the easiest ways to change perception would be to implement a plan to prevent the perception from being reality.
17:14:09 Then it can at least be demonstrated to have been "yesterday's news"
17:14:26 * nirik has been running rawhide full time on his main laptop for 7 months now.
17:14:29 Anyway, we now have an available resource, at least for the next two months: puiterwijk
17:14:36 * nirik cheers.
17:14:53 wow, I'm seen as a resource... not sure if I should be too happy about that :-)
17:15:47 * dgilmore is kinda here
17:15:49 BTW, did folks have a chance to read my reply to the invite?
17:16:08 Essentially, I had hired puiterwijk as an intern and he's already finished what I had scheduled for him, so I got approval to move him onto this project.
17:16:48 mikem23: Yeah, I did. I'm glad to hear that it may be less work than we thought.
17:17:20 sgallagh: not really, unless I'm severely misunderstanding something, there are several parts missing to the plan
17:17:23 * nirik is unsure of what exactly is being proposed.
17:17:28 Before we move into actual execution plans, I just wanted to confirm that everyone understands what we're trying to do.
17:17:32 Well, it's not necessarily less work, but little to none of it needs to land in Koji proper
17:17:40 Which, clearly, is not the case.
17:17:50 I don't. Is this written up somewhere? and got buyin from fesco/releng/qa?
17:18:09 I mean I recall the discussions at fudcon, but that wasn't very detailed.
17:18:16 nirik: There was a long thread on fedora-devel back in the February/March timeframe
17:18:23 Let's talk about the current state of things. How things get into Rawhide
17:18:36 And no, we don't have a full writeup. That's part of the intended outcome of this discussion
17:18:45 yeah, but my memory of it was "this would be nice to do something, lets come up with a plan and come back and tell everyone what we want to implement"
17:18:50 ok, great.
17:19:11 sgallagh: so kinda what you want is to have things not land and go out in rawhide unless its passed some kind of automated qa?
17:19:15 Specifically, puiterwijk's first task in the next week or so is going to be putting together a proposal and design document
17:19:24 dgilmore: In a nutshell, yes.
17:20:07 puiterwijk's job would be to get that framework in place, not to write the initial tests (other than some placeholders to test his work, of course)
17:20:11 sgallagh: ok, thats not too hard to do and doesn't involve any koji changes
17:20:32 sgallagh: we could have builds tag into f20-candidate
17:20:34 right, just change things to land in f20-pending or whatever
17:20:38 which would trigger the qa
17:20:40 dgilmore: Right, which I didn't know when I called this meeting. Let me forward mikem23's email response
17:20:47 when that passes we move them to f20
17:20:50 however, what does that mean for the buildroot?
17:21:17 nirik: i guess it would mean we need buildroot override capabilities for rawhide
17:21:25 nirik: I was going to get to that. I'd like to see this address buildroot concerns as well
17:21:31 yeah, and chainbuilds no longer work.
17:21:38 which would be unhappy for some people.
17:21:40 I've had a couple cases in the last few months where bad packages in the buildroot caused issues
17:21:40 since until something passes qa it wouldn't go to the buildroot
17:21:47 dgilmore, right, then the work becomes building an automated system for moving stuff from f20-candidate (or whatever suffix we like) into f20
17:21:54 nirik: would need a wrapper of some kind
17:22:03 mikem23: right
17:22:13 well, if the tests were quick enough and automated, I guess it could still work.
17:22:19 just be slower.
17:22:30 nirik: Well, as far as chain-builds, now that bodhi can create overrides, I think we can modify the chain-build procedure to land those. Then it will work on *any* branch too.
17:22:32 nirik: always possible yeah
17:22:47 nirik: well, if we would do it with every build, it might trip on some things that also need another package updated
17:22:56 for chain-builds, we could also create an alternate buildroot/target
17:23:38 puiterwijk: yeah, some tests are just not going to be easy at all.
17:23:48 i.e. one that include the f20-candidate builds
17:24:03 mikem23: thats also an option
17:24:21 or we could just leave buildroot for now and keep populating it from the candidates.
17:24:49 right, but not the rawhide compose
17:25:06 lots of options
17:26:39 yeah, thats part of the problem... there's a lot of ways to hook things in, so it's hard to say whats best here.
17:27:10 and if we want to get fancy, there could be multiple stages of validation, with stage1 getting it into the buildroot and stage2 getting it into the compose. ....but perhaps it is best to start with a simple perturbation of what we have
17:28:27 So, what are the main issues we're trying to address. What kind of rawhide breakage are we most worried about?
17:28:50 seems like at least a repoclosure would be good before putting things in the chroot
17:28:53 well, broken deps are annoying... but thats a pretty difficult problem to tackle.
17:28:55 Personally, I'm in favor of creating the alternate buildroot target as well, since if we introduce a broken dependency (especially something like glibc or gcc), I'd rather that we avoid breaking EVERYTHING
17:29:06 mbonnet ++
17:30:08 mikem23: Well, when we first discussed this, the goal was to base it on functional tests (rather than API tests, etc.)
17:30:34 I.e. the most basic test would be: could we create a VM image that boots to a usable system with SSH access?
17:31:03 If the answer is no, then something SERIOUSLY bad went into this build, and it needs to be blocked.
17:31:16 sgallagh: I think we might want to separate issues into "things that prevent people from being able to build" and "things that will cause runtime breakage"
17:31:18 And then expand on that in *priority* order.
17:31:31 mbonnet: That's fair.
17:31:50 how do we want to handle groups of updates? or does this operate only on one build at a time?
17:31:54 I'm sort of looking to ensure that someone could use Rawhide as a rolling release distro if they were so inclined.
17:32:00 sgallagh: issues in the first category should be checked before letting the package into the buildroot. Things in the second category could be checked later and trigger an untag of the offending build.
17:32:16 nirik: Well, right now the repocreate runs periodically, right?
17:32:22 I'd tie it to that I think
17:32:23 nirik: if we're talking about dependency trees, they can't be one build at a time
17:32:24 nirik, with rawhide, we don't really have natural grouping of updates
17:32:35 right, so we run into problems.
17:32:41 libfoo builds and is a abi bump
17:32:42 mbonnet: Well, I'd want it to be checked *before* it became public.
17:32:53 it goes to test and fails because it causes 20 broken deps.
17:33:03 Like I said: that way someone doing bleeding-edge development or DevOps could use Rawhide for their base platform
17:33:08 how could we rebuild those 20 packages without it being available.
17:33:20 sgallagh: I think any kind of functional testing is going to take time and resources, and blocking buildroot updates on that may create unreasonable delays, especially as the library of tests that get run increases.
17:33:23 nirik, that would be a job for the alternate buildroot
17:33:24 if we do push it into buildroot each one of those packages by itself would fail until libfoo passes.
17:33:52 each new buildroot would mean another newrepo task. ;( we already have a lot.
17:34:02 mbonnet: Yeah, I'm okay with having two levels of checks to address that.
17:34:10 nirik, just two buildroots up from our one
17:34:16 mbonnet: I was trying to express my desires, not an implementation :)
17:34:27 ah, so one for 'all pending stuff'
17:34:35 sgallagh: sure
17:34:52 yep, which is effectively the one we have now
17:35:09 * nirik nods
17:36:34 * nirik will have to think about the flow some here...
17:37:30 as to building a vm on each buildroot, not sure how easy that would be... lots of moving parts there.
17:37:56 and it would take quite a bit of time
17:38:06 we could duct tape something with fedmsg / fire compose / upload to cloud / run / test. But I don't see it being easy
17:38:55 Sadly, "easy" and "useful" seem to be a continuum.
17:39:13 are there non vm type tests we could start with? or you think a vm is the base to build on?
17:39:53 nirik: Well, above we were talking about having two types of tests.
17:39:53 I think if the complexity of the tests starts to balloon, you'll probably need to get into the idea I mentioned earlier, multiple levels of gating
17:40:16 One for blocking the buildroot, another for the public yum repo
17:40:18 yeah, so start with buildroot ones?
17:40:20 So the thing is, rawhide is rawhide
17:40:30 we're talking about 'taming' rawhide to some extent
17:40:32 I think we can start with the buildroot ones for maximum short-term gain
17:40:46 IMHO, rawhide is much less raw than it used to be.
17:40:54 * nirik has been working to make it so.
17:40:55 * mattdm parachutes into the conversation
17:41:06 * mattdm wishes there were anaconda for rawhide
17:41:22 mikem23: Well real or perceived instability of Rawhide tends to lead to people not actually testing their changes there, but just pushing them to Rawhide as part of the process of backporting them to stable releases
17:41:24 nirik, fair enough, but what makes it that way. Is there an effective gating process elsewhere?
17:41:24 once you get to implementation, there's a whole dimension of this problem that I haven't seen discussed: gating and overriding results
17:42:04 if that's part of a later conversation, that's fine but it's not a trivial problem
17:42:06 mikem23: nope... aside from people finding breakage and fixing it quickly, stressing to maintainers to test changes before building, etc.
17:42:35 tflink: I think that's more for later, as right now we're trying to get the ideas/goals clear
17:42:42 http://fedoraproject.org/wiki/Releases/Rawhide#Audience
17:43:23 tflink: I agree, it's something we need to keep in mind. But right now I think we're still trying to figure out *where* things go, then we'll get on to how we implement them
17:43:28 buildroot does break from time to time, but thats usually noticed the day of.
17:43:31 puiterwijk: ok, just wanting to make sure that expectations aren't too high
17:44:00 * tflink isn't clear how you can write a proposal without discussing all the required moving parts, though
17:44:08 and a 'untag' gets things working again (if it's the base buildsys group)
17:44:34 on the thing a little bit ago: rather than composing a whole new vm, tests could be taking a nightly minimal snapshot and running yum install of the new package on that....
17:44:44 tflink: I didn't say for a later conversation, just not at this specific time in the conversation, I think;)
17:44:50 nirik: untag is still unpleasant to people who picked the package up on their system
17:45:10 sgallagh: they wouldn't have... it's only just been built. Unless they manually downloaded it and updated with it
17:45:44 for example, filesystem broke a while back.
17:45:44 nirik: Right now, anything built in Rawhide ends up in the public repo unless it's caught and untagged before the repocreate run
17:45:51 Or am I mistaken?
17:45:53 it was noticed the next newrepo after it landed.
17:46:04 yes, once a day when it's composed.
17:46:15 if you build something it lands in the buildroot next newrepo.
17:46:23 it's composed and pushed out once a day later.
17:46:41 ah
17:46:49 Ok, so my understanding there was off
17:46:51 (so, isn't the "gating" proposal about running some quick tests before that happens?)
17:46:56 from the time it's built to the time the compose starts it could be untagged and never go out in the compose.
17:47:04 right.
17:47:38 so, another option here:
17:47:53 OTOH, It could also be untagged shortly after the compose starts and still get in.
17:47:55 could do a compose with everything, run tests, and on failure bisect until problem is found
17:47:57 Ok, so if the compose is only happening once a day, that might well be the place to run the functional tests, at least.
17:48:08 have buildroot checks, and have compose checks... the compose checks just run before/as the compose
17:48:21 nirik: +1
17:48:29 yeah, but then do we just fail the compose? or try and prune things that are causing problems?
17:48:42 how long do composes take? I'm not sure we have time to bisect
17:48:43 mikem23: yep. very true.
17:48:46 first pass, fail the compose
17:48:52 second pass, be more clever.
17:48:54 * nirik looks
17:49:17 3.5 hours or so
17:49:21 depending on the tests and the output we get we can probably be smarter than just bisecting
17:49:37 heh, yeah, no bisect with composes
17:49:44 what is the bottleneck in that 3.5 hours?
17:49:51 much of thats deltarpms and such I suspect.
17:50:05 also just the sheer number of packages.
17:50:08 http://kojipkgs.fedoraproject.org/mash/rawhide-20130701/logs/
17:50:10 can some of that be delayed and done only after the successful compose?
17:50:13 lots and lots of io
17:50:28 Good point: deltarpms should probably only be generated if the tests pass
17:50:33 right, but that might be a significant restructure of the compose process
17:50:39 * mikem23 is not sure
17:50:50 Probably worth looking at that.
17:51:19 yeah... worth seeing if we can just add tests early, then untag things that are busted
17:51:46 sgallagh: the slowest part of the nightly rawhide compose is deltarpm generation
17:52:08 Or we could just run our own 'ultralight' compose for testing purposes and only do the full rawhide compose from the gated tag
17:52:23 here's a question: is it useful to do the gating for packages in the "base design" (whatever that ends up being)?
17:52:26 dgilmore: Ok, so then if we reordered things so that deltarpms are all generated *after* all of the other packages are composed, then we can run the tests on the full packages first.
17:52:31 mashing a repo without deltarpm is much much quicker
17:52:36 And abort if they fail *before* wasting all the deltarpm time
17:52:40 are we assuming that new tests would have to be created?
17:53:02 mikem23: That could work
17:53:29 * nirik likes the idea of it just being part of the compose, so there's no timing issues.
17:53:38 dgilmore: How much longer does a create+delta take vs. a create?
17:53:39 mikem23 what is ultralight compose?
17:53:41 2x? 10x?
17:53:47 sgallagh: its doable, we could mash without deltas and write some new process to generate delta rpms
17:54:06 if we do things separately we need to make sure its done before compose starts... if it's part of compose it can naturally just go on after tests.
17:54:10 sgallagh: i think mashing without deltas is about 15 minutes
17:54:29 dgilmore: And with it is 3.5 hours?
17:54:47 sgallagh: ~ yeah, depends on what exactly changes in the tree
17:54:58 mattdm, a compose with all the extra stuff not needed for first line tests turned off. notably deltarpms. probably other stuff as well
17:55:13 not sure how fast we can get it down to
17:55:22 mikem23 but still of the entire tree. got it.
17:56:23 mikem23: Well, if all the new stuff is in a side-tag anyway, can we just create that as a temporary repo and point our tests to the existing buildroot+the temp repo? Creating the temp repo would be practically instantaneous (except for mass rebuilds)
17:57:46 sgallagh, yes, you could do testing based on normal rawhide compose + pending updates via yum. However this would not catch installer issues and possibly others
17:57:56 still, probably worth considering
17:58:13 i think it's okay for installer testing to be separate
17:58:25 there's a project to do automatic testing of the installer anyway.
17:58:41 mattdm: which project?
17:59:22 would still need a "no deltas" mash of the pending tag to update from (to get multilib right).
17:59:37 tflink I don't remember details I just remember anaconda team being interested
17:59:59 mattdm: ok, I'm aware of at least 2 that are targeted @ anaconda
18:00:11 I suspect you're talking about dogtail, though
18:00:39 mikem23: rawhide is never installable
18:00:50 so we cant do installer testing anyway
18:01:47 unless anaconda folks are ok with us making it installable again. :)
18:01:50 dgilmore Rawhide is _not currently_ installable. It used to be, of course. Making it so again would be very useful.
18:02:07 "installable" can mean different things, too.
18:02:12 mattdm: it was turned off on purpose as part of no frozen rawhide
18:02:20 "Anaconda doesn't work" is different from "an AMI image can be generated"
18:02:30 I think automatic testing of anaconda and better gating of rawhide are on a course for making it work again.
18:02:46 well, it was disabled as part of no frozen rawhide by request of anaconda folks.
18:02:49 sgallagh Except, I think we eventually need to converge on AMI generation using anaconda as well.
18:03:02 they were getting a flood of bug reports for things when they were not finished landing them.
18:03:06 which just made more work for them
18:03:35 to make a change there anaconda folks would need to agree
18:03:45 absolutely
18:03:47 so, they would build a new anaconda with partial support for something that they were working on and people would file bugs on it they would need to close a 'hang on, we are working on it' all the time
18:04:24 so, yeah, perhaps we could change perceptions or get someone to triage their rawhide bugs or something.
18:04:37 or work on a branch and check things in when they work?
18:05:09 yeah, they also got things like 'why is there no new version to fix foo', etc.
18:06:34 I don't see why anaconda is special in that particular regard.
18:06:50 they also may be more receptive now that the big re-write has happened.
18:07:08 anyhow, something to talk with them about
18:07:45 mattdm: Because they were literally the first thing anyone saw when trying to install Rawhide
18:08:44 sgallagh So they were getting a lot of bug reports related to things _in general_ which were broken in rawhide and not really anaconda related?
18:08:57 gating may change how the anaconda team feels about installable rawhide
18:09:06 it seems like _this_ proposal directly addresses that
18:09:16 mikem23 exactly -- i hope so
18:09:29 mattdm: Yeah, I suspect that this is at least a partial solution to that problem
18:09:34 (not just change how they feel but also really address the underlying concerns)
18:09:39 But we should probably ask dcantrell/dlehman about it
18:09:51 * mattdm nods
18:10:25 * sgallagh notes that we're past the one-hour mark. I'm free to continue, but I understand if others need to reconvene.
18:10:55 I should head out
18:11:10 Would love to see a write up of the meeting and/or a proposal
18:12:22 Well, I've seen a lot of ideas, but no concrete proposals yet.
18:12:24 yeah, I think we at least have some agreement on where to hook into?
18:12:41 Ok, perhaps I missed it.
18:13:25 mikem23: Also, I've got zodbot recording this, so I'll send the record to all of the participants when we close it out
18:14:03 nirik: Would you mind summarizing where you think the hooks are?
18:14:20 I got that we probably wanted to do something during the compose, but I missed where the buildroot hooks would be
18:14:45 So, I think we need two places... buildroot checks and compose checks
18:14:51 * sgallagh nods
18:14:56 the compose checks could just be added to the compose process.
18:15:06 the buildroot we could look at hooking into fedmsg for...
18:15:23 or even just run something periodically...
18:15:26 nirik: With the caveat that we may want to split out the repocreate and deltarpm generation to save time
18:15:46 nirik: except that fedmsg is based on an "as available" service, and there's no promise if messages even arrive at all, let alone in which timeframe, AFAIK?
18:16:00 true.
18:16:11 sgallagh: well, or just move the deltarpm creation to the back, and have it just bail out before that if tests fail, right?
18:16:12 True, if we ran it periodically, all we'd really need to do is email all of the owners of packages that built something between checks, so they could see if they caused it or were affected by it
18:16:27 so, the buildroot checking would just need to look at anything tagged into the buildroot, run checks on them and then move them to the 'normal' buildroot.
18:16:35 puiterwijk: That's what I was trying to say. Do the create, then run tests, then deltas if the tests pass.
18:16:40 or if they failed, mail the maintainer.
18:16:55 sgallagh: I see, your message seemed to indicate that they were to get two different processes
18:17:01 sgallagh: not sure that would work so well - we've had issues with signal-to-noise ratio before with automated checks
18:17:30 tflink: Explain?
18:17:54 My recommendation initially is that we limit ourselves to critical issues.
18:18:04 i.e. gcc won't produce usable binaries.
18:18:17 autoqa used to be a lot more spammy, sending out a lot of emails. we got dev pushback and AFAIK, some people started routing autoqa mail to /dev/null
18:18:37 my concern was more with "emailing everyone who built" on check failure
18:18:48 in this case, the packages won't land until the problem is fixed, right?
18:18:53 Sure, but between the checks, it should be a short list
18:18:55 mattdm: right
18:18:59 well, for buildroot I think a good first cut would be to run a basic build with it and fail if it can't install the base buildroot in mock
18:19:18 nirik: That's a good place to start.
18:19:40 I guess it would need to untag in that case... so the fix could be built.
18:19:40 I might also suggest having it run at least a trivial compile (or an autoconf suite)
18:19:50 but untagging in every case wouldn't be a good idea.
18:20:21 sgallagh: we could, might be tricky to identify the bad build tho, no?
18:20:38 if gcc, glibc, and libfoobar all landed at once...
18:20:40 tflink: Perhaps: "Mail everyone whose package included a BuildRequires on the failed package?"
18:20:54 nirik: True
18:21:21 we could also do different things for critpath/non critpath
18:21:28 Which was why my initial suggestion was to just treat the packages as a set and notify everyone that had built in that 20-minute period
18:21:37 nirik +1 critpath differentiation
18:21:37 sgallagh: I might not be understanding the types of checks being proposed, but I don't see why it can't be limited to the candidates for which build(s) broke stuff
18:22:04 tflink: That's what I said: only the owners of packages that are currently trying to enter the buildroot.
18:22:06 or I could be misunderstanding the scope/implementation
18:22:14 Not all maintainers in Fedora.
18:22:17 That's horrifying
18:22:51 it sounds like I'm really not understanding what's being proposed - I'll start re-reading the backscroll
18:23:11 so foo bar and baz are built and are in a newrepo... the tests run after that and see that mock cannot init a buildroot, so it untags all three and mails the maintainers "hey, one of these 3 packages broke the build root, if it's not you, retag, if it is, fix it"
18:23:40 nirik: That's pretty much what I was trying to say, but more elegantly put.
18:24:05 I think that's fine as a first pass and then a further refinement can try to identify which package
18:24:28 the check is just for composability, then?
18:24:31 I think init build root in mock is a good first goal, yeah... then we could make it build some test package or something, etc
18:24:36 nothing more fine grained than that?
18:24:54 tflink: its for 'the base buildroot... ie, the buildsys-build group can be installed in mock'
18:25:03 if thats not true, 0 builds will happen after that.
18:25:31 then we do more fine-grained tests against an image built from the compose, right?
18:25:42 tflink: That's one test, not the exhaustive set. As nirik says, we may also add a test for "does this (set of) package(s) compile in this buildroot"
18:25:46 so, this just tests the packages in that base buildsys-build that you can do a 'yum install buildsys-build' or whatever and have it work.
18:27:17 yeah, so I think the division would be:
18:27:31 buildroot test -> check that we can build things sanely.
18:27:46 compose test -> check the stuff we built that as a collection it is sane.
18:28:31 * nirik has to step away for a few minutes.
18:35:36 Sorry, had to disappear for a few minutes. Tornado warning in my area.
18:35:56 I'm going to close out this meeting and try to schedule a continuation tomorrow. I'll send out the recorded minutes shortly.
18:36:57 #endmeeting
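
Editor's sketch (not part of the recorded log): the two hooks discussed above, a buildroot check that treats everything landed since the last check as one set, and a compose that runs tests before the deltarpm step, could be modelled roughly as below. This is only an illustration under stated assumptions. Every name in it (GateResult, gate_buildroot, gated_compose, and the callables passed in) is hypothetical, and it makes no Koji, mock, mash or fedmsg calls; the actual tagging, buildroot init and mailing would be supplied by whatever the proposal ends up specifying.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class GateResult:
    """Outcome of one buildroot check run (hypothetical structure)."""
    passed: bool
    promoted: List[str] = field(default_factory=list)  # builds to move to the gated tag
    untagged: List[str] = field(default_factory=list)  # builds to pull back out
    notices: List[str] = field(default_factory=list)   # text to mail to maintainers


def gate_buildroot(pending: Iterable[str],
                   buildroot_ok: Callable[[List[str]], bool]) -> GateResult:
    """Check all builds that landed since the last run as a single set.

    `buildroot_ok` is an assumed callback standing in for "can the base
    buildroot still be initialised with these builds present?".
    """
    builds = sorted(pending)
    if not builds:
        return GateResult(passed=True)
    if buildroot_ok(builds):
        return GateResult(passed=True, promoted=builds)
    # First pass only: no attempt to identify which build is at fault.
    notice = ("one of these %d builds broke the buildroot: %s; "
              "if it's not yours, retag; if it is, fix it"
              % (len(builds), ", ".join(builds)))
    return GateResult(passed=False, untagged=builds, notices=[notice])


def gated_compose(create_repo: Callable[[], None],
                  run_tests: Callable[[], bool],
                  make_deltas: Callable[[], None]) -> str:
    """Create the repo, test it, and only then spend time on deltas."""
    create_repo()
    if not run_tests():
        return "failed"      # bail out before the slow deltarpm step
    make_deltas()
    return "composed"
```

With a failing check, gate_buildroot(["gcc-1", "glibc-2", "libfoobar-3"], lambda b: False) returns all three builds in `untagged` and a single notice, matching the "untag the set and mail the maintainers" first pass; identifying the individual offender is left as the later refinement mentioned in the discussion.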