17:00:07 #startmeeting F25 Alpha Go/No-Go meeting
17:00:07 Meeting started Thu Aug 18 17:00:07 2016 UTC. The chair is jkurik. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:07 Useful Commands: #action #agreed #halp #info #idea #link #topic.
17:00:07 The meeting name has been set to 'f25_alpha_go/no-go_meeting'
17:00:11 #meetingname F25-Alpha-Go_No_Go-meeting
17:00:11 The meeting name has been set to 'f25-alpha-go_no_go-meeting'
17:00:18 #topic Roll Call
17:00:24 .hello jkurik
17:00:25 jkurik: jkurik 'Jan Kurik'
17:00:29 .hello coremodule
17:00:30 coremodule: coremodule 'Geoffrey Marr'
17:00:32 .hello kevin
17:00:33 nirik: kevin 'Kevin Fenzi'
17:00:50 .hello pfrields
17:00:51 stickster: pfrields 'Paul W. Frields'
17:01:14 coremodule, nirik, stickster: hi and thanks for coming
17:01:23 Glad to be here!
17:01:59 adamw: are you with us as well?
17:02:00 .hello jforbes
17:02:00 jforbes: jforbes 'Justin M. Forbes'
17:02:14 jforbes: hi Justin
17:02:36 * kparal lurks in case adamw is not around
17:02:49 hola amigos
17:03:06 kparal: he said he would be back
17:03:12 dgilmore: hola
17:03:25 * nirik doesn't think this meeting needs to be too long sadly.
17:03:27 #chair nirik stickster dgilmore adamw
17:03:27 Current chairs: adamw dgilmore jkurik nirik stickster
17:03:46 #topic Purpose of this meeting
17:03:47 nirik: there are some blockers to review
17:03:48 #info Purpose of this meeting is to check whether or not F25 Alpha is ready for shipment, according to the release criteria.
17:03:50 #info This is determined in a few ways:
17:03:51 #info No remaining blocker bugs
17:03:53 #info Release candidate compose is available
17:03:54 #info Test matrices for Alpha are fully completed
17:04:04 #link https://qa.fedoraproject.org/blockerbugs/milestone/25/alpha/buglist
17:04:06 #link https://fedoraproject.org/wiki/Test_Results:Fedora_25_Branched_20160807.n.0_Summary
17:04:19 hopefully adamw will be back soon
17:04:26 #topic Current status
17:04:35 #info We do not have a so-called RC for the Fedora 25 Alpha release due to https://bugzilla.redhat.com/show_bug.cgi?id=1365661
17:04:51 I would suggest we continue with the Mini-blocker review anyway and then skip Test Matrices coverage.
17:04:52 That kind of ices it
17:05:04 jkurik: and some other bugs
17:05:10 dgilmore: right
17:05:32 https://bugzilla.redhat.com/show_bug.cgi?id=1315541 being one of them
17:06:13 #info there are also other blockers line https://bugzilla.redhat.com/show_bug.cgi?id=1315541
17:06:30 #link #undo
17:06:31 jkurik: that one needs to be reviewed
17:06:42 I reproposed it
17:06:51 #undo
17:06:51 Removing item from minutes:
17:06:55 #undo
17:06:55 Removing item from minutes: INFO by jkurik at 17:06:13 : there are also other blockers line https://bugzilla.redhat.com/show_bug.cgi?id=1315541
17:07:02 #info there are also other blockers like https://bugzilla.redhat.com/show_bug.cgi?id=1315541
17:07:34 jkurik: it's caused branched to fail
17:07:38 hi, sorry, just got back
17:07:40 the cloud compose is the real showstopper, the other accepted blockers are addressed. but yeah, no way around the cloud one.
17:07:53 hi adamw
17:08:15 adamw: may I ask you to please lead the mini-blocker review?
17:08:25 sure.
17:08:29 adamw: you already have the chair :)
17:08:39 #topic (1365917) kernel panic at boot - x2apic_cluster_probe+0x33/0x70
17:08:41 #link https://bugzilla.redhat.com/show_bug.cgi?id=1365917
17:08:43 #info Proposed Blocker, kernel, POST
17:09:29 so this is a boot fail on...some hardware.
we're a bit hand-wavy on how many systems have x2apic enabled out of the box.
17:09:37 we have a fix for it, we also have a documentable workaround.
17:09:44 i'm probably -1 blocker / +1 FE, overall.
17:10:02 -1 blocker / +1 FE also
17:10:05 Sorry. You are very skilled we will have UEFI Fedora boot into future ?
17:10:07 I agree +1 FE is OK
17:11:30 Well, it has been fixed for almost a week now, so any new kernel build will include the fix, but is it worth going through the test matrix on a new kernel?
17:11:46 mythcat: Fedora has had UEFI support for many years. ;) but not sure what you are asking there....
17:11:50 mythcat: is your question somehow related to the bug #1365917 ?
17:11:52 -1 blocker +1 FE
17:11:59 jforbes: we're going to be delayed a week anyway, i'm fine with a new kernel.
17:12:07 -1 blocker +1 FE
17:12:19 Cool
17:12:38 -1 blocker, +1 FE
17:12:56 proposed #agreed 1365917 - RejectedBlocker AcceptedFreezeException - on our best guess as to how much hardware would be affected, and because there's a fairly easy workaround, we don't think this is quite broad enough to block Alpha, but definitely severe enough to warrant a freeze exception
17:12:57 jkurik: no, sorry about that
17:13:22 adamw: ack
17:13:30 ack
17:13:31 ack
17:13:37 ack
17:13:39 ack
17:13:46 ack
17:13:56 ack
17:14:43 ack
17:14:44 #agreed 1365917 - RejectedBlocker AcceptedFreezeException - on our best guess as to how much hardware would be affected, and because there's a fairly easy workaround, we don't think this is quite broad enough to block Alpha, but definitely severe enough to warrant a freeze exception
17:14:53 #topic (1367321) system reboots 1 second after selecting a kernel in grub
17:14:53 #link https://bugzilla.redhat.com/show_bug.cgi?id=1367321
17:14:53 #info Proposed Blocker, kernel, NEW
17:15:19 this is another boot-fail kernel bug, we don't have a fix for this one yet, we're not 100% sure about affected hardware beyond 'some AMD chipsets, attached SATA devices may be
significant'
17:16:06 i might be inclined to punt this and maybe send out a list mail for folks with AMD chipsets to test...
17:16:19 would be nice to have more info
17:16:46 +1 to more info
17:16:58 +1 punt
17:17:27 574462
17:17:28 more info would be very nice, as would a test with something that actually made it through the merge window
17:18:18 jforbes: sorry, what do you mean?
17:18:53 kparal: test with latest kernel i guess
17:18:58 there's some rc2 builds in koji
17:19:10 ok, will do
17:19:12 for the purpose of this blocker review: even though we do not have enough info, it seems like it does not affect a huge amount of HW
17:19:26 kparal: the kernel from stable is from the middle of the merge window. rc2 is a bit more coherent
17:19:47 tomorrow I'll test latest kernel from koji
17:19:52 jkurik: i'm not sure i'd be confident in saying that
17:20:03 kparal++
17:20:03 jkurik: Karma for kparal changed to 2 (for the f24 release cycle): https://badges.fedoraproject.org/tags/cookie/any
17:20:28 proposed #agreed 1367321 - punt (delay decision) - we don't yet have a clear enough feel for how much hardware is affected by this to make a decision
17:20:35 proposed #agreed 1367321 - punt (delay decision) - we don't yet have a clear enough feel for how much hardware is affected by this to make a decision, we will send out a request for more people to test
17:20:52 ack
17:20:53 ack
17:21:05 ack
17:21:06 ack
17:21:09 ack
17:21:09 ack
17:21:14 ack
17:21:17 ack
17:21:22 ack
17:21:54 #agreed 1367321 - punt (delay decision) - we don't yet have a clear enough feel for how much hardware is affected by this to make a decision, we will send out a request for more people to test
17:22:08 #topic (1315541) fsck.ext4 discard sometimes fails when run in Koji (results in live image compose failure)
17:22:08 #link https://bugzilla.redhat.com/show_bug.cgi?id=1315541
17:22:08 #info Proposed Blocker, lorax, ASSIGNED
17:22:24 so this is the infamous bug that's existed for half a year or so now, which
sometimes prevents live images from composing
17:22:51 well, in its 'natural' state it results in images that don't boot properly; we intentionally tweaked the compose tool so when the bug happens the compose fails
17:23:01 adamw: I think it would be useful to get bcl to add a ls -lah /dev/shm in addition to the lsof on /dev/shm
17:23:13 can't hurt
17:23:19 we can not guarantee that we can deliver any live media
17:23:36 if a release blocker fails we have to recompose all of the release
17:23:46 i'm still generally of the opinion that there's no great purpose in making this a blocker.
17:23:57 if a non blocking artifact fails we skip it and upset people
17:24:22 I am of the opinion that releng can not deliver a compose without it fixed
17:24:38 just trying again is not okay
17:24:48 but if people vote for it i'm not gonna argue. but it does mean we are then saying we won't release anything until someone can fix a bug no-one has any idea how to fix, that doesn't *really* prevent us from releasing stuff, just makes it more inconvenient.
17:25:18 dgilmore: i'd say you could simply say 'as releng grand poobah i'm refusing to compose anything till this is fixed' without using the blocker process.
17:25:41 adamw, How did we get around it for F24?
17:25:48 adamw: perhaps. but then people will say I am an ass and getting in the road
17:25:50 coremodule: just recomposed AFAICT
17:25:58 recomposed...
17:25:58 coremodule: so far we've simply been refiring composes if any blocking live image (that's workstation and KDE) fails.
17:26:06 and thats why 2 of the labs were not in final
17:26:08 dgilmore: not if you're collaborating with the anaconda folks (et al.) to get it fixed.
17:26:09 coremodule: we were lucky that the release blocking images worked
17:26:16 adamw, And sometimes it'll compose that way?
17:26:22 some non release blocking images failed and that made people upset
17:26:26 dgilmore: would it not be sufficient to teach pungi about this, as proposed by adam in https://bugzilla.redhat.com/show_bug.cgi?id=1315541#c65 ?
17:26:31 coremodule: yeah. the bug is quasi-random. sometimes it happens, sometimes it doesn't, there's no very clear pattern.
17:26:44 it also makes it impossible to enforce spins policy when things are failing all over the place randomly
17:27:01 yeah, thats also annoying. ;(
17:27:09 jkurik: thats not viable
17:27:22 dgilmore: how so?
17:27:22 because it could just keep failing
17:27:37 adamw, Huh, gotcha.
17:27:38 at what point do we say its done
17:27:51 dgilmore: i did cover that in the comment.
17:27:54 true, but given the non-deterministic failures, chances are the incidence of that would go a lot lower.
17:28:01 the failures in f25 have been more consistent than in f24
17:28:21 it's a hack job, for sure, but the thing that concerns me here is we genuinely have no great idea how to fix this or how to figure out how to fix it.
17:28:33 we've all known about it and wanted it fixed for months and no-one's figured it out yet.
17:28:45 adamw: honestly I am not sure we have put in a strong effort to debug
17:28:47 if we make it a blocker are people really OK with us not releasing Alpha for two months or whatever, if we can't fix it?
17:29:00 dgilmore: well, i used to keep bugging people about it and they kept telling me they had
17:29:11 adamw: I know I did not
17:29:13 I gathered as much info as I could for bcl.
17:29:15 adamw: the answer to that is definitely no
17:29:21 not sure I was ever really asked
17:29:24 adamw: Do we have any confidence that if we slipped for a week for more information we might at least come up with a realistic ETA?
17:30:12 I suspect that it may be related to the other /dev/shm bug and seeing what if anything is in /dev/shm may give us a clue
17:30:17 sgallagh: i don't, but you'd probably be better off asking dgilmore/nirik/bcl .
17:30:18 stickster: Well, not necessarily. If it's as serious as it sounds, maybe we scrub the current schedule and immediately shift to the next release cycle.
17:30:23 bcl is apparently still not online today to ask
17:30:35 I wouldn't rule that out as an option
17:30:36 dgilmore: i don't think so, because the times don't add up (dracut wasn't changed back to fiddling with /dev/shm till a month after this bug showed up)
17:31:05 sgallagh: that sounds... really ugly
17:31:43 I am okay punting this to see if we can get more info
17:31:47 for that matter, livemedia-creator could see this case and just recompose... but thats also a massive hack
17:32:21 if this bug is seen as being at that level of stopping the release, it's incumbent on the affected folks to collaborate to fix it... don't wait to be asked ;-)
17:32:32 I really do not want us to be in a place where we have to run the compose process 4 or 5 or more times to get something where everything required is there
17:33:00 that's a perfectly legitimate stance IMHO
17:33:00 nirik: lots of possible hacks
17:33:13 Punt for more info, I will spend time this week looking at it, as another set of eyes.
17:33:29 could we get it to sleep/pause? then we could go dive in and examine the failed run better...
17:33:32 coremodule: just to thoroughly depress you going in, the problem is that it is quite resistant to debugging
17:33:41 adamw: at the least seeing what is in /dev/shm may give a clue
17:33:42 the bug never seems to happen outside of the infra koji deployment
17:33:56 currently its only looking at what has /dev/shm open
17:34:00 if you just run a bunch of live composes locally you don't hit it, at least so far no-one has
17:34:13 ha, fantastic
17:34:19 * nirik nods
17:34:32 so maybe someone gets to set up an entire pet koji deployment...then finds it doesn't happen there either
17:34:37 * adamw loves being the pessimist
17:34:53 so...anyhow...that's the situation. votes?
17:34:59 we could move the livemedia tasks off of the buildhw boxes and onto vms
17:35:02 we could try in stg... it should be able to do livemedia now I think... but we haven't ever tried
17:35:16 nirik: it does livemedia
17:35:26 its where I tested it when developing support in koji
17:35:31 it was just really slow
17:35:55 maybe we can get coremodule the ability to help there?
17:36:23 sure
17:36:54 stickster, Yeah, glad to do it. I'll need some backup-support, but yeah, I'll do my best.
17:37:06 I would propose to keep it as "proposed blocker" for now and make the decision next week when we hopefully have more info
17:37:09 coremodule: you can look to dgilmore and nirik for that, I would think
17:37:17 so bcl is on pto all week
17:37:37 I am willing to make a build of lorax for testing with a patch to try inspect /dev/shm more
17:37:55 dgilmore: we could possibly do all that in stg to avoid messing things up
17:37:56 jkurik: I am okay with that
17:38:09 nirik: assuming we can reproduce in stage
17:38:19 yeah, true
17:38:21 nirik: maybe moving to buildvms will help
17:38:28 I am willing to try
17:38:30 or hurt. who knows!
:)
17:38:35 or hurt
17:38:36 :)
17:38:51 * dgilmore needs to run in 2 minutes
17:38:59 the good news is, it's Fedora, break all you want, we'll make more
17:39:24 :)
17:39:35 we still need votes...
17:39:40 * nirik is fine with punt for a week.
17:39:57 * jkurik is fine with punt for a week as well
17:40:12 i guess i'm still -1 blocker on this because i know if we didn't manage to fix it for a couple of weeks we'd wind up just pressuring releng to do a compose anyway
17:40:15 but i'm fine with punting
17:40:40 I am +1 blocker but okay punting
17:40:59 Punt here, I'll get with dgilmore to see about what I can do.
17:41:05 dgilmore, Are you okay with that?
17:41:29 looks like the "punting" wins
17:41:37 I hate to say it, but I'm +1 blocker on this as well
17:42:58 so far i'd say there's not enough votes to make a blocker call, if no-one else is gonna vote we'll go with punt
17:43:34 hey, perhaps it will be fixed in a week. ;)
17:43:46 * dgilmore needs to run, releng is no go for the record
17:44:04 dgilmore: thanks for joining
17:44:26 proposed #agreed 1315541 - punt (delay decision) - we have +2 / -1 blocker votes so far, this is a pretty squishy issue for blocker purposes, we have some ideas for addressing it so we'll delay the decision for a while and see how things change
17:44:40 ack
17:44:42 ack
17:44:43 ack
17:44:48 ack
17:44:51 ack
17:45:19 ack
17:45:27 #agreed 1315541 - punt (delay decision) - we have +2 / -1 blocker votes so far, this is a pretty squishy issue for blocker purposes, we have some ideas for addressing it so we'll delay the decision for a while and see how things change
17:45:40 #topic (1366403) [abrt] sssd-common: ipa_dyndns_update_send(): sssd_be killed by SIGSEGV
17:45:40 #link https://bugzilla.redhat.com/show_bug.cgi?id=1366403
17:45:41 #info Proposed Blocker, sssd, ON_QA
17:45:51 so, i'll explain this a little
17:46:16 last week we approved this as a blocker on the belief that it was causing login as a freeipa domain user to fail when the
system was enrolled via kickstart
17:46:59 during the week we got a fix for the crash and i verified that it does indeed fix the crash, but the login fail still happens. i then filed the login fail separately - https://bugzilla.redhat.com/show_bug.cgi?id=1367604 - and transferred the blocker status there, since that was the issue we'd really approved as a blocker
17:47:17 adamw: It still may well have been doing so; that crash can still have that effect. Unfortunately fixing it at least revealed a different issue beyond it.
17:47:22 so that one is listed as an accepted blocker now. that leaves the status of this bug in question, so i removed the acceptedblocker status so we could discuss it again
17:47:26 sgallagh: yeah, that's a possibility
17:47:47 so far i have not run a test with the other bug fixed but this one *not* fixed; i will try and do so soon
17:48:18 i figure a sensible decision here would be to punt on blocker status (so i can test it with the other bug fixed and see what happens), accept it as FE (it seems at least FE-worthy)
17:48:28 Well, assuming we leave it as at least FE, it should be academic
17:48:32 yeah.
17:48:39 +1
17:48:55 The patch is *really* trivial (just a NULL-check), so I'm not worried about fallout from it
17:49:03 yeah, it looks fine to me too.
17:49:13 +1 FE from me
17:49:31 sgallagh: "one line fix can not break anything" :)
17:49:32 +1 FE here as well
17:49:41 +1 FE here
17:49:44 +1 fe
17:49:48 +1 FE
17:50:10 jkurik: i believe we've had a case where a one word fix broke everything :P
17:50:15 jkurik: Certainly it can, but one line that is just "If foo != null {" is unlikely to
17:50:39 sgallagh: just so long as it's not 'if foo = null {'
17:50:44 ha
17:50:49 adamw, sgallagh: I was just joking, sorry for it
17:50:51 /me shudders
17:51:12 +1 FE
17:51:43 proposed #agreed 1366403 - punt (delay decision) on blocker, AcceptedFreezeException - we are no longer sure if this bug has true blocker consequences (it's not clear whether it actually prevents domain login). if it's still outstanding, adam will re-test and confirm and we will vote next time. An sssd crasher is certainly serious enough to grant a freeze exception, however
17:52:03 ack
17:52:06 ack
17:52:08 oh man, i'm referring to myself in the third person
17:52:12 ack
17:52:12 this is bad
17:52:26 ack
17:52:27 #agreed 1366403 - punt (delay decision) on blocker, AcceptedFreezeException - we are no longer sure if this bug has true blocker consequences (it's not clear whether it actually prevents domain login). if it's still outstanding, adam will re-test and confirm and we will vote next time. An sssd crasher is certainly serious enough to grant a freeze exception, however
17:52:27 ack
17:52:36 ack
17:52:44 adamw: Could be worse: you could be referring to yourself in the third person using the wrong name
17:52:48 heh
17:52:52 adamw: you're in good company: "adamw smash!"
17:53:23 i don't know if it's worth going through the accepted blockers, their statuses are pretty straightforward; we pretty much have fixes for all of them but the cloud compose, where we're still trying to figure out what the hell's going on (unless dgilmore figured it out while i was playing tennis)
17:53:45 we need to get karma on the fixes and push them stable and blahblah but i can handle that
17:54:12 I am fine to close the review now
17:54:35 * nirik notes infra meeting in about 5min... but we can go to meeting-1 if needed
17:54:57 ok, move along, jkurik
17:55:00 * adamw cedes the floor
17:55:06 adamw: thanks for running that
17:55:18 thanks adamw
17:55:20 including all the blahblah :-)
17:55:37 Thanks adamw
17:55:43 nirik: may I ask you to please go to meeting-1?
17:55:48 sure.
17:55:52 smooge agrees with adamw that speaking to one in the third person is bad
17:56:00 nirik: or, maybe we finish soon...
17:56:11 NOTICE: anyone looking for the infrastructure meeting, please /join #fedora-meeting-1
17:56:19 so, Test matrices to be skipped
17:56:24 for today
17:56:27 #topic Go/No-Go decision
17:56:35 yeah, since we have no RC we can't really talk about the test coverage. =)
17:56:41 releng is already no-go (dgilmore)
17:56:42 QA is no-go, obviously, we have no RC and we have outstanding blockers
17:56:44 very much nogo
17:57:03 no-go with my FESCo hat on
17:57:22 ok, so ....
17:57:30 this is a pretty simple decision :-) :-\
17:58:19 #info The decision is No-Go due to missing RC and present blockers
17:58:43 #action jkurik to publish the Go/No-Go result
17:58:50 #topic Open floor
17:58:57 nice to know more about Fedora :|
17:58:58 anything else to discuss today in this meeting?
17:59:54 ok, so thanks for coming
18:00:06 see most of you at the readiness meeting in one hour
18:00:21 #endmeeting