18:00:31 #startmeeting
18:00:31 Meeting started Fri Mar 16 18:00:31 2012 UTC. The chair is davej. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:31 Useful Commands: #action #agreed #halp #info #idea #link #topic.
18:00:44 #meetingname Fedora-kernel
18:00:44 The meeting name has been set to 'fedora-kernel'
18:01:07 #chair davej
18:01:07 Current chairs: davej
18:01:15 whee.
18:01:55 jwb, jforbes: ready ?
18:02:01 davej: indeed
18:02:44 BORN READY
18:02:54 ok, not really. but sure
18:03:19 heh. alrighty.
18:03:40 so, let's start by recapping on last meeting's discussion about the common bugs we've been seeing
18:04:14 I think since then, we've been able to attribute a lot more of the "weird shit happened" bugs to the i915 hibernate corruption problem
18:05:13 there's still no progress towards a fix, but it seems that at least keithp is thinking about some theories now.
18:05:24 so on that
18:05:38 we disabled the threaded compression code because we had no idea what was broken
18:05:54 do we want to leave that off, or turn it back on given that it computes a CRC32 over the image?
18:05:57 Ahh, think we should back that out?
18:06:09 yeah, I think we've ruled that out as a potential problem now
18:06:17 Okay, I will pull it
18:06:21 i don't think disabling it has changed much, and Bojan is paying attention in the main bug now
18:06:25 seems safe to drop
18:06:26 yeah
18:06:31 it'll make Bojan happy :)
18:07:00 #action revert the 'disable threaded compression code' patch
18:08:33 anything else on hibernate?
18:08:41 so I think we're blocking on keith/intel on this one, so we're just going to have to keep poking them periodically.
18:08:56 it seems there was a brief discussion of IOMMU, but it sounded like that wasn't really at play either
18:09:22 yeah, I think keith's current theory on the GTT needing to be torn down is probably the way forward.
18:09:33 I looked at it myself, but was a bit out of my depth
18:10:16 ajax: you just missed the bit where we talked about how much i915 sucks.
18:11:09 i wouldn't say i've been missing it, bob.
18:11:29 heh.
18:11:38 * EvilBob looks around
18:11:46 i hear you made some progress on that though?
18:12:03 ajax: from what we can tell, the memory corruption is caused by stale GTT entries
18:12:18 so maybe that needs to be torn down before the hibernate happens
18:12:30 but keithp mentioned that there might be dragons there
18:12:52 i'm sure there are, but it surely needs doing. existing bz i should be assigned to, or should i make my own?
18:13:06 the existing bz's are a mess tbh
18:13:13 kernel_hibernate has become a pile-on
18:13:29 GTT and GART are basically the same thing, right?
18:13:35 we're using that bug as a master tracker, rather than anything useful in the comments
18:14:29 jwb: yeah, just a fancy mmu of sorts
18:14:32 k. added to the queue.
18:15:12 don't think there's much else that needs discussing on this ? move on ?
18:15:24 yeah
18:15:33 #topic irqpoll
18:15:36 Josh's favorite patch
18:15:41 ok
18:16:00 so we added a patch submitted upstream that made the kernel fall back to polling IRQs if it found a "stuck" one
18:16:10 it did that, but it was really really verbose
18:16:26 it also was much less tolerant of what it considered a "stuck" irq
18:16:47 so instead of looking to see if it was unhandled 999,000 times, it decided to poll after 9
18:16:51 (yes, 9)
18:16:59 heh
18:17:20 that seems to cause some machines that are either a bit slow or busy to falsely trigger the fallback code
18:17:48 however, really the original patch was only supposed to be a workaround for a specific broken PCI bridge
18:17:58 ASM1083/ASM1085
18:18:19 we've reworked it now to use a PCI quirk to only do this kind of behavior if that bridge is detected
18:18:45 it took a couple of iterations, but it seems to be working well (in other words be completely benign) on non-ASM boxes
18:19:13 for the machines that _do_ have ASM108x bridges, it does fall back to the polling behavior, but it makes the box somewhat laggy
18:19:14 Have we gotten any feedback from an ASM user yet?
18:19:19 yeah, one
18:19:36 laggy and running is better than dead
18:19:55 it's kind of expected that things are going to get laggy when you're polling, particularly if your graphics card happens to share an interrupt with the one that toggles the behavior
18:20:20 i might be able to lessen the poll frequency a bit more and make it not quite so bad, but i need a willing user to test it out
18:20:27 i'm sure i'll find one in not too long
18:20:29 so something that needs doing once the dust settles on this, is to go through the remaining irqpoll bugs that aren't asm108 and see if there's any commonality there.
18:20:35 yes
18:21:09 i think i already collapsed all the asm108 reports into a single bug
18:21:20 so anything other than the one is a candidate for review
18:21:25 I'm wondering why we saw such an uptick in this warning over the last release or so. It may even be a kernel bug for all we know right now
18:21:58 yeah, i'm thinking it might be. again, finding someone impacted that doesn't say it's a one-off and is willing to test/bisect is the key i think
18:22:22 maybe we'll get lucky when we move to 3.3 ;)
18:22:32 could be
18:22:42 speaking of, move onto that topic ?
18:22:50 as for the patch itself, i think it needs more eyes and thoughts before it gets upstream
18:23:00 it's fairly hacky at the moment
18:23:08 anyway, yeah let's move on
18:23:11 ok
18:23:21 #topic upcoming f15/f16 3.3 rebase
18:23:35 I think davej brought up a good question there, if there is another bug that has made the irqpoll problem so much more prominent, it might be that the patch is of much more limited use
18:24:08 jforbes, yeah. the only reason we're sticking to the patch at the moment is that upstream already did quite a bit of analysis on that piece of hardware
18:24:22 it'll still be of use for asm108 I suspect.
18:24:49 anyway, let's see how it works out.
18:25:18 so, 3.3 will probably be final sometime next week.
18:25:53 hopefully in time for the beta.
18:26:18 jwb mentioned earlier that it might be worth jumping on it as soon as it's released for f15/f16 instead of waiting for .1
18:26:53 it's a thought i had anyway. we've been carrying the wireless stack from 3.3 for a while in f16 now already
18:26:59 I don't see a problem with that. We follow upstream closely, and the stable queue. We can easily grab patches before .1 comes out if needed
18:27:33 yeah.
and i'm still leery of .1 releases in general anyway
18:27:42 my only concern here is the (small) window where a security bug might come in, and 3.3 regresses booting for someone, so they have to go back to 3.2 without the fix.
18:28:06 is that going to be much different than with 3.3.1?
18:28:09 but we face that problem every time anyway, even if we wait
18:28:14 right
18:28:29 i'd be willing to build some f16 3.3 kernels and put them on my people page
18:28:41 blog about them, get some informal feedback
18:29:02 There's also the question of security bug severity. We cover all CVEs to make sure we are not exposed, but in reality, a majority of CVEs are corner cases most people will never be exposed to
18:29:19 yeah, it's rare that we see something really severe
18:30:19 so I think we're all in the same mindset here that moving forward is the best option.
18:30:37 sounds good to me
18:30:48 #action rebase f15/f16 to 3.3 when released.
18:31:13 i figured we wait a week or so after 3.3 hits f16 stable before we rebase f15?
18:31:26 or maybe that's not worthwhile anymore. seems we don't have a ton of f15 users
18:31:47 I've noticed f15 updates tend to sit waiting for karma a lot longer
18:31:54 I think so, I think your people page idea might be good for a quick 3.2 + patch if we find a really severe issue
18:32:06 true
18:32:35 Yeah, there seem to be very few F15 users, qemu updates were the same way. The people who all jumped on F15 have moved to F16, and a lot skipped it
18:34:26 anything else on rebasing?
18:34:37 not from me.
18:34:50 oh
18:34:54 one idea I had.
18:35:15 once we get 3.3 landed in 16, shall we do a mass "please retest" on the open 16 bugs ?
18:35:29 we've kinda done them by hand up until now
18:35:32 yeah, probably
18:35:49 going through old bugs asking that by hand at the start of a month gets old
18:35:59 there will obviously be some that we know aren't going to be fixed by the rebase, but they should be the minority
18:36:05 That's a good idea
18:36:33 and then maybe 2-3 weeks later, if they're still needinfo, close them insufficient data (using judgment, rather than automated)
18:36:56 that works (there's an automated way?)
18:37:09 yeah there's a "change multiple bugs" link at bottom of a bug list
18:37:12 yeah. i've been waiting 2-3 months before i close out a bug like that, but it seems much too long
18:37:54 so looking at things right now, open bugs: f15:349 f16:509 f17:31 rawhide:144
18:37:58 2-3 months is way too long. Most people who will bother responding will do so in the first week, 1 month is more than enough for automated. They can reopen if they ever get to it
18:38:14 grr... i had f16 under 500 last week
18:38:30 jwb: i was under 500 yesterday, just a timing thing
18:38:33 yeah, me too. then a couple hours later, it went back over.
18:38:52 we've closed 53 f16 bugs this last week alone.
18:39:31 ok, so i think we settled on ask about the rebase, and close if no response in 2-3 weeks
18:40:25 #action after 3.3 rebase, mass update open bugs asking to retest, and close if no response after 2-3 weeks, if appropriate.
18:40:44 the 'if appropriate' part there is obviously where we know something hasn't been fixed
18:42:05 ok, think that's it for the rebase.
18:42:17 and that's all I have on the agenda I think.
18:42:29 #topic open floor
18:42:41 what about DEBUG_VM?
18:43:03 I think leaving that on, at least for a while might be beneficial.
18:43:14 Will f18 go to 3.4 right after the merge window closes?
18:43:14 that it breaks fglrx seems to be the only real fallout so far
18:43:36 brunowolff, probably during the merge window in fedora git, but it might not get built until -rc1
18:43:40 i think that'll be up to jforbes
18:44:30 Yes, depending on the quality of various points in the merge window it will move before rc1, but it will certainly move at rc1
18:44:50 Thanks.
18:45:04 linux-next has at least lowered the number of compile failures we get pre -rc1
18:45:07 I am not going to spend too much time debugging build issues before rc1, but if it builds and boots locally I will push it
18:48:38 any other questions from anyone?
18:49:00 can we shut up alsa? ;)
18:49:13 oh, you asked me about that last time, right?
18:49:17 yeah
18:49:31 ok, let me look at the logs for the last meeting and i'll email upstream about it
18:49:41 ok thanks
18:49:54 #action jwb to look at why alsa is so chatty
18:49:57 on the subject of alsa, we have a lot of sound related bugs that basically get no attention from us at all.
18:50:12 davej, i'm not a meeting chair. the above didn't work
18:50:17 I think we need to be more aggressive about pointing the alsa people at them
18:50:23 #action jwb to look at why alsa is so chatty
18:50:23 yes, agreed
18:50:43 I've sort of been doing that more for the networking related bugs this last month
18:50:57 the netdev guys have been pretty responsive, and easy to deal with
18:51:11 when we do point them at things, a lot of the time we get "load with model=" and it works
18:51:18 which leads to why it can't just figure that out
18:51:25 and if we should be creating udev quirks
18:51:28 yeah, that is annoying.
18:51:36 anyway, more stuff to ask them
18:52:20 perhaps going through and tagging all the sound bugs so we can present them as a list might be useful.
18:52:36 we talked about doing something like this generally before, but never really did it.
18:52:43 (using whiteboard)
18:53:09 does mucking with a whiteboard that has an abrt hash in it break abrt?
18:53:21 ugh, I hope not
18:53:56 if it does, we could use "alsa:" as the start of the subject
18:54:07 or use the keywords field
18:54:10 s/subject/title
18:54:13 keywords are pre-defined
18:54:17 ah, crap
18:54:22 i don't think it lets you put arbitrary ones in there
18:54:29 yeah looks like you're right
18:54:41 there's 'Devel Whiteboard'
18:54:44 i have no idea what that is
18:54:46 would be nice if bugzilla had a subcomponent field
18:54:57 yes. it would
18:57:00 We could use subjects to make bugsearch more effective
18:57:22 like the "alsa:" proposal in the title?
18:57:28 alsa: netdev: mm: etc
18:57:33 jwb: yeah
18:57:43 it'd match how upstream does patches too. ;)
18:58:05 yeah. also, I've been trimming some of them so that they line up better in the lists. (ie, removing 'kernel:' '[abrt]' etc so they all have a uniform pattern)
18:58:19 it's made it easier to see dupes in some cases
18:58:26 i'm good with using a subsystem subject
18:58:45 ok, let's give that a try.
18:58:58 maybe some day we'll even have community triagers that can triage bugs and put the appropriate subject there
18:59:18 we live in hope
18:59:57 ok, let's call this done.
19:00:04 if we came up with a list of subject prepends we want, I can throw up a quick "kernel bug triage page" and throw it to the test list
19:00:26 jforbes: there's one already (linked off the main kernel page)
19:00:30 so maybe update that
19:00:34 Can do that
19:00:35 #endmeeting
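
Editor's note on the title-prefix idea from the open floor: the trimming and tagging described around 18:57-18:58 could be sketched roughly as below. This is only an illustration under stated assumptions, not anything the team ran. The function name `normalize_title`, the noise-prefix pattern, and the `SUBSYSTEMS` tuple are hypothetical; the log itself only mentions stripping 'kernel:' and '[abrt]' and floats "alsa:", "netdev:" and "mm:" as prefixes.

```python
import re

# Noise prefixes named in the log ('kernel:' and '[abrt]'); anything
# beyond these two is not covered by this sketch.
NOISE_PREFIX = re.compile(r"^\s*(?:\[abrt\]|kernel:)\s*", re.IGNORECASE)

# Prefixes floated in the meeting; a real list would need to be agreed on.
SUBSYSTEMS = ("alsa", "netdev", "mm")


def normalize_title(title: str, subsystem: str) -> str:
    """Strip leading '[abrt]'/'kernel:' noise and prepend 'subsystem: '."""
    if subsystem not in SUBSYSTEMS:
        raise ValueError(f"unknown subsystem prefix: {subsystem!r}")
    stripped = title
    # Peel off noise prefixes one at a time, e.g. '[abrt] kernel: ...'.
    while True:
        cleaned = NOISE_PREFIX.sub("", stripped, count=1)
        if cleaned == stripped:
            break
        stripped = cleaned
    # Avoid double-tagging a title that already carries the prefix.
    if stripped.lower().startswith(subsystem + ":"):
        return stripped
    return f"{subsystem}: {stripped}"
```

Calling `normalize_title("[abrt] kernel: no sound after resume", "alsa")` would give a title starting with "alsa:", which lines up in a bug list and matches the upstream patch-subject convention mentioned at 18:57:43. Applying it to real bugs would still go through Bugzilla by hand or via its bulk-change page; nothing here talks to Bugzilla.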