18:03:28 #startmeeting
18:03:28 Meeting started Fri May 3 18:03:28 2013 UTC. The chair is davej. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:03:28 Useful Commands: #action #agreed #halp #info #idea #link #topic.
18:03:28 #meetingname Fedora Kernel meeting
18:03:28 #meetingtopic Fedora Kernel meeting
18:03:28 #addchair jforbes jwb
18:03:28 The meeting name has been set to 'fedora_kernel_meeting'
18:03:39 woo woo.
18:03:48 is there anybody out there ?
18:03:51 no.
18:04:02 yup
18:04:16 * brunowolff is here
18:04:35 alrighty. let's start as usual with the state of the trees
18:04:53 let's do f17/f18 as one, because they're pretty much the same thing still
18:05:01 start with them or rawhide?
18:05:09 the former
18:05:14 ok
18:05:21 * nirik is lurking around.
18:05:30 Not much to speak of in F17/F18 at the moment, we need to move to 3.9 soonish
18:05:34 currently on 3.8.11. 3.9 got released last weekend. we'll start on a rebase next week.
18:05:43 probably coinciding with 3.9.1
18:05:57 I suppose it might be worth noting that by some miracle F17 got enough karma to push a kernel without manual intervention
18:06:14 So thanks to those of you testing and giving karma there
18:07:12 lots of old 17/18 bugs starting to get closed out through inactivity. I'm not thrilled at closing some of them without resolution, but if the reporters have gone away, we don't have a lot of choice
18:08:11 anything else on 17/18 ?
18:08:18 nothing else here
18:08:31 ok. next up.. 19
18:08:49 * nirik does have a f17 vm for testing, but I'm usually too busy to remember
18:08:54 so f19 is on 3.9.0 at the moment. it will continue on the 3.9.y series until F19 GAs
18:08:56 if you need karma feel free to ping me
18:09:06 fairly stable at this point, so not a ton of worries
18:09:29 Beta should include some 3.9.y stable kernel
18:09:42 and debugging will remain off for good now
18:10:06 i think that's about it on f19
18:10:35 any questions?
18:10:39 19 could use a bz sweep to clear out some of the older bugs by the looks of things. 129 open right now
18:10:52 I'll have a look through some of those this afternoon
18:10:56 yeah, i haven't looked at bugzilla yet. on tap for next week
18:11:20 though a lot of those 129 were actually bugs moved to f19 from rawhide. that... wasn't helpful
18:11:59 onto rawhide?
18:12:04 probably should tag more of the rawhide bugs with the whiteboard tag I can never remember that prevents that
18:12:11 FutureFeature
18:12:22 which is a lie, but makes the scripts stop messing with things
18:12:28 that's probably why I can't remember it, it's badly named
18:12:32 yeah
18:12:36 maybe we can request a better one ?
18:12:40 and it's a keyword i think
18:13:05 probably. we can bug jreznik about it iirc
18:13:21 is that who runs the scripts to do the migration ?
18:13:39 he did this past time
18:13:48 in the past it was bugzappers, which no longer exists afaik
18:13:52 brb, doorbell.
18:14:19 any other comments/questions on f19?
18:15:04 ok, rawhide
18:15:41 rawhide is in the middle of the 3.10 merge window. i've been building quite a few kernels per day for this, mostly so that if things break we have granular snapshots of the merge window as it was progressing
18:15:52 that will hopefully make it easier to narrow down where something broke
18:16:21 the latest build is the one right before the DRM tree was merged. i have that done locally, but it breaks at least one of my machines. bisecting at the moment
18:16:25 rawhide nodebug tends to build about 1 kernel per day, I was starting builds with every rawhide build and they weren't finishing before the next was ready
18:16:45 yeah. the dedicated build machines are speedy now
18:18:07 until the DRM merge, things have been looking OK.
i know davej has seen some boot issues in the clocksource code, but i haven't on any of the machines i have
18:20:00 * jwb idles until davej gets back
18:20:58 oh, in case someone was wondering, the secure boot patchsets are not in the upstream 3.10 kernel.
18:21:01 sorry about that, had to re-sign my lease paperwork..
18:21:15 your landlord has great timing
18:21:23 indeed :)
18:21:36 * nirik has had no issues with the latest rawhide kernels here so far.
18:22:00 any other comments/questions on rawhide?
18:22:02 I'm pretty amazed I'm the only one seeing those clock bugs (apart from Yinghai)
18:22:12 every machine I try hits it (or a variant of it)
18:22:30 * nirik has a few items for open floor or whatever.
18:22:50 davej, are they old? weird bios? all one kind of CPU?
18:23:08 couple years old. intel and amd.
18:23:18 the amd is maybe 3 years old
18:23:20 strange
18:23:25 the oldest intel is from 2007
18:23:47 I just ooze gamma rays or something
18:24:03 is your .config different from fedoras?
18:24:14 it's a cut down version
18:24:26 so fedora (debug) without the drivers
18:24:42 shouldn't be vastly different then, unless you and i answered a question differently
18:24:50 actually maybe I tweaked a few other things too. I should double check
18:25:20 DEBUG_PAGEALLOC maybe
18:25:46 anyway.. that's pretty much it for the release overview I guess ?
18:26:06 think so
18:26:31 ok. let's talk a little about the writeup you did at http://fedoraproject.org/wiki/KernelBugTriage
18:27:00 for those who haven't seen this yet, Josh put some work into codifying the sort of triage activities that would be useful to us.
18:27:21 right. and spot has the aliases there to CC now
18:27:51 basically, all of the bugs need to be triaged. it will take quite a bit of effort
18:27:53 yeah, that's probably the biggest change. as you can see there are a whole bunch of new aliases, and we'll add more as necessary
18:28:34 if you need any of them changed, anyone in sysadmin-* can change them...
or I can get some or all of you added to do that too if you prefer.
18:28:49 i think i'm in sysadmin-*
18:28:58 * nirik nods. probably are.
18:29:12 do you get tons of nagios emails you filter to /dev/null ?
18:29:12 yes, i am
18:29:14 :)
18:29:17 i do!
18:30:12 if anyone has questions about anything on that triage doc, follow up on the fedora-kernel-list, and we'll try to expand on anything that's unclear
18:30:40 anything more to say about that for now ?
18:30:46 yes. because i'm sure there are things that are unclear.
18:31:50 i think that's it on triage then
18:31:55 ok. let's talk a little about the automated testing.
18:32:11 Okay
18:32:31 We have hardware, hopefully being racked soon.
18:32:44 that was one of the things I wanted to mention. ;)
18:32:46 The plan is to get things set up the week of May 13th
18:32:54 nirik: I heard there was some confusion about the drac cards or something ?
18:32:56 we have the hw, are working on getting console access so we can install them. ;)
18:33:04 Oh?
18:33:12 yeah, it seems they didn't ship with mgmt...
18:33:18 or it was unclear if they did.
18:33:45 ok, so who is chasing that up, you or spot ?
18:33:46 we will get it sorted out. We have a kvm thing that is supposed to let us have console on things with no mgmt, but its java is busted.
18:34:02 Smooge was working with the datacenter folks.
18:34:09 That's unpleasant. Console access is kind of important in this setup
18:34:12 * nirik will ask him for an update.
18:34:43 nirik: let us know if they won't be ready by week after next so we can plan accordingly
18:35:05 ok. can do.
18:35:18 feel free to ping me anytime if you want me to scare up status too.
18:35:38 you folks will need access to the mgmt if it exists?
18:35:39 jforbes: they aren't essential to the setup though right ?
18:35:48 they're only really needed once we're in production
18:36:17 davej: well, they need some sort of console access to install them
18:36:48 * nirik nods.
18:36:52 oh, duh
18:37:18 once installed tho, will they often need reinstall from mgmt?
18:37:35 nirik: no, but we need a way to get console on crashes
18:37:53 because we expect things to crash. :)
18:37:55 ok. we will have serial too.
18:38:25 nirik: yeah, serial is really what we need. Also, is there a way we can script a remote power toggle to force a reboot?
18:38:28 and power of course.
18:38:57 possibly...
18:39:18 power is 2 apc units you have to ssh into... and serial is a ssh or web login.
18:39:31 nirik: it would be good to have each machine monitor the other and reset it if it dies without intervention. Then it can send us what was on the console
18:39:39 Oh, ssh could work
18:39:55 as long as it's ok with passwordless keys
18:40:05 davej: worst case, expect
18:40:13 gross, but yeah
18:40:15 it's not sadly.
18:40:25 hmm. ok. that's something else for the TODO
18:40:27 like I said, worst case.
18:40:37 and also other machines will be on those power units, so probably we could get you guys access, but I wouldn't want a script to contain the passwords.
18:40:48 there's also always watchdog. ;)
18:40:56 jforbes, wait, expect like 'expect(1)' ?
18:41:35 jwb: expect like tcl
18:41:42 worst case, the 2nd machine can notice the 1st isn't up and send a mail so we manually intervene
18:41:44 anyhow, once we get console we will ping you on how they are really configured and such
18:41:56 jforbes, yeah, that's what i meant. giving me LTP flashbacks
18:42:00 nirik: excellent
18:42:38 jwb: yeah, I last wrote an expect script in 1998, but I think I still have it around, and it was specifically to ssh into something and run commands, so it is reusable if I can find it
18:43:37 So once all of that is in place, we should have automated testing working within the week of May 13th for every kernel build
18:44:09 so that's installing and rebooting to new kernel ?
18:44:14 and running some tests?
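[Editor's note: the mutual-watchdog scheme discussed above, where each test box pings its peer, power-cycles it through the APC unit if it stays down, and resets it so testing resumes, could look roughly like the sketch below. This is hypothetical: the hostnames, outlet number, and the APC command are invented placeholders, and since the APC units don't currently allow passwordless keys, the ssh step might end up as an expect script instead.]

```python
#!/usr/bin/env python
# Hypothetical sketch of the mutual-watchdog idea from the meeting: ping the
# peer test machine, and if it has been unreachable for too long, power-cycle
# it via ssh to the APC power unit. All names below are made-up placeholders.
import subprocess
import time

PEER = "kerneltest02.example.org"   # hypothetical peer hostname
APC = "apc1.example.org"            # hypothetical APC power unit
OUTLET = "8"                        # hypothetical outlet for the peer
DEAD_AFTER = 3600                   # seconds of silence before we act

def peer_is_up(host):
    """Single ICMP probe; True if the peer answered within 5 seconds."""
    return subprocess.call(["ping", "-c", "1", "-W", "5", host]) == 0

def should_power_cycle(last_seen, now, dead_after=DEAD_AFTER):
    """Pure decision logic: act only once the peer has been silent too long."""
    return (now - last_seen) >= dead_after

def power_cycle(outlet):
    # Assumes passwordless ssh to the APC unit and a hypothetical "reboot"
    # command; nirik noted passwordless keys are NOT currently allowed, so
    # in practice this might have to drive an interactive login via expect.
    subprocess.call(["ssh", "apc@" + APC, "reboot", outlet])

def watch():
    last_seen = time.time()
    while True:
        now = time.time()
        if peer_is_up(PEER):
            last_seen = now
        elif should_power_cycle(last_seen, now):
            power_cycle(OUTLET)
            last_seen = now  # reset so we don't cycle the peer repeatedly
        time.sleep(60)
```

Only `should_power_cycle()` carries the decision logic; everything around it is glue that would need adapting to the real serial-console and power setup once nirik reports how the machines are actually configured.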
18:44:27 https://fedoraproject.org/wiki/KernelTestingInitiative has the details, though I will be updating that page with current status shortly
18:44:46 nirik: Yes
18:44:54 cool. :)
18:44:55 jforbes: we can probably just share the work-in-progress plan that we've been doing the last couple days once we've got that filled out a bit more
18:45:17 that should give nirik the background of what we're needing
18:45:23 nirik: also using several guests, but the hosts install and test too
18:45:23 we can also monitor the machines with nagios, but if they reboot a lot we will need to make sure the timeouts are high
18:45:32 davej: correct
18:45:50 nirik: yeah, they will reboot almost daily, sometimes more than once a day. high timeout is fine
18:46:29 yeah, so that might be another option... just have a ping check, if they don't respond for an hour or something high, power cycle or alert you
18:47:58 nirik: sure, it would just be nice to have something catch the console and alert us. Even better if it can do that, then reboot it. So we get the relevant info, but it can go back to testing
18:48:12 * nirik nods.
18:48:58 Overall I am rather excited to get this working, I hope it will catch bugs before users ever see them
18:49:51 yeah, sounds good.
18:51:27 jforbes: how different is the cloud testing stuff going to be to this ?
18:51:46 is that just the same thing but single-host ?
18:52:28 davej: none at all, the regression test harness box will be dynamically starting an EC2 instance to run the regression suite just like any other virtual machine
18:53:10 ok, nice
18:54:05 anything else on this for now ?
18:54:14 The EC2 instance runs the test then shuts down, if there are problems with the boot, we should get a message from the harness, if there are problems with the actual tests, we should get an email from the EC2 instance
18:54:18 nothing else here
18:54:46 if you like... we could possibly get you access to fire off an openstack instance too in our private cloud.
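[Editor's note: the ping-check idea nirik floats above, not alerting unless a box has been unreachable for around an hour, might look something like this as a Nagios service definition. Host and contact-group names are placeholders, and the thresholds are a guess at "something high"; with the default 60-second interval_length, 6 attempts at 10-minute retries give roughly an hour of unreachability before anyone is notified.]

```cfg
# Hypothetical Nagios check for a kernel test box that reboots daily.
define service {
    host_name             kerneltest01           ; placeholder host
    service_description   PING
    check_command         check_ping!3000.0,80%!5000.0,100%
    check_interval        10                     ; minutes between checks
    retry_interval        10                     ; minutes between retries
    max_check_attempts    6                      ; ~1 hour down before HARD state
    notification_interval 120
    contact_groups        kernel-test-admins     ; placeholder contact group
}
```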
18:55:02 that's using KVM, right?
18:55:12 nirik: wouldn't hurt
18:55:26 yep
18:55:29 kvm...
18:55:40 it wouldn't, no, but we have KVM covered with the existing setup. i suggested AWS/EC2 because it's xen guests
18:55:45 nirik: we can look at that after the rest is set up, we are testing kvm locally
18:55:57 sure. just something to drop on the todo list...
18:56:30 something I'd like to get to eventually would be specialised guest images too
18:56:50 davej, meaning?
18:57:02 davej: certainly possible, we have the storage, just not the memory for running too many concurrent guests
18:57:07 for eg, we have one bug open right now with a user of a virt machine running hadoop or something. I know nothing about how that sort of thing is set up, but if we have prepackaged reproducer cases like that, adding them to the mix would be useful I think
18:57:24 ah, yeah. good idea
18:57:52 I do my best to ignore "misc java bonghits", but we seem to get quite a few bugs of that sort of thing
18:58:14 That kind of stuff is a logical extension, easy to work out
18:58:16 might just be that those things are just pushing the vm a lot more when they suck up lots of ram
18:58:44 jforbes: yeah, I've deliberately not added that to the plan for now, just to limit scope
18:58:58 Anything else on the testing?
18:59:38 think we're done
18:59:52 filled the whole hour, good meeting everyone.
18:59:59 #endmeeting