15:19:08 #startmeeting qadevel 15:19:08 Meeting started Tue Dec 16 15:19:08 2014 UTC. The chair is tflink. Information about MeetBot at http://wiki.debian.org/MeetBot. 15:19:08 Useful Commands: #action #agreed #halp #info #idea #link #topic. 15:19:13 #meetingname qadevel 15:19:13 The meeting name has been set to 'qadevel' 15:19:17 #topic roll call 15:19:22 * kparal here 15:19:28 * mkrizek here 15:19:28 * lbrabec is here 15:19:30 * garretraziel here 15:19:35 * jskladan tips his hat 15:20:20 * danofsatx-work is hovering 15:20:26 * roshi is here 15:21:20 wow, quite a few people today 15:22:05 I suppose we can get started, then 15:22:15 #topic Status Updates 15:22:50 mkrizek: can you talk a bit about the work you've been doing on disposable clients? 15:23:04 yeah 15:23:39 so me and tim has been testing latent client with openstack 15:23:56 there has been a few issues with fedora openstack instance 15:24:19 we got it working on local instance at least 15:24:39 "latent"? 15:24:42 there are few issues with image maintanence (or however the word is spelled) 15:24:51 latent = on demand 15:24:54 ok 15:25:24 so it seems like we might try non-cloud clients as well to see if it's better approach 15:25:33 #info local instance of buildbot with openstack latent buildslaves was working 15:26:13 #info some issues with image maintenance and some concerns about complexity 15:26:52 for the sake of education, can someone give a brief one-liner definition/description of a disposable client? 15:26:57 tflink: have you had any luck with eucalyptus? 15:27:26 danofsatx-work: a disposable client is a VM that you can spawn, run a task on, and then destroy it all auto-magically 15:27:28 mkrizek: didn't try using it with buildbot, to be honest 15:27:29 aiui 15:27:52 yeah, the key bit is that they're destroyed after every task, so we don't have to worry about what is done in the task 15:27:53 ahhhh....I do that all the time, somewhat more tediously than spawn/run/destroy.... 
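The spawn/run/destroy lifecycle described above maps naturally onto a context manager, which guarantees the "destroyed after every task" property even when a task blows up. This is a minimal illustrative sketch — `spawn_vm` and `destroy_vm` are hypothetical stand-ins, not real Taskotron or buildbot API:

```python
# Minimal sketch of the disposable-client lifecycle discussed above:
# spawn a VM, run one task in it, and always destroy it afterwards.
# The backend calls are hypothetical stand-ins, not real Taskotron API.
from contextlib import contextmanager

def spawn_vm(image):
    # stand-in for "boot a throwaway VM from `image`"
    return {"image": image, "alive": True}

def destroy_vm(vm):
    # stand-in for "tear the VM down and delete its disk"
    vm["alive"] = False

@contextmanager
def disposable_client(image):
    vm = spawn_vm(image)
    try:
        yield vm
    finally:
        # destruction happens even if the task raises, so nothing a
        # task does inside the VM can leak into the next task
        destroy_vm(vm)

def run_task(recipe, image="fedora-cloud-base"):
    with disposable_client(image) as vm:
        # real code would execute the recipe inside `vm` here
        return "ran %s on %s" % (recipe, vm["image"])
```

The `finally` block is the whole point: because teardown is unconditional, "we don't have to worry about what is done in the task."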
15:29:03 my concern about using another cloud system (eucalyptus, open nebula etc.) is that we'd have to maintain the system 15:29:44 and even though both euca and opennebula are much easier to set up than openstack is, that's still an extra bit that I'd rather not deal with 15:30:05 and package it for Fedora as well? :) 15:30:15 yeah, that as well 15:30:28 scary 15:31:21 the no-cloud clients have their own complexity, as well but we'll get into that :) 15:31:35 everything has the complexity :) 15:32:00 tflink: mostly that we need to write custom code? 15:32:02 mkrizek did figure out how to build custom cloud images, though and that's something we were going to need either way 15:32:20 mkrizek: that's part of it, yeah 15:32:38 what's a "custom cloud image?" 15:32:39 but I suspect we were going to need to write custom code either way 15:33:04 roshi: a cloud image with a package set and configuration that suits us 15:33:10 in our case, cloud image with buildslave running 15:33:35 in the case of using openstack, it had to have buildslave installed and pre-configured to connect to the master on startup 15:33:50 could always handle that with metadata instead of mucking with a custom image, fwiw 15:34:02 in other contexts, updated images with some base packages we'll use 15:34:15 roshi: you can install and configure packages with metadata? 15:34:31 yeah - I haven't dug really deep into it though 15:34:57 but adding scripts into the user-data of AWS or Openstack images before they start is how people sping up new cattle when they need them 15:34:58 hrm, might be worth looking into, then 15:35:09 I don't think they make a custom image, or that it'd be very common 15:35:27 but you can install packages and run scripts to configure your instances 15:35:46 s/sping/spring/ 15:35:52 * tflink figured they did it with disk images - spin up an instance, configure it how they wanted, create the image with the cloud service (snapshots, etc.) 
and create cattle from there 16:36:29 that's another way to do it for sure - but the reading I did, cloud-init can do a bunch of stuff to customize your image 16:36:54 but like I said, I haven't dug very deep into it - so no promises on if it'd do everything we need 16:36:58 #info it'd be worth looking into other ways of cloud image customization that don't involve custom images 16:37:35 still may be worth looking into 16:37:47 s/may/is 16:37:51 yup 16:38:04 I think we'll get into more of the details here when we start talking about disposable clients, though 16:39:00 for qadevel, I've updated the phab install and got a new cert that I'll be deploying soon (the current one expires 2015-01-17) 16:39:49 the last upgrade was a bit messier than I'd like due to some database changes that were designed for a newer version of mysql but hopefully there won't be any more quite like that 16:40:28 at some point, I still want to move qadevel into infra so we can have more CI, host git repos etc. but all of that will take time that we could otherwise be spending on tools 16:40:44 makes 16:40:44 so, not sure when it'll actually happen since our current setup is working for now 16:40:48 makes sense, rather 16:41:11 #info phabricator on qadevel was upgraded yesterday 16:41:31 #info still want to move the machine inside fedora infra, not sure when that'll happen 16:42:12 we have daily backups of the machine but it's not so much ansible controlled anymore after the playbooks we were using drifted away from actual configuration 16:42:29 so if something bad happens, we're not in too much trouble - mostly hassle 16:43:32 and one more thing: I'm planning to turn off username/password logins in phabricator this friday 16:43:46 #info turning off username/password logins this Friday 16:44:03 last I checked, there were still several folks who hadn't linked their accounts to FAS with persona 16:44:11 so please do that if you haven't already 16:44:13 does it involve someone from QA team?
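For reference, the cloud-init approach discussed above — configuring a stock cloud image at boot via user-data instead of building a custom image — looks roughly like the fragment below. The package names, config path, and service name are illustrative assumptions, not the actual Taskotron setup:

```yaml
#cloud-config
# Illustrative user-data for a stock Fedora cloud image. Package list,
# file path, and service name are examples, not the real configuration.
packages:
  - buildbot-slave
  - libtaskotron        # hypothetical package name
write_files:
  - path: /etc/buildslave.cfg   # hypothetical config file
    content: |
      master = buildmaster.example.org:9989
runcmd:
  - [ systemctl, start, buildslave ]
```

`packages`, `write_files`, and `runcmd` are standard cloud-config modules, so this covers roshi's "install packages and run scripts to configure your instances" point without maintaining a custom image.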
15:44:30 kparal: there were 2 folks at this meeting the last I checked 15:44:42 but I haven't re-run the query to check this morning 15:44:47 I've done this today 15:44:50 I think you can name them 15:45:03 the other was lbrabec 15:45:09 nothing better than a bit of public shaming 15:45:22 .fire lbrabec 15:45:22 adamw fires lbrabec 15:45:31 ok, that's settled 15:46:08 * kparal notes he will need to leave shortly after 16 UTC 15:46:12 oh fun, this is sorted differently than it was before the upgrade 15:46:13 i've done it too 15:46:18 lbrabec has also done it 15:46:25 * tflink just ran the query 15:46:30 great 15:46:36 .unfire lbrabec 15:46:42 actually, it looks like almost everyone has at this point 15:46:48 .hire lbrabec 15:46:50 adamw hires lbrabec 15:46:55 yay 15:47:02 didn't know it works 15:47:05 I see 2 remaining folks, will start pestering them later this week 15:47:15 none of whom are at the meeting 15:47:52 but I suppose we should move on - lots of stuff left to cover and not so much time 15:48:21 #topic Planning - Fedmsg Integration 15:49:04 This wasn't on our list of higher-priority tasks from this summer but it's become a higher priority due to the fact that bodhi2 is going to production soon 15:49:26 their current plan is Jan/Feb 15:50:08 threebean and I have been talking about it a bit on qadevel@ but this is still very much at the early stages 15:50:48 great news 15:51:29 most of the work that has been done is in that thread, though 15:53:05 we still need to figure out what the messages will look like (org.fedoraproject.taskotron.result, org.fedoraproject.taskotron.result.new etc.), what data the messages will contain, what we're going to do about depcheck and upgradepath results etc.
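None of the message details were settled at this point, but the general shape of emitting a Taskotron result over fedmsg can be sketched. The topic suffix and message fields below are illustrative guesses, not the agreed format, which was still under discussion on the qadevel list:

```python
# Sketch of what a Taskotron result fedmsg might look like.  The topic
# suffix and message fields are illustrative guesses -- the actual
# format was still being discussed on the qadevel list at this point.

def build_result_message(item, checkname, outcome):
    """Assemble the (topic, body) pair for one task result."""
    # fedmsg convention: org.fedoraproject.<env>.<modname>.<topic>;
    # with the real library the send would be roughly
    #   fedmsg.publish(topic='result.new', modname='taskotron', msg=body)
    topic = "org.fedoraproject.prod.taskotron.result.new"
    body = {
        "item": item,            # e.g. the NVR or update being checked
        "checkname": checkname,  # e.g. depcheck, upgradepath, rpmlint
        "outcome": outcome,      # e.g. PASSED / FAILED
    }
    return topic, body
```

Separating message construction from the actual `publish` call keeps the hard part — agreeing on topics and payload fields — testable without a running fedmsg bus.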
15:53:26 which I suspect will be the most difficult part of this 15:53:51 I think that the actual code to emit fedmsgs will be less complex than most of the other bits we're currently looking at 15:54:45 but the question remains - who's going to work on it and what gets de-prioritized to get it done 15:55:04 any questions or comments on this? 15:55:31 kparal: that macro is heresy and i demand it be removed 15:56:06 I would be interested in working on it, but I don't really want to promise something, I'm leaving for PTO and I still haven't even caught up with qa-devel 15:56:23 adamw: you mean the hiring one, right? 15:56:34 * mkrizek would be interested as well 15:57:11 so I could work on it in January 15:57:14 the other potential issue is that we'll want to work with threebean on the implementation since AFAIK, none of us has much/any experience with fedmsg 15:57:27 kparal: naturally. 15:58:08 ideally, I'd like to get this into staging by mid-january at the latest so there's some time to work out issues before going to production 15:58:46 I'm fine with helping mkrizek and threebean in January, coding or testing it 15:58:52 #info kparal and mkrizek have expressed interest in working on fedmsg integration 15:59:12 mid-january seems like challenge since all those PTOs 15:59:23 hence the "ideally" bit :) 15:59:27 :) 16:00:53 Hello, folks. We've got the Server SIG meeting scheduled in here, but if you're going to be a while, I'll move it. 16:01:04 sgallagh: I thought that was in 30 minutes 16:01:17 1600 UTC 16:01:27 looks like I read the calendar wrong 16:01:39 sometimes the calendar has rendering issues 16:01:49 fedora-meeting-2? 16:01:52 No problem. #fedora-meeting is free 16:01:54 We'll move there 16:02:07 ooh, we're on the main stage? 16:02:11 * adamw tunes nervously 16:02:43 thanks, sorry for the bother 16:03:18 There is a server wg meeting started there :) 16:03:28 oops too many there. 
16:03:49 * roshi pinged the cloud people for the cloud-init discussion 16:03:50 kushal: Server meeting is moving to #fedora-meeting 16:04:18 ah, I thought it is the qa meeting here :( 16:04:21 stupid me 16:04:21 we can work out more of the details about who's working on fedmsg stuff after the meeting 16:04:54 since we're already over time due to starting late :-/ 16:04:57 kushal: The qadevel meeting is right here. 16:04:58 kushal: this is the meeting :) 16:05:13 #topic Planning - Disposable Clients 16:05:37 Oh 16:05:37 continuing the conversation from earlier - we still need to figure out a way to get disposable clients for taskotron 16:06:02 * tflink sent a proposal for non-cloud-system disposable clients to the list 16:06:33 non-cloud-system? 16:06:35 for background, disposable clients are instances taskotron can spin up, run a task on and destroy 16:06:35 there weren't any "that's a horrible idea" comments but there were some concerns 16:06:52 dustymabe: no openstack or comparable systems 16:06:59 tflink: ok 16:07:44 * roshi finds the mail on the list 16:08:00 my concern is about the complexity that we'd be facing by using a cloud system, even if we're not maintaining it 16:08:34 there are some possible hidden network gremlins if we use the infra openstack instance but I think those would be workable 16:09:08 the basic idea would be to run the buildslaves on bare-metal and spawn a local VM from the task itself for the actual task work 16:09:20 the advantages I see here are: 16:09:36 1) it avoids the complexity of working with cloud systems 16:10:17 2) it solves the problem of figuring out what kind of image to use for each task. since the task is spawning the VM directly, metadata can live in the recipe 16:10:42 tflink, Or you can have one or two powerful cloud-in-box setup and use those 16:10:54 3) it keeps more with the theme of being able to run stuff locally and keeps production similar to local 16:11:51 kushal: which ones are you talking about?
I'm aware of packstack 16:12:24 * tflink poked around with packstack and as a result, refuses to be involved with maintaining an openstack install 16:12:25 tflink, just any, it can even be a manual setup done once. 16:12:46 the problem there is maintenance 16:12:54 https://lists.fedoraproject.org/pipermail/qa-devel/2014-December/001038.html 16:12:57 there 16:13:08 and the fact that any downtime would bring production taskotron to a halt 16:13:19 tflink, I agree with your point. 16:13:20 * kparal will need to leave soon 16:13:22 tflink, Just for test, you can try Eucalyptus cloud in a box setup once. 16:13:51 kushal: yeah, that was much easier to deal with than openstack 16:14:51 tflink, for a cloud in a box setup, there is generally not much to maintain in it, but I do agree with your point of even simpler things can be done. 16:14:52 fwiw, I liked open nebula the most but it would still involve packaging and maintenance. we have a high enough system maintenance burden as it is, I think 16:14:56 regardless of the platform we'd be using (or non-platform) I asked the cloud guys if my understanding of cloud-init was right and they seemed to think that it would do everything you needed 16:15:03 instead of rolling custom images 16:15:12 roshi, Yes. 16:15:20 roshi: yeah, should be 16:15:33 https://cloudinit.readthedocs.org/en/latest/topics/examples.html#install-arbitrary-packages 16:15:35 cloud-init can install packages and insert config files? 16:15:36 for background 16:15:37 custom image is not desirable 16:15:46 tflink, Yes, see that link. 16:15:47 tflink sure 16:16:05 how much does it slow down startup? 16:16:24 tflink, whatever time it takes to install the packages. 16:16:41 tflink: I guess that depends on how much net/cpu/io performance you have 16:16:43 For example, there are places where a jenkins system starts a new cloud instance and run the tests inside it. 16:17:02 outside of the obvious complexity, why is rolling custom images a bad idea?
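On the startup-time trade-off raised here: a middle ground between fully custom images and installing everything at boot is a copy-on-write overlay on a periodically refreshed base image — the "gold image as backing store" idea that comes up just below. A sketch of building the `qemu-img` invocation for that (paths are illustrative):

```python
# Build the qemu-img invocation for a throwaway copy-on-write overlay
# on a shared "gold" base image.  Paths are illustrative; real code
# would hand this list to subprocess, boot the VM from the overlay,
# and simply delete the overlay file when the task finishes.

def overlay_cmd(base_image, overlay_path):
    """qemu-img command creating a qcow2 overlay backed by base_image."""
    return [
        "qemu-img", "create",
        "-f", "qcow2",          # overlay format
        "-b", base_image,       # shared, read-only backing file
        overlay_path,
    ]
```

Deleting the overlay discards everything the task wrote while leaving the gold image untouched, which is what makes cleanup cheap and keeps per-task disk usage small.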
16:17:15 tflink, In that case I think my friends are using custom images. 16:17:19 we could also have one, cloud-init configured gold image, and use it as a backing store for the spawned images 16:17:29 * roshi is working on this bit for testCloud 16:17:54 roshi: I don't think one image will be enough, but that doesn't mean the method couldn't work 16:17:59 tflink, from my point of view custom images are not a bad idea in case we are running them a lot. 16:18:09 tflink, Correct. 16:18:26 tflink: I guess I have a different point of view 16:18:28 Like we had different images for python projects and different ones for ruby projects. 16:18:34 custom images are just another thing to manage 16:18:44 or three - but to cut back on install time, for however many configs you have, have a backing store for each 16:18:53 #info cloud-init can probably do most of what we're looking for in a base disposable client image, may be a better option than custom images 16:18:54 dustymabe, True, but they are mostly make once and forget. 16:18:58 cuts down on used disk space, and is easily reproducible 16:19:04 thanks folks for the status updates and the discussion, I'll read the rest of it in meeting minutes. see you 16:19:20 kparal: see you next year 16:19:24 :) 16:19:25 later kparal ! 16:19:25 kushal: for testing I prefer to automate building things from scratch 16:19:31 have a good holiday 16:19:52 thanks, all of you as well 16:19:55 kushal: I think it leads to less confusion on what exactly gets tested 16:19:56 dustymabe, Yes, but to reduce time we preferred to use custom images. 16:20:10 dustymabe, True, depends on case by case. 16:20:26 dustymabe, like few cases should run on latest things. 16:20:44 dustymabe, tflink using https://cloudinit.readthedocs.org/en/latest/topics/examples.html#run-apt-or-yum-upgrade 16:21:05 tig 16:21:12 ^^ oops 16:21:18 What is tig? 16:21:29 kushal: really?
it will change your life 16:21:41 it's dustymabe 's pet marmot 16:21:48 if you like a good tui :) 16:21:51 Time to go? 16:22:05 tig is a text user interface for git 16:22:08 the only potential issue with that is that I want to have sub 30 second startup time if at all possible 16:22:13 or https://github.com/jonas/tig 16:22:16 ah yes 16:22:42 tflink, docker, less than 1 second ;) 16:23:00 tflink: I see. It depends on how many rpms you are installing. If 2 or 3 then it won't take that long. if 100 then it definitely will be longer 16:23:05 docker is for sure something they'd be looking at in the future 16:23:09 aiui 16:23:17 * tflink would rather use full VMs, for now, at least 16:23:42 tflink, Then get a good hardware, with ssd etc 16:24:01 You will get less than 30 seconds boot up but the updates will take time. 16:24:05 if we can get regularly updated cloud images, it may be an option 16:24:26 tflink, It is very easy to generate those locally. 16:24:27 cloud images should be updated every 30 days 16:24:44 but no matter which disposable client method we use, images are going to be a concern 16:25:06 tflink, we have to keep updating them if you want speed. 16:25:25 kushal: yeah, that was mostly what I was thinking about 16:26:02 tflink: one other way to speed things up is to put the disks in ram (if you have boxes with a lot of ram) 16:26:11 disks = disk images 16:26:22 but we have a lot more stuff to work through before we need to worry about how the cloud images are being generated/updated 16:26:43 dustymabe: that's one possibility, we're more CPU and disk limited than RAM 16:26:53 dustymabe, +1 16:27:19 tflink: I did an upgrade from RHEL5 to RHEL6 in 13 minutes one time when I put the disk image completely in ram 16:27:22 :) 16:27:42 and I know what you are thinking RHEL5->RHEL6 ? 16:27:43 we don't have enough ram to put all the disks in memory, though 16:27:57 yes..
we custom hacked that crap at an old company I worked for 16:28:12 pure rpm upgrade all the way 16:28:25 as much as I appreciate the help, I'd like to get back to the core topic before more folks sign off for the day 16:28:38 tflink, please go ahead. 16:28:41 since it'll be weeks before we can have another meeting 16:29:00 tflink: ok. but you should know the disk space issue should be less of a problem by using COW 16:29:32 dustymabe: I'm more worried about disk throughput than capacity 16:29:44 but that's a concern for _after_ stuff is working 16:29:46 :) 16:30:09 anyhow, the core of the idea was to spawn the VMs during task execution 16:30:10 tflink: that's why you put stuff in ram :) 16:30:15 I'm done :) sorry 16:30:27 which does solve some problems, but creates another big one that hasn't been discussed much yet 16:30:47 dustymabe: yeah, hadn't thought about doing that - might be worth looking into 16:31:04 the biggest problem that I see is actually executing stuff on the spawned VM 16:31:27 with latent buildslaves, buildbot handled the execution of commands 16:31:58 if we went forward with the no-cloud solution, we'd have to figure out how to manage task execution in the VM 16:32:14 my first thought was ansible 16:32:37 the most obvious solution would be to use cloud-init to put a password-less ssh key in the images either before or at startup 16:33:17 tflink, yeah 16:33:35 roshi: that's an interesting thought. what do you see as the biggest advantages to doing that over more-raw ssh into the VMs? 16:34:11 * tflink figured that we could use ansible to allow folks to configure VMs to their liking but that would be a later feature 16:34:24 my biggest concern about using ansible like that is how quickly it's changing 16:34:25 storing ansible playbooks might be easier to manage and use than other things/
16:34:46 we wouldn't have much control over when ansible is upgraded and that would open us up to problems 16:34:52 ah 16:35:01 then pure ssh would probably be best 16:35:31 it still dances around the problem of how to execute commands :-/ 16:35:46 what do you mean by "execute commands?" 16:36:03 for me, the most obvious solution would be to scp the recipe into the client vm and run libtaskotron on it, skipping the vm startup 16:36:07 like run 'sudo /bin/true' and return stdout? 16:36:19 roshi: running the task outlined in the recipe 16:36:44 scp and run would work pretty easy 16:37:16 another option would be to write a wrapper to shift execution from individual actions to the VM but the complexity there scares me a bit 16:37:40 tflink, roshi Let me introduce you to fabric, the python module to execute tasks remotely. 16:37:53 It can even handle usernames and passwords for ssh 16:37:54 that would be another option 16:37:56 fabric could work too 16:37:59 * tflink has used fabric before 16:38:05 tflink, Cool :) 16:38:05 * roshi has used fabric before as well 16:38:20 I use them regularly 16:38:41 * roshi has been on the fabric mailing list from 2006 or something like that :) 16:38:45 mkrizek: do you have any thoughts on all this? you've spent the most time futzing with buildbot's latent buildslaves and openstack 16:39:31 mostly whether it sounds like we'd be setting ourselves up for bigger problems by attempting to avoid the problems you were seeing with the latent buildslave approach 16:40:09 I'm not sure if we'd know for certain until we got a proof-of-concept running, though 16:40:33 since I think we're weighing the complexity of latent buildslaves and openstack with a more custom solution 16:40:42 not sure actually, seems like we'd be replacing one complexity with other to me :) 16:40:53 which may end up at the old trade off of control vs. 
already-tested bit 16:40:55 bits 16:41:34 trying something non-cloud as proof-of-concept would be certainly helpful 16:41:38 the other advantage I see with the no-cloud method is that it opens a few doors that are difficult, at best to do with openstack 16:41:48 mostly graphical tasks 16:42:15 the gnome folks want to run their installed tests on composes and that requires X and starting commands from a graphical session 16:42:37 IIRC, there is already code to run those by sending commands to qemu 16:43:17 so my thinking was that it wouldn't be too bad to integrate that code if the VM is local 16:44:02 but we are just talking about theoreticals at this point 16:44:33 well, they are the easiest to talk about :) 16:44:46 does anyone have an objection to doing a proof-of-concept to see if this approach is viable? 16:44:52 roshi, :) 16:44:59 a timeboxed proof-of-concept 16:45:04 tflink, we should do that. 16:45:20 no objections 16:45:54 none here 16:45:56 * tflink suspects that he or roshi would be the best to do that since we're taking the least amount of PTO before jan 1 16:47:11 unless I'm mixing up somebody's plans 16:47:30 I'm out tomorrow and thursday 16:47:48 will likely be online sometime around lunch on friday - working into the evening 16:47:57 if that helps 16:48:09 * roshi still has to work out a test instance of glance... 16:48:10 if nobody else is volunteering, roshi and I can work it out after the meeting 16:48:36 * roshi tries to think of a way to pull kushal into qa-devel work... 16:48:37 :p 16:48:47 roshi, You are welcome 16:49:17 #info we need a proof-of-concept implementation of the no-cloud approach before making a decision on whether to go forward with it 16:49:43 #action tflink and roshi to figure out who's working on the timeboxed proof-of-concept 16:49:50 anything else for this topic today? 16:49:56 * roshi has nothing 16:50:52 jskladan: you up for a bit of discussion around execdb?
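The "scp the recipe into the client VM and run libtaskotron on it" approach discussed above can be sketched as plain OpenSSH invocations. The host, key path, and remote paths below are illustrative assumptions; `runtask` is libtaskotron's task runner, and the relaxed host-key options only make sense because each VM is freshly created and thrown away:

```python
# Sketch of the "scp the recipe in, run libtaskotron over ssh" idea.
# Host, key path, and remote paths are illustrative.  Real code would
# pass these argument lists to subprocess and check the exit status.

SSH_OPTS = [
    "-o", "BatchMode=yes",                 # fail instead of prompting
    "-o", "StrictHostKeyChecking=no",      # VM is brand new every time
    "-o", "UserKnownHostsFile=/dev/null",  # don't pollute known_hosts
    "-i", "/etc/taskotron/ssh_key",        # password-less key cloud-init installed
]

def scp_cmd(recipe, host):
    """Copy the task recipe into the disposable VM."""
    return ["scp"] + SSH_OPTS + [recipe, "root@%s:/tmp/recipe.yml" % host]

def run_cmd(host):
    """Run libtaskotron inside the VM against the copied recipe."""
    return ["ssh"] + SSH_OPTS + ["root@" + host, "runtask /tmp/recipe.yml"]
```

Fabric, mentioned later in the discussion, wraps essentially this same ssh/scp pattern in a Python API, so the two options differ more in ergonomics than in mechanism.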
16:51:43 he may have disappeared for the day 16:51:50 since we're an hour over :-/ 16:52:08 so, moving on to 16:52:14 #topic Planning - blockerbugs 16:52:47 one thing that we need to figure out is if the desired blockerbugs features are worth taking resources away from taskotron 16:53:06 that's a tough one to weigh 16:53:18 but that's difficult to decide before we know how much time the disposable client bit is going to take 16:53:47 there are some genuine bugs that need to be fixed, but it sounds like those shouldn't take too long 16:54:15 that's good at least 16:54:29 the el7 migration also needs to happen before f22 but it looks like that's a lot of futzing with packages and poking maintainers to fix busted stuff 16:56:21 sounds time consuming 16:56:54 the biggest features are: migrating to the bugzilla-fedmsg gateway instead of syncing every X minutes, enhancing the proposal interface and providing a view for closed blockers 16:58:01 at this point, I'm of the mind to ignore blockerbugs features until we have a clearer picture of what is needed for disposable clients 16:58:15 makes sense to me 16:58:19 deal with the bugfixes, package for el7 and re-visit after the first of the year 16:58:37 any objections? 16:58:46 * roshi has none 16:59:16 though, I might be the only one here :) 16:59:31 other folks are still online :) 16:59:46 worksforme 17:00:49 #info will be ignoring blockerbugs features for now, focusing on bugfixes and el7 packaging.
will revisit after the first of the year when we have a better idea of what's needed for taskotron disposable clients 17:01:20 mkrizek: IIRC, you already took the existing blockerbugs issues in phab 17:01:41 I took T380 17:02:13 that's the big one I'm aware of 17:02:45 I found a few more tracebacks in the admin interface though 17:03:00 * mkrizek will be sending patch tomorrow 17:03:11 yeah, i didn't dig very far into the issue - just worked around it with the cli 17:03:19 sounds good 17:04:51 anyhow, we're already an hour over 17:05:07 since we already covered the bodhi2 stuff indirectly, it's time for 17:05:11 #topic Open Floor 17:05:20 any other topics that folks want to bring up? 17:05:27 I don't have anything 17:05:32 nope 17:06:37 ok, thanks for coming everyone. sorry for the late start and the long meeting 17:06:55 enjoy your holidays :) 17:06:57 yeah, no problem 17:07:21 * tflink will send out minutes shortly 17:07:23 #endmeeting