17:01:01 #startmeeting Ansible Testing Working Group
17:01:01 Meeting started Thu May 9 17:01:01 2019 UTC.
17:01:01 This meeting is logged and archived in a public location.
17:01:01 The chair is mattclay. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:01:01 Useful Commands: #action #agreed #halp #info #idea #link #topic.
17:01:01 The meeting name has been set to 'ansible_testing_working_group'
17:01:05 #info Agenda: https://github.com/ansible/community/issues/248
17:01:10 #chair gundalow pabelanger
17:01:10 Current chairs: gundalow mattclay pabelanger
17:01:20 pabelanger: Do you have any updates on Zuul this week?
17:01:41 mattclay: yup, I can give the latest updates
17:02:34 the big news here is we are now running the ansible-test network-integration tests on zuul.ansible.com
17:02:42 for example: https://dashboard.zuul.ansible.com/t/ansible/builds?job_name=ansible-test-network-integration-eos&job_name=ansible-test-network-integration-vyos are the latest results
17:03:14 right now, vyos is green on all branches, minus stable-2.8 (we have a PR up to fix)
17:03:21 and with eos, we are still working through issues
17:03:28 we expect eos devel to be green today
17:03:46 these tests are still running post merge
17:03:54 hi!
17:03:55 so on periodic timers
17:04:38 but, I think I'd like to discuss adding https://github.com/apps/ansible-zuul to the ansible/ansible repo, so we can report back third-party CI results for network-integration specific PRs
17:05:03 then we can start doing some pre-merge testing for network appliances
17:05:13 pabelanger: Have you looked at what kind of updates will be needed to ansibullbot so it will work correctly with the Zuul results?
17:05:53 mattclay: I have not
17:06:16 I guess for that, ansibullbot would somehow read the comments of a PR to confirm tests are green before merging?
17:06:46 pabelanger: Will Zuul be reporting CI status, or only making comments?
17:06:49 my first step was to report the results back, and have humans still validate they were successful before clicking merge
17:06:56 mattclay: comments for now
17:07:07 pabelanger: What's needed to get it to report CI status instead?
17:07:20 mattclay: that is with the checks api, right?
17:07:55 pabelanger: That might be one option -- or you could use the status API (what Shippable, Travis, etc. have been using).
17:08:35 pabelanger: https://developer.github.com/v3/repos/statuses/#create-a-status
17:09:14 That's what Shippable is using, what the bot already understands (at least for Shippable), and it's a much simpler API than the newer Checks API.
17:09:14 1 sec, checking right terminology
17:09:47 The newer Checks API is documented here: https://developer.github.com/v3/checks/
17:09:58 yah
17:10:04 so we can do the status api
17:10:51 so it would be something like ansible/thirdparty-check: success or failed
17:11:10 and if ansibullbot wants to enforce that for branch protection, we have that today
17:11:41 https://github.com/ansible/ansible-zuul-jobs/pull/57
17:11:44 Yeah, just the status (pass or fail) and a link back to the results. Then the bot could be updated to take that into consideration for allowing merges of PRs.
17:11:51 yah
17:11:56 that's how we do it today
17:12:01 we have an item to use the newer api
17:12:08 but that work hasn't started yet
17:12:11 We'd also need to decide when to run the tests using Zuul.
17:12:35 There's no rush on the newer API. We haven't started using it yet and the bot doesn't understand it.
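(Editor's note: reporting a result through the status API discussed above amounts to a single POST per commit SHA. A minimal sketch using Ansible's uri module, assuming a token with repo:status scope; pr_head_sha and github_token are placeholder variables, and the context name is the one floated in the discussion.)

```yaml
- name: Report a third-party CI result via the GitHub status API (sketch)
  uri:
    url: "https://api.github.com/repos/ansible/ansible/statuses/{{ pr_head_sha }}"  # pr_head_sha is a placeholder
    method: POST
    status_code: 201
    headers:
      Authorization: "token {{ github_token }}"   # github_token is a placeholder
      Accept: application/vnd.github.v3+json
    body_format: json
    body:
      state: success                              # one of: pending, success, failure, error
      context: ansible/thirdparty-check           # context name suggested in the discussion above
      target_url: "https://dashboard.zuul.ansible.com/t/ansible/builds"
      description: "ansible-test network-integration results"
```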
17:12:46 right, we can configure that on the zuul side also, for the pipeline. My first guess is when a new PR is created, and updated
17:13:00 we can also add specific commands, like recheck, to trigger ad-hoc runs
17:13:36 we'd also filter on when we'd run things
17:13:36 would "build_ci" trigger a new zuul run ?
17:13:53 so changes to windows modules shouldn't start a network integration run
17:13:55 pabelanger: I was thinking more about deciding which files being changed would trigger the tests. We should probably look at using ansible-test's change classification to determine that, rather than trying to maintain that information in two places.
17:13:56 Pilou: it could
17:14:11 mattclay: sure, we can do that too
17:14:48 mattclay: there is a cost if ansible-test manages that, as we'd still launch a node. but I think we could iterate on some of that after we have the github app
17:15:24 When we add commands for re-running jobs (or anything else for a CI provider) we should make sure they're named after the provider so it's clear which one we're acting on (build_zuul vs build_ci for example).
17:15:56 yes, I've seen that idea before too
17:16:22 the pipeline on zuul is just regex, so we can match to any arbitrary string
17:16:41 pabelanger: How soon do you think you'll be ready to start looking at running the tests on PRs and updating ansibullbot?
17:17:16 mattclay: from a testing POV, we are ready now. we'd be able to report results back on vyos / eos. other images are to follow in the coming days.
17:17:39 I can also work on ansibullbot integration for the check status, once we figure out the name and I learn how the code works
17:18:06 so, I think we can enable the github app any time, then slowly roll out some initial testing on a PR or 2
17:18:14 just to make sure we don't overload zuul.ansible.com
17:18:20 (I don't think we will)
17:19:14 (shippable status is checked here: ansibullbot/triagers/plugins/needs_revision.py)
17:19:23 k
17:19:33 I'll start looking into that this evening
17:19:35 pabelanger: OK, we can discuss next steps outside of this meeting. Maybe tomorrow or Monday, since I have 2.8 release stuff to handle today.
17:19:47 mattclay: great, wfm
17:20:00 jillr Goneri: Do you have any updates for us on vmware testing?
17:20:31 Hi! So we have a large PR on ansible-core-ci waiting to be reviewed/merged
17:21:17 it's our first milestone and gives us the ability to use the ansible CI workflow with vmware ESXi
17:21:25 I should be able to take a closer look at that soon, now that I'm back from PyCon.
17:21:27 jillr did a demo here: http://demo.jillr.io/asciinema/worldstream/worldstream_demo.html
17:21:59 currently, it only starts one ESXi, which is enough for a proof of concept but won't be enough to run the full test-suite.
17:22:40 on my side, I've been working on vcenter deployment, we can bootstrap a vcenter automagically with a new Ansible role: https://github.com/goneri/ansible-role-vcenter-instance
17:23:26 and this role is integrated in the Ansible CI. So basically, we can prepare a vcenter template automatically (21 minutes in our env)
17:23:52 I'm now working on top of jillr's PR to add the ability to start a vcenter in addition to the esxi
17:24:01 Nice. Automatic template generation will save us a lot of time and help keep them updated.
17:24:02 Goneri: is this for vmware integration testing or just to run ansible-test on another provider?
17:24:17 and in parallel, I'm adding the ability to start up 2 extra ESXi, we need that for some tests.
17:24:49 for the record, I also maintain this thing: https://github.com/goneri/vmware-on-libvirt
17:25:15 it's some kind of copy of our CI on libvirt, it will be handy for those who want to reproduce our test env on libvirt
17:25:43 pabelanger, this is to run the vmware integration test-suite
17:26:14 Goneri: neat, just looking at the ESXi ISO image now
17:26:24 the test-suite requires a vcenter and up to 3 ESXi (HA)
17:26:44 ack
17:26:52 pabelanger, what do you want to do?
17:27:08 Goneri: nothing, was just trying to understand the use case
17:27:31 wasn't sure of the requirements for vmware integration
17:27:54 Goneri: What are the advantages and disadvantages of using libvirt instead of esxi for virtualizing the test infrastructure?
17:27:58 is that running in our lab some place?
17:28:31 mattclay: that was going to be my next question :)
17:28:33 mattclay, this way, I can run the test-suite on my laptop. It was really handy at the beginning.
17:28:44 and I still use that a lot actually.
17:29:30 Goneri: how long does ansible-test integration run on libvirt?
17:30:10 good question, it's rather long, I would say like 1h.
17:30:13 cause, my gut is saying, if it works well on libvirt, imagine if we could also boot them via nodepool
17:30:33 that's the approach we are taking for network-integration tests basically
17:31:15 I did this: https://github.com/goneri/esxi-cloud-init/ it's a limited cloud-init script that supports OpenStack meta-data
17:31:16 mattclay: so, to answer your pros / cons question, the pro for using libvirt would be more parallel jobs running.
17:31:31 since we are not limited to a single esxi cluster
17:31:44 and, if you want to investigate that, you can build images with: https://github.com/virt-lightning/esxi-cloud-images
17:32:01 cool, I'll poke into it when I find some time
17:32:15 we also had to do some crazy things to get network vendor images to boot
17:32:47 pabelanger: We can run more parallel jobs if we have adequate hardware, so I don't really consider that to be libvirt specific.
17:33:17 right, libvirt could give you some more isolation too
17:33:24 but agree
17:33:37 How so? We're running esxi under esxi already...
17:34:16 where is the esxi cluster today? I'm not really up to speed on it
17:35:15 my thought is, if libvirt is used, then a VM could be booted any place: azure, google, aws, openstack, which gives greater capacity than the current esxi, I imagine
17:35:33 pabelanger: We're currently running one at worldstream for our proof-of-concept, but will add/upgrade servers as needed.
17:35:35 but I totally understand there is overhead to that
17:35:44 it's a single esxi server in a DC
17:35:50 okay
17:36:54 If the performance and features supported are comparable to running under esxi then it's certainly something to consider. I did the initial poc with nested esxi since it was known to work, even though it's not officially supported.
17:38:30 yah, I'm kinda interested in this topic, but have too much on my plate to get too involved :)
17:38:56 there could be a good integration story with nodepool, but it also sounds like the current approach totally works
17:39:49 Goneri: Do you have anything else for us?
17:39:54 one of the problems we face is performance, it takes several minutes to get the lab ready before we can do our first test.
17:40:17 and nodepool, or a similar mechanism, can be a good solution.
17:40:27 just my 2c
17:40:45 mattclay, no, I'm done :-)
17:41:05 Goneri: Thanks for the update.
17:41:27 Is there anything else anyone would like to discuss?
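(Editor's note: as a rough illustration of the vcenter bootstrapping Goneri described above, a playbook applying the ansible-role-vcenter-instance role might look something like the sketch below. This is hypothetical: the variable names are placeholders, not the role's documented interface, so check the repository README for the real parameters.)

```yaml
# Hypothetical invocation; variable names are placeholders, not the role's documented interface.
- name: Bootstrap a vcenter instance on an existing ESXi host (sketch)
  hosts: localhost
  roles:
    - role: ansible-role-vcenter-instance
      vars:
        esxi_hostname: esxi1.example.com             # placeholder
        esxi_username: root                          # placeholder
        esxi_password: "{{ vault_esxi_password }}"   # placeholder
```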
17:41:32 pabelanger: speak to jlk about GitHub.py and reporting results to the Checks API. he was working on something
17:42:06 gundalow: yup, I think that work is pretty much done
17:42:12 we just need to do a design spec for zuul
17:42:18 oh, ace
17:42:27 we (ansible) and BMW are interested in the api
17:42:38 so I imagine we can work on something soonish for it
17:42:56 Checks API will be brilliant for when we have CI failures due to specific lines in the PR
17:44:33 ++
17:44:43 mattclay: nothing from me
17:46:24 pabelanger: for rebuild commands please see https://github.com/ansible/ansibullbot/issues/1161
17:46:33 (hum, may have mentioned that already, sorry)
17:47:15 mattclay: so would we want
17:47:15 rerun_shippable - re-run only failed Shippable jobs (ignored if no failures)
17:47:15 rerun_shippable_all - re-run all Shippable jobs
17:47:15 rerun_zuul_openstack
17:47:15 rerun_zuul_network
17:47:24 cool
17:47:36 then we can add them to the pipeline stanza today for zuul.a.c
17:48:12 Does Zuul support re-running only failed jobs?
17:48:40 hum, not sure if it would be called `zuul_network` or `zuul_ansible`
17:49:30 mattclay: no, the complete buildset would be run again today
17:49:43 since we are triggering the specific instance of the GitHub App
17:49:50 there has been some discussion to target specific jobs, but that still needs some discussion upstream
17:49:52 ok, so `_all` on those
17:51:27 right, in the case of a rerun, it's usually 1 of 2 things: poor infra or flaky testing. History has taught me that when projects run multiple rechecks / reruns of tests to merge stuff, it just makes testing much harder to pass
17:51:42 so, I tend to use them less myself
17:51:51 but I also tend to dive into fixing random tests too
17:56:19 Thanks for coming everyone.
17:56:23 #endmeeting
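(Editor's note: for context on the trigger and rerun commands discussed above, a Zuul GitHub pipeline is driven by event filters and comment regexes, roughly as sketched below. This is illustrative only: the connection name, regex, and reported status context are assumptions, and the exact attributes should be checked against the Zuul GitHub driver documentation.)

```yaml
# Illustrative Zuul pipeline sketch; connection name, regexes, and context are assumptions.
- pipeline:
    name: third-party-check
    manager: independent
    trigger:
      github:
        # run when a PR is opened or its commits change
        - event: pull_request
          action:
            - opened
            - changed
            - reopened
        # re-run the whole buildset on a provider-specific comment command
        - event: pull_request
          action: comment
          comment: (?i)^\s*rerun_zuul_network_all\s*$
    start:
      github:
        status: pending
    success:
      github:
        status: success   # reported as a commit status, e.g. a context like ansible/thirdparty-check
    failure:
      github:
        status: failure
```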