17:01:01 <mattclay> #startmeeting Ansible Testing Working Group
17:01:01 <zodbot> Meeting started Thu May 9 17:01:01 2019 UTC.
17:01:01 <zodbot> This meeting is logged and archived in a public location.
17:01:01 <zodbot> The chair is mattclay. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:01:01 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
17:01:01 <zodbot> The meeting name has been set to 'ansible_testing_working_group'
17:01:05 <mattclay> #info Agenda: https://github.com/ansible/community/issues/248
17:01:10 <mattclay> #chair gundalow pabelanger
17:01:10 <zodbot> Current chairs: gundalow mattclay pabelanger
17:01:20 <mattclay> pabelanger: Do you have any updates on Zuul this week?
17:01:41 <pabelanger> mattclay: yup, I can give the latest updates
17:02:34 <pabelanger> the big news here is we are now running the ansible-test network-integration tests on zuul.ansible.com
17:02:42 <pabelanger> for example: https://dashboard.zuul.ansible.com/t/ansible/builds?job_name=ansible-test-network-integration-eos&job_name=ansible-test-network-integration-vyos are the latest results
17:03:14 <pabelanger> right now, vyos is green on all branches, minus stable-2.8 (we have a PR up to fix that)
17:03:21 <pabelanger> and with eos, we are still working through issues
17:03:28 <pabelanger> we expect eos devel to be green today
17:03:46 <pabelanger> these tests are still running post merge
17:03:54 <Goneri> hi!
17:03:55 <pabelanger> so on periodic timers
17:04:38 <pabelanger> but I think I'd like to discuss adding https://github.com/apps/ansible-zuul to the ansible/ansible repo, so we can report back third-party CI results for network-integration specific PRs
17:05:03 <pabelanger> then we can start doing some pre-merge testing for network appliances
17:05:13 <mattclay> pabelanger: Have you looked at what kind of updates will be needed to ansibullbot so it will work correctly with the Zuul results?
17:05:53 <pabelanger> mattclay: I have not
17:06:16 <pabelanger> I guess for that, ansibullbot would somehow read the comments of a PR to confirm tests are green before merging?
17:06:46 <mattclay> pabelanger: Will Zuul be reporting CI status, or only making comments?
17:06:49 <pabelanger> my first step was to report the results back, and have humans still validate they were successful before clicking merge
17:06:56 <pabelanger> mattclay: comments for now
17:07:07 <mattclay> pabelanger: What's needed to get it to report CI status instead?
17:07:20 <pabelanger> mattclay: that is the checks api, right?
17:07:55 <mattclay> pabelanger: That might be one option -- or you could use the status API (what Shippable, Travis, etc. have been using).
17:08:35 <mattclay> pabelanger: https://developer.github.com/v3/repos/statuses/#create-a-status
17:09:14 <mattclay> That's what Shippable is using, it's what the bot already understands (at least for Shippable), and it's a much simpler API than the newer Checks API.
17:09:14 <pabelanger> 1 sec, checking right terminology
17:09:47 <mattclay> The newer Checks API is documented here: https://developer.github.com/v3/checks/
17:09:58 <pabelanger> yah
17:10:04 <pabelanger> so we can do the status api
17:10:51 <pabelanger> so it would be something like ansible/thirdparty-check: success or failed
17:11:10 <pabelanger> and if ansibullbot wants to enforce that for branch protection, we have that today
17:11:41 <pabelanger> https://github.com/ansible/ansible-zuul-jobs/pull/57
17:11:44 <mattclay> Yeah, just the status (pass or fail) and a link back to the results. Then the bot could be updated to take that into consideration for allowing merges of PRs.
17:11:51 <pabelanger> yah
17:11:56 <pabelanger> that's how we do it today
17:12:01 <pabelanger> we have an item to use the newer api
17:12:08 <pabelanger> but that work hasn't started yet
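A minimal sketch of what that status report could look like, assuming the `ansible/thirdparty-check` context from the discussion and a hypothetical token and target URL; this is illustrative, not Zuul's actual reporting code:

```python
# Illustrative sketch: report a third-party CI result with the GitHub
# Status API. The owner/repo/sha/token values are placeholders.
import requests

GITHUB_API = "https://api.github.com"

def report_status(owner, repo, sha, state, target_url, token):
    """Create a commit status; state is pending, success, failure, or error."""
    response = requests.post(
        f"{GITHUB_API}/repos/{owner}/{repo}/statuses/{sha}",
        headers={"Authorization": f"token {token}"},
        json={
            "state": state,
            "target_url": target_url,  # link back to the Zuul build results
            "description": "ansible-test network-integration",
            "context": "ansible/thirdparty-check",
        },
    )
    response.raise_for_status()
    return response.json()
```

GitHub keeps only the latest status per context on a commit, which is why a single pass/fail plus a results link is enough for branch protection to act on.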
17:12:11 <mattclay> We'd also need to decide when to run the tests using Zuul.
17:12:35 <mattclay> There's no rush on the newer API. We haven't started using it yet and the bot doesn't understand it.
17:12:46 <pabelanger> right, we can configure that on the zuul side also, for the pipeline. My first guess is when a new PR is created, and updated
17:13:00 <pabelanger> we can also add specific commands, like recheck, to trigger ad hoc runs
17:13:36 <pabelanger> we'd also filter on when we'd run things
17:13:36 <Pilou> would "build_ci" trigger a new zuul run?
17:13:53 <pabelanger> so changes to windows modules shouldn't start a network integration run
17:13:55 <mattclay> pabelanger: I was thinking more about deciding which files being changed would trigger the tests. We should probably look at using ansible-test's change classification to determine that, rather than trying to maintain that information in two places.
17:13:56 <pabelanger> Pilou: it could
17:14:11 <pabelanger> mattclay: sure, we can do that too
17:14:48 <pabelanger> mattclay: there is a cost if ansible-test manages that, as we'd still launch a node. but I think we could iterate on some of that after we have the github app
17:15:24 <mattclay> When we add commands for re-running jobs (or anything else for a CI provider) we should make sure they're named after the provider so it's clear which one we're acting on (build_zuul vs build_ci, for example).
17:15:56 <pabelanger> yes, I've seen that idea before too
17:16:22 <pabelanger> the pipeline trigger on zuul is just a regex, so we can match any arbitrary string
17:16:41 <mattclay> pabelanger: How soon do you think you'll be ready to start looking at running the tests on PRs and updating ansibullbot?
17:17:16 <pabelanger> mattclay: from a testing POV, we are ready now. we'd be able to report results back on vyos / eos. other images are to follow in the coming days.
17:17:39 <pabelanger> I can also work on ansibullbot integration for the check status, once we figure out the name and I learn how the code works
17:18:06 <pabelanger> so, I think we can enable the github app any time, then slowly roll out some initial testing on a PR or 2
17:18:14 <pabelanger> just to make sure we don't overload zuul.ansible.com
17:18:20 <pabelanger> (I don't think we will)
17:19:14 <Pilou> (shippable status is checked here: ansibullbot/triagers/plugins/needs_revision.py)
17:19:23 <pabelanger> k
17:19:33 <pabelanger> I'll start looking into that this evening
17:19:35 <mattclay> pabelanger: OK, we can discuss next steps outside of this meeting. Maybe tomorrow or Monday, since I have 2.8 release stuff to handle today.
17:19:47 <pabelanger> mattclay: great, wfm
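The check Pilou points at boils down to reading the combined status for the PR's head commit. A rough sketch of that kind of logic, with placeholder names; this is not ansibullbot's actual code:

```python
# Illustrative sketch: decide whether every required CI context on a
# PR's head commit reports success. Placeholder owner/repo/sha/token.
import requests

GITHUB_API = "https://api.github.com"

def ci_is_green(owner, repo, sha, required_contexts, token):
    response = requests.get(
        f"{GITHUB_API}/repos/{owner}/{repo}/commits/{sha}/status",
        headers={"Authorization": f"token {token}"},
    )
    response.raise_for_status()
    # "statuses" holds the latest status for each context on this commit.
    states = {s["context"]: s["state"] for s in response.json()["statuses"]}
    return all(states.get(ctx) == "success" for ctx in required_contexts)
```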
17:20:00 <mattclay> jillr Goneri: Do you have any updates for us on vmware testing?
17:20:31 <Goneri> Hi! So we have a large PR on ansible-core-ci waiting to be reviewed/merged
17:21:17 <Goneri> it's our first milestone and gives us the ability to use the ansible CI workflow with vmware ESXi
17:21:25 <mattclay> I should be able to take a closer look at that soon, now that I'm back from PyCon.
17:21:27 <Goneri> jillr did a demo here: http://demo.jillr.io/asciinema/worldstream/worldstream_demo.html
17:21:59 <Goneri> currently, it only starts one ESXi, which is enough for a proof of concept but won't be enough to run the full test-suite.
17:22:40 <Goneri> on my side, I've been working on vcenter deployment, we can bootstrap a vcenter automagically with a new Ansible role: https://github.com/goneri/ansible-role-vcenter-instance
17:23:26 <Goneri> and this role is integrated in the Ansible CI. So basically, we can prepare a vcenter template automatically (21 minutes in our env)
17:23:52 <Goneri> I'm now working on top of jillr's PR to add the ability to start a vcenter in addition to the esxi
17:24:01 <mattclay> Nice. Automatic template generation will save us a lot of time and help keep them updated.
17:24:02 <pabelanger> Goneri: is this for vmware integration testing or just to run ansible-test on another provider?
17:24:17 <Goneri> and in parallel, I'm adding the ability to start up 2 extra ESXi, we need that for some tests.
17:24:49 <Goneri> for the record, I also maintain this thing: https://github.com/goneri/vmware-on-libvirt
17:25:15 <Goneri> it's some kind of copy of our CI on libvirt, it will be handy for those who want to reproduce our test env on libvirt
17:25:43 <Goneri> pabelanger, this is to run the vmware integration test-suite
17:26:14 <pabelanger> Goneri: neat, just looking at the ESXi ISO image now
17:26:24 <Goneri> the test-suite requires a vcenter and up to 3 ESXi (HA)
17:26:44 <pabelanger> ack
17:26:52 <Goneri> pabelanger, what do you want to do?
17:27:08 <pabelanger> Goneri: nothing, was just trying to understand the use case
17:27:31 <pabelanger> wasn't sure of the requirements for vmware integration
17:27:54 <mattclay> Goneri: What are the advantages and disadvantages of using libvirt instead of esxi for virtualizing the test infrastructure?
17:27:58 <pabelanger> is that running in our lab some place?
17:28:31 <pabelanger> mattclay: that was going to be my next question :)
17:28:33 <Goneri> mattclay, this way, I can run the test-suite on my laptop. It was really handy at the beginning.
17:28:44 <Goneri> and I still use that a lot actually.
17:29:30 <pabelanger> Goneri: how long does ansible-test integration take to run on libvirt?
17:30:10 <Goneri> good question, it's rather long, I would say like 1h.
17:30:13 <pabelanger> 'cause my gut is saying, if it works well on libvirt, imagine if we could also boot them via nodepool
17:30:33 <pabelanger> that's the approach we are taking for the network-integration tests, basically
17:31:15 <Goneri> I did this: https://github.com/goneri/esxi-cloud-init/ it's a limited cloud-init script that supports OpenStack meta-data
17:31:16 <pabelanger> mattclay: so, to answer your pros / cons question, the pro for using libvirt would be more parallel jobs running.
17:31:31 <pabelanger> since we are not limited to a single esxi cluster
17:31:44 <Goneri> and, if you want to investigate that, you can build images with: https://github.com/virt-lightning/esxi-cloud-images
17:32:01 <pabelanger> cool, I'll poke into it when I find some time
17:32:15 <pabelanger> we also had to do some crazy things to get network vendor images to boot
17:32:47 <mattclay> pabelanger: We can run more parallel jobs if we have adequate hardware, so I don't really consider that to be libvirt specific.
17:33:17 <pabelanger> right, libvirt could give you some more isolation too
17:33:24 <pabelanger> but agree
17:33:37 <mattclay> How so? We're running esxi under esxi already...
17:34:16 <pabelanger> where is the esxi cluster today? I'm not really up to speed on it
17:35:15 <pabelanger> my thought is, if libvirt is used, then a VM could be booted any place: azure, google, aws, openstack, which gives greater capacity than the current esxi, I imagine
17:35:33 <mattclay> pabelanger: We're currently running one at worldstream for our proof-of-concept, but we will add/upgrade servers as needed.
17:35:35 <pabelanger> but totally understand there is overhead to that
17:35:44 <Goneri> it's a single esxi server in a DC
17:35:50 <pabelanger> okay
17:36:54 <mattclay> If the performance and features supported are comparable to running under esxi then it's certainly something to consider. I did the initial poc with nested esxi since it was known to work, even though it's not officially supported.
17:38:30 <pabelanger> yah, I'm kinda interested in this topic, but have too much on my plate to get too involved :)
17:38:56 <pabelanger> there could be a good integration story with nodepool, but it also sounds like the current approach totally works
17:39:49 <mattclay> Goneri: Do you have anything else for us?
17:39:54 <Goneri> one of the problems that we face is performance, it takes several minutes to get the lab ready before we can do our first test.
17:40:17 <Goneri> and nodepool, or a similar mechanism, could be a good solution.
17:40:27 <Goneri> just my 2c
17:40:45 <Goneri> mattclay, no, I'm done :-)
17:41:05 <mattclay> Goneri: Thanks for the update.
17:41:27 <mattclay> Is there anything else anyone would like to discuss?
17:41:32 <gundalow> pabelanger: speak to jlk about GitHub.py and reporting results to the Checks API. He was working on something
17:42:06 <pabelanger> gundalow: yup, I think that work is pretty much done
17:42:12 <pabelanger> we just need to do a design spec for zuul
17:42:18 <gundalow> oh, ace
17:42:27 <pabelanger> we (ansible) and BMW are interested in the api
17:42:38 <pabelanger> so I imagine we can work on something soonish for it
17:42:56 <gundalow> Checks API will be brilliant for when we have CI failures due to specific lines in the PR
17:44:33 <pabelanger> ++
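The per-line reporting gundalow describes comes from check run annotations. A hedged sketch of the payload shape, with placeholder values and a pre-obtained GitHub App token; at the time of this meeting the Checks API also required a preview media type:

```python
# Illustrative sketch: attach a line-level failure annotation to a commit
# via the Checks API. head_sha, the file path, and app_token are placeholders.
import requests

GITHUB_API = "https://api.github.com"

def report_check_run(owner, repo, head_sha, app_token):
    response = requests.post(
        f"{GITHUB_API}/repos/{owner}/{repo}/check-runs",
        headers={
            "Authorization": f"token {app_token}",
            "Accept": "application/vnd.github.antiope-preview+json",
        },
        json={
            "name": "ansible-test network-integration",
            "head_sha": head_sha,
            "status": "completed",
            "conclusion": "failure",
            "output": {
                "title": "network-integration failed",
                "summary": "1 failing job",
                "annotations": [{
                    # Hypothetical file/line; annotations point at the
                    # exact lines in the PR responsible for the failure.
                    "path": "test/integration/targets/eos_config/aliases",
                    "start_line": 1,
                    "end_line": 1,
                    "annotation_level": "failure",
                    "message": "eos_config integration tests failed",
                }],
            },
        },
    )
    response.raise_for_status()
    return response.json()
```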
17:44:43 <pabelanger> mattclay: nothing from me
17:46:24 <gundalow> pabelanger: for rebuild commands please see https://github.com/ansible/ansibullbot/issues/1161
17:46:33 <gundalow> (hum, may have mentioned that already, sorry)
17:47:15 <gundalow> mattclay: so would we want:
17:47:15 <gundalow> rerun_shippable - re-run only failed Shippable jobs (ignored if no failures)
17:47:15 <gundalow> rerun_shippable_all - re-run all Shippable jobs
17:47:15 <gundalow> rerun_zuul_openstack
17:47:15 <gundalow> rerun_zuul_network
17:47:24 <pabelanger> cool
17:47:36 <pabelanger> then we can add them to the pipeline stanza today for zuul.a.c
17:48:12 <mattclay> Does Zuul support re-running only failed jobs?
17:48:40 <gundalow> hum, not sure if it would be called `zuul_network` or `zuul_ansible`
17:49:30 <pabelanger> mattclay: no, the complete buildset would be run again today
17:49:43 <gundalow> since we are triggering the specific instance of the GitHub App
17:49:50 <pabelanger> there has been some discussion about targeting specific jobs, but that still needs some discussion upstream
17:49:52 <gundalow> ok, so `_all` on those
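Since the Zuul pipeline trigger matches comments with a regex, the proposed vocabulary maps onto a single pattern. An illustrative Python sketch of how such comment commands could be recognized; the real matching would live in Zuul's pipeline configuration, and the command names above are not final:

```python
# Illustrative sketch: recognize the proposed rerun commands in a PR
# comment. Command names follow the proposal above and are not final.
import re

RERUN_COMMAND = re.compile(
    r"^\s*rerun_(?P<provider>shippable|zuul_openstack|zuul_network)"
    r"(?P<all>_all)?\s*$",
    re.MULTILINE,
)

def parse_rerun(comment_body):
    """Return (provider, rerun_all) for a rerun command, else None."""
    match = RERUN_COMMAND.search(comment_body)
    if match is None:
        return None
    return match.group("provider"), bool(match.group("all"))

assert parse_rerun("rerun_shippable_all") == ("shippable", True)
assert parse_rerun("rerun_zuul_network") == ("zuul_network", False)
```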
17:51:27 <pabelanger> right, in the case of a rerun, it's usually one of 2 things: poor infra or flaky testing. History has taught me what happens when projects run multiple rechecks / reruns of tests just to merge stuff: it makes the tests much harder to keep passing.
17:51:42 <pabelanger> so, I tend to use them less myself
17:51:51 <pabelanger> but I also tend to dive into fixing random tests too
17:56:19 <mattclay> Thanks for coming everyone.
17:56:23 <mattclay> #endmeeting