17:01:01 <mattclay> #startmeeting Ansible Testing Working Group
17:01:01 <zodbot> Meeting started Thu May  9 17:01:01 2019 UTC.
17:01:01 <zodbot> This meeting is logged and archived in a public location.
17:01:01 <zodbot> The chair is mattclay. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:01:01 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
17:01:01 <zodbot> The meeting name has been set to 'ansible_testing_working_group'
17:01:05 <mattclay> #info Agenda: https://github.com/ansible/community/issues/248
17:01:10 <mattclay> #chair gundalow pabelanger
17:01:10 <zodbot> Current chairs: gundalow mattclay pabelanger
17:01:20 <mattclay> pabelanger: Do you have any updates on Zuul this week?
17:01:41 <pabelanger> mattclay: yup, I can give latest updates
17:02:34 <pabelanger> the big news here is we are now running the ansible-test network-integration tests on zuul.ansible.com
17:02:42 <pabelanger> for example: https://dashboard.zuul.ansible.com/t/ansible/builds?job_name=ansible-test-network-integration-eos&job_name=ansible-test-network-integration-vyos are the latest results
17:03:14 <pabelanger> right now, vyos is green on all branches, minus stable-2.8 (we have a PR up to fix)
17:03:21 <pabelanger> and with eos, we are still working through issues
17:03:28 <pabelanger> we expect eos devel to be green today
17:03:46 <pabelanger> these tests are still running post merge
17:03:54 <Goneri> hi!
17:03:55 <pabelanger> so on periodic timers
17:04:38 <pabelanger> but, I think I'd like to discuss adding https://github.com/apps/ansible-zuul to ansible/ansible repo, so we can report back third-party CI results for network-integration specific PRs
17:05:03 <pabelanger> then we can start doing some pre-merge testing for network appliances
17:05:13 <mattclay> pabelanger: Have you looked at what kind of updates will be needed to ansibullbot so it will work correctly with the Zuul results?
17:05:53 <pabelanger> mattclay: I have not
17:06:16 <pabelanger> I guess for that, ansibullbot would somehow read the comments of a PR to confirm tests are green before merging?
17:06:46 <mattclay> pabelanger: Will Zuul be reporting CI status, or only making comments?
17:06:49 <pabelanger> my first step was to report the results back, and have humans still validate they were successful before clicking merge
17:06:56 <pabelanger> mattclay: comments for now
17:07:07 <mattclay> pabelanger: What's needed to get it to report CI status instead?
17:07:20 <pabelanger> mattclay: that is with the checks api right?
17:07:55 <mattclay> pabelanger: That might be one option -- or you could use the status API (what Shippable, Travis, etc. have been using).
17:08:35 <mattclay> pabelanger: https://developer.github.com/v3/repos/statuses/#create-a-status
17:09:14 <mattclay> That's what Shippable is using, it's what the bot already understands (at least for Shippable), and it's a much simpler API than the newer Checks API.
17:09:14 <pabelanger> 1 sec, checking right terminology
17:09:47 <mattclay> The newer Checks API is documented here: https://developer.github.com/v3/checks/
17:09:58 <pabelanger> yah
17:10:04 <pabelanger> so we can do the status api
17:10:51 <pabelanger> so it would be something like ansible/thirdparty-check: success or failure
17:11:10 <pabelanger> and if ansibullbot wants to enforce that for branch protection, we have that today
17:11:41 <pabelanger> https://github.com/ansible/ansible-zuul-jobs/pull/57
17:11:44 <mattclay> Yeah, just the status (pass or fail) and a link back to the results. Then the bot could be updated to take that into consideration for allowing merges of PRs.
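For reference, a minimal sketch (Python, using the requests library) of the kind of status report described above; the token, commit sha, and target URL are placeholders, and the ansible/thirdparty-check context follows pabelanger's example:

    import requests

    # POST /repos/:owner/:repo/statuses/:sha -- the v3 Status API endpoint
    # linked above. All <...> values are placeholders.
    resp = requests.post(
        "https://api.github.com/repos/ansible/ansible/statuses/<commit-sha>",
        headers={"Authorization": "token <github-token>"},
        json={
            "state": "success",  # one of: error, failure, pending, success
            "context": "ansible/thirdparty-check",
            "target_url": "https://dashboard.zuul.ansible.com/t/ansible/builds",
            "description": "Third-party network-integration tests passed",
        },
    )
    resp.raise_for_status()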
17:11:51 <pabelanger> yah
17:11:56 <pabelanger> that's how we do it today
17:12:01 <pabelanger> we have an item to use the newer api
17:12:08 <pabelanger> but that work hasn't started yet
17:12:11 <mattclay> We'd also need to decide when to run the tests using Zuul.
17:12:35 <mattclay> There's no rush on the newer API. We haven't started using it yet and the bot doesn't understand it.
17:12:46 <pabelanger> right, we can configure that on the zuul side also, for the pipeline. My first guess is when a new PR is created or updated
17:13:00 <pabelanger> we can also add specific commands, like recheck to trigger adhoc runs
17:13:36 <pabelanger> we'd also filter on when we'd run things
17:13:36 <Pilou> would "build_ci" trigger a new zuul run?
17:13:53 <pabelanger> so changes to windows modules shouldn't start a network integration run
17:13:55 <mattclay> pabelanger: I was thinking more about deciding which files being changed would trigger the tests. We should probably look at using ansible-test's change classification to determine that, rather than trying to maintain that information in two places.
17:13:56 <pabelanger> Pilou: it could
17:14:11 <pabelanger> mattclay: sure we can do that too
17:14:48 <pabelanger> mattclay: there is a cost if ansible-test manages that, as we'd still launch a node. but I think we could iterate on some of that after we have the github app
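As a sketch of mattclay's suggestion: ansible-test's change-detection options of this era let the tool classify changed files and select targets itself, so the job could run something like the command below instead of duplicating path filters in Zuul (exact flags worth verifying against the branch in use):

    # let ansible-test classify the changed files and pick matching targets
    ansible-test network-integration --changed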
17:15:24 <mattclay> When we add commands for re-running jobs (or anything else for a CI provider) we should make sure they're named after the provider so it's clear which one we're acting on (build_zuul vs build_ci for example).
17:15:56 <pabelanger> yes, I've seen that idea before too
17:16:22 <pabelanger> the pipeline on zuul is just regex, so we can match to any arbitrary string
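For illustration, a rough sketch of such a pipeline stanza; the pipeline name and trigger strings here are hypothetical and the exact syntax should be checked against the Zuul GitHub driver docs:

    - pipeline:
        name: third-party-check
        manager: independent
        trigger:
          github:
            # run when a PR is opened or updated
            - event: pull_request
              action:
                - opened
                - changed
            # and on an ad hoc "recheck" comment, matched by regex
            - event: pull_request
              action: comment
              comment: (?i)^\s*recheck\s*$
        success:
          github:
            status: success
        failure:
          github:
            status: failure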
17:16:41 <mattclay> pabelanger: How soon do you think you'll be ready to start looking at running the tests on PRs and updating ansibullbot?
17:17:16 <pabelanger> mattclay: from a testing POV, we are ready now. we'd be able to report results back on vyos / eos. other images are to follow in coming days.
17:17:39 <pabelanger> I can also work on ansibullbot integration for the check status, once we figure out the name and I learn how the code works
17:18:06 <pabelanger> so, I think we can enable the github app any time, then slowly roll out some initial testing on a PR or 2
17:18:14 <pabelanger> just to make sure we don't overload zuul.ansible.com
17:18:20 <pabelanger> (I don't think we will)
17:19:14 <Pilou> (shippable status is checked here: ansibullbot/triagers/plugins/needs_revision.py)
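A sketch of the kind of check that plugin performs, generalized to any status context; the helper name and token handling are illustrative, not ansibullbot's actual code:

    import requests

    def ci_state(owner, repo, sha, context, token):
        # GET /repos/:owner/:repo/commits/:ref/status returns the combined
        # status for a commit, one entry per context (e.g. "Shippable",
        # or a future "ansible/thirdparty-check").
        resp = requests.get(
            "https://api.github.com/repos/%s/%s/commits/%s/status" % (owner, repo, sha),
            headers={"Authorization": "token %s" % token},
        )
        resp.raise_for_status()
        for status in resp.json()["statuses"]:
            if status["context"] == context:
                return status["state"]  # pending, success, failure or error
        return None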
17:19:23 <pabelanger> k
17:19:33 <pabelanger> I'll start looking into that this evening
17:19:35 <mattclay> pabelanger: OK, we can discuss next steps outside of this meeting. Maybe tomorrow or Monday, since I have 2.8 release stuff to handle today.
17:19:47 <pabelanger> mattclay: great, wfm
17:20:00 <mattclay> jillr Goneri: Do you have any updates for us on vmware testing?
17:20:31 <Goneri> Hi! So we have a large PR on ansible-core-ci waiting to be reviewed/merged
17:21:17 <Goneri> it's our first milestone and gives us the ability to use the ansible CI workflow with vmware ESXi
17:21:25 <mattclay> I should be able to take a closer look at that soon, now that I'm back from PyCon.
17:21:27 <Goneri> jillr did a demo here: http://demo.jillr.io/asciinema/worldstream/worldstream_demo.html
17:21:59 <Goneri> currently, it only starts one ESXi which is enough for a proof of concept but won't be enough to run the full test-suite.
17:22:40 <Goneri> on my side, I've been working on vcenter deployment, we can bootstrap a vcenter automagically with a new Ansible role: https://github.com/goneri/ansible-role-vcenter-instance
17:23:26 <Goneri> and this role is integrated in the Ansible CI. So basically, we can prepare a vcenter template automatically (21 minutes in our env)
17:23:52 <Goneri> I'm now working on top of jillr's PR to add the ability to start a vcenter in addition to the esxi
17:24:01 <mattclay> Nice. Automatic template generation will save us a lot of time and help keep them updated.
17:24:02 <pabelanger> Goneri: is this for vmware integration testing or just to run ansible-test on another provider?
17:24:17 <Goneri> and in parallel, I'm adding the ability to start up 2 extra ESXi hosts; we need those for some tests.
17:24:49 <Goneri> for the record, I also maintain this thing: https://github.com/goneri/vmware-on-libvirt
17:25:15 <Goneri> it's some kind of copy of our CI on libvirt, it will be handy for those who want to reproduce our test env on libvirt
17:25:43 <Goneri> pabelanger, this is to run the vmware integration test-suite
17:26:14 <pabelanger> Goneri: neat, just looking at ESXi ISO image now
17:26:24 <Goneri> the test-suite requires a vcenter and up to 3 ESXi (HA)
17:26:44 <pabelanger> ack
17:26:52 <Goneri> pabelanger, what do you want to do?
17:27:08 <pabelanger> Goneri: nothing, was just trying to understand use case
17:27:31 <pabelanger> wasn't sure of the requirement for vmware integration
17:27:54 <mattclay> Goneri: What are the advantages and disadvantages of using libvirt instead of esxi for virtualizing the test infrastructure?
17:27:58 <pabelanger> is that running in our lab some place?
17:28:31 <pabelanger> mattclay: that was going to be my next question :)
17:28:33 <Goneri> mattclay, this way, I can run the test-suite on my laptop. It was really handy at the beginning.
17:28:44 <Goneri> and I still use that a lot actually.
17:29:30 <pabelanger> Goneri: how long does ansible-test integration run on libvirt?
17:30:10 <Goneri> good question, it's rather long, I would say around 1 hour.
17:30:13 <pabelanger> cause, my gut is saying, if it works well on libvirt, imagine if we could also boot them via nodepool
17:30:33 <pabelanger> that's the approach we are taking for network-integration tests basically
17:31:15 <Goneri> I did this: https://github.com/goneri/esxi-cloud-init/ it's a limited cloud-init script that supports OpenStack meta-data
17:31:16 <pabelanger> mattclay: so, to answer your pros / cons question, the pro for using libvirt would be more parallel jobs running.
17:31:31 <pabelanger> since we are not limited to a single esxi cluster
17:31:44 <Goneri> and, if you want to investigate that, you can build images with: https://github.com/virt-lightning/esxi-cloud-images
17:32:01 <pabelanger> cool, I'll poke into it when I find some time
17:32:15 <pabelanger> we also had to do some crazy things to get network vendor images to boot
17:32:47 <mattclay> pabelanger: We can run more parallel jobs if we have adequate hardware, so I don't really consider that to be libvirt specific.
17:33:17 <pabelanger> right, libvirt could give you some more isolation too
17:33:24 <pabelanger> but agree
17:33:37 <mattclay> How so? We're running esxi under esxi already...
17:34:16 <pabelanger> where is the esxi cluster today? I'm not really up to speed on it
17:35:15 <pabelanger> my thought is, if libvirt is used, then a VM could be booted any place (azure, google, aws, openstack), which gives greater capacity than the current esxi, I imagine
17:35:33 <mattclay> pabelanger: We're currently running one at worldstream for our proof-of-concept, but will add/upgrade servers as needed.
17:35:35 <pabelanger> but totally understand there is overhead to that
17:35:44 <Goneri> it's a single esxi server in a DC
17:35:50 <pabelanger> okay
17:36:54 <mattclay> If the performance and features supported are comparable to running under esxi then it's certainly something to consider. I did the initial poc with nested esxi since it was known to work, even though it's not officially supported.
17:38:30 <pabelanger> yah, I'm kinda interested in this topic, but have too much on my plate to get too involved :)
17:38:56 <pabelanger> there could be a good integration story with nodepool, but also sounds like current approach totally works
17:39:49 <mattclay> Goneri: Do you have anything else for us?
17:39:54 <Goneri> one of the problems we face is performance: it takes several minutes to get the lab ready before we can run our first test.
17:40:17 <Goneri> and nodepool, or a similar mechanism can be a good solution.
17:40:27 <Goneri> just my 2c
17:40:45 <Goneri> mattclay, no, I'm done :-)
17:41:05 <mattclay> Goneri: Thanks for the update.
17:41:27 <mattclay> Is there anything else anyone would like to discuss?
17:41:32 <gundalow> pabelanger: speak to jlk about GitHub.py and reporting results to Checks API. he was working on something
17:42:06 <pabelanger> gundalow: yup, I think that work is pretty much done
17:42:12 <pabelanger> we just need to do design spec for zuul
17:42:18 <gundalow> oh, ace
17:42:27 <pabelanger> we (ansible) and BMW are interested in the api
17:42:38 <pabelanger> so I imagine we can work on something soonish for it
17:42:56 <gundalow> Checks API will be brilliant for when we have CI failures due to specific lines in the PR
17:44:33 <pabelanger> ++
17:44:43 <pabelanger> mattclay: nothing from me
17:46:24 <gundalow> pabelanger: for rebuild commands please see https://github.com/ansible/ansibullbot/issues/1161
17:46:33 <gundalow> (hum, may have mentioned that already, sorry)
17:47:15 <gundalow> mattclay: so would we want
17:47:15 <gundalow> rerun_shippable - re-run only failed Shippable jobs (ignored if no failures)
17:47:15 <gundalow> rerun_shippable_all - re-run all Shippable jobs
17:47:15 <gundalow> rerun_zuul_openstack
17:47:15 <gundalow> rerun_zuul_network
17:47:24 <pabelanger> cool
17:47:36 <pabelanger> then we can add them to pipeline stanza today for zuul.a.c
17:48:12 <mattclay> Does Zuul support re-running only failed jobs?
17:48:40 <gundalow> hum, not sure if it would be called `zuul_network` or `zuul_ansible`
17:49:30 <pabelanger> mattclay: no, the complete buildset would be run again today
17:49:43 <gundalow> Since we are triggering the specific instance of the GitHub App
17:49:50 <pabelanger> there has been some discussion to target specific jobs, but that still needs some discussion upstream
17:49:52 <gundalow> ok, so `_all` on those
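So the Zuul pipeline trigger for these could simply add a comment regex per command, along the lines of the earlier sketch (command names still subject to change):

    - event: pull_request
      action: comment
      comment: (?i)^\s*rerun_zuul_network_all\s*$
    - event: pull_request
      action: comment
      comment: (?i)^\s*rerun_zuul_openstack_all\s*$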
17:51:27 <pabelanger> right, in the case of reruns, it's usually one of 2 things: poor infra or flaky testing. History has taught me that when projects run multiple rechecks / reruns just to merge stuff, it makes testing much harder to keep passing
17:51:42 <pabelanger> so, I tend to use them less myself
17:51:51 <pabelanger> but also tend to dive into fixing random tests too
17:56:19 <mattclay> Thanks for coming everyone.
17:56:23 <mattclay> #endmeeting