13:53:44 #startmeeting AnsibleFest Developer Conference - Zuul
13:53:44 Meeting started Wed Jun 21 13:53:44 2017 UTC. The chair is jimi|ansible. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:53:44 Useful Commands: #action #agreed #halp #info #idea #link #topic.
13:53:44 The meeting name has been set to 'ansiblefest_developer_conference_-_zuul'
13:54:17 #chairs jimi|ansible mordred thaumos gundalow
13:54:23 #chair jimi|ansible mordred thaumos gundalow
13:54:23 Current chairs: gundalow jimi|ansible mordred thaumos
13:54:35 ^ if anyone else needs it ping me
13:55:24 #topic Zuul deep dive
13:56:16 #link https://public.etherpad-mozilla.org/p/ansible-summit-june-2017-Zuul
13:56:33 #info zuul components - scheduler / nodepool / executors / nodes
13:56:57 i'm listening
13:56:59 #info content is normally run on nodes, not executors
13:57:07 #chair jtanner
13:57:07 Current chairs: gundalow jimi|ansible jtanner mordred thaumos
13:57:14 #chair samdoran
13:57:14 Current chairs: gundalow jimi|ansible jtanner mordred samdoran thaumos
13:57:36 #info jobs can run in two modes: trusted and untrusted
13:57:40 Any idea if they are going to announce open tower tomorrow at ansiblefest?
13:57:42 For people following remotely https://bluejeans.com/2413805790
13:59:46 P-NuT: i doubt anyone knows that now, if it's true
14:00:46 #info there's a github app to add the zuul integration
14:01:54 Out of interest who is following this remotely?
14:02:06 #info lots of things can trigger zuul jobs - commits, comments, etc
14:03:03 gundalow I'm following remotely
14:03:04 /me hopes that mattclay is following remotely :-)
14:03:17 #info job definitions are stored in the repo, and can be run via any instance of zuul
14:03:41 shertel: :)
14:04:53 #info design philosophy is that you should be able to configure so that only the system merges, not people.
14:05:19 #info a combination of human approval and tests passing results in changes merging.
14:06:05 The more that we can configure into zuul wrt automerge is probably better.
14:06:13 gundalow: i am following, but mostly out of curiosity to see if mordred can limit himself to an hour
14:06:25 Shrews: haha
14:06:34 So we can make a "gate" requirement be a review specifically from the ansibot user.
14:06:38 Shrews: Which bets are you covering?
14:06:40 ;-)
14:06:53 hah
14:07:23 ~ +20mins .. as long as no one points it out
14:07:31 ^ down for $20
14:08:42 jlk: what happens when a trigger fails to be sent?
14:08:45 i didn't hear why my name was mentioned
14:09:12 jtanner: mckerr was asking who on the zuul side he should make sure was talking with you
14:09:18 abadger1999: well, if zuul misses an event that github sends, it obviously doesn't do the work
14:09:23 About how ansibullbot would integrate with zuul
14:09:33 jlk: k. So there's no way that it catches up?
14:09:37 abadger1999: but, we have that comment trigger of "recheck" so that any human can kick it into responding
14:09:40 #action McKerr to get Jessy and jtanner to speak to each other
14:09:46 jtanner: tentative conclusion was that your counterpart is jlk.
14:09:50 abadger1999: not really. Github is a try once and fail.
14:10:04 oh we've spoken numerous times about github things :D
14:10:35 jlk: Yeah ... I believe tanner has to do a bunch of polling to make sure that we catch up on missed events.
14:10:44 abadger1999: the model here is to allow "potential" trigger by many things. Comment, status, review, etc.. But also have a pipeline requirement that certain things are in place.
14:10:50 if zuul remains 100% event/hook based, it'll have many of the same problems the other ci systems have
14:10:59 hooks don't fire or get lost A LOT
14:11:04 and if something systemic goes wrong, there are admin commands to out-of-band trigger events; so you could retrigger all open prs or something...
14:11:15 so if Zuul was blocked by having a positive review, and we miss the positive review event, a comment can cause zuul to re-evaluate the change and potentially let it in
14:11:50 as long as zuul triggers on comments and other things, it shouldn't be a problem to retrigger jobs on an as-needed basis
14:11:59 so it must do some kind of polling?
14:12:03 ^ jlk?
14:12:07 doesn't poll
14:12:11 a comment generates an event
14:12:25 ahh, didn't realize comments generated events
14:12:32 #info migration proposal: Step 1: have openstack zuul trigger some tests off of ansible/ansible commits.
14:12:32 an event causes zuul to query the change
14:13:07 #info migration proposal Step 1: Jobs would be defined in ansible/ansible repo
14:13:41 jtanner: short term, ansibot could evaluate time between something happening in a PR and zuul responding to it, and if the time is too long, it can issue a 'recheck' comment.
14:13:57 we do that now with shippable
14:14:02 it adds a needs_ci label
14:14:13 yup, zuul could react to that
14:14:28 but someone still has to go fire a new hook in shippable's case
14:14:32 that could also help us collect metrics on event delivery reliability
14:14:39 click rebuild in the UI OR close/reopen PR
14:14:42 ah
14:14:49 so the label being applied creates an event
14:14:53 zuul could react to the event to retrigger
14:14:57 bot will do it at some point, but requires my free time first
14:15:05 #info migration proposal: Step 1.5: ansible-container can use bonnieCI (zuul v3 running for ibm) instead of travis.
14:15:52 so one aspect of ansible-test + shippable is that the community can pretty much run an identical test path locally
14:16:00 #info migration proposal: Step 2: Operations: Who runs zuul instance? Where, zuul control? Where, zuul build resources? When: timeline for migration?
14:16:17 if we add/switch to zuul, will we be able to maintain that local testing path?
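(editor's note) The short-term 'recheck' idea from 14:13:41 could look roughly like the sketch below. It is only an illustration of the timing check, not ansibot or zuul code: `needs_recheck`, its parameters and the 30-minute `RECHECK_AFTER` threshold are all hypothetical, since the log never says how long "too long" is.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical grace period; the discussion does not name a value.
RECHECK_AFTER = timedelta(minutes=30)


def needs_recheck(last_pr_event: datetime,
                  last_zuul_response: Optional[datetime],
                  now: datetime,
                  threshold: timedelta = RECHECK_AFTER) -> bool:
    """Return True when zuul has not reacted to the latest PR event in time."""
    # zuul already responded after the most recent PR activity: nothing to do.
    if last_zuul_response is not None and last_zuul_response >= last_pr_event:
        return False
    # Otherwise only nudge once the grace period has elapsed.
    return now - last_pr_event > threshold
```

A bot loop would call this per open PR and post a 'recheck' comment when it returns True; counting how often that happens would also give the event-delivery metric mentioned at 14:14:32.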
14:16:27 #info migration proposal: Step 2: existing repos for shared jobs
14:16:32 #undo
14:16:38 jtanner: as I understand it yes? Locally people aren't touching shippable, right?
14:16:41 #info migration proposal: Step 2: existing repos for shared jobs which can help us get started
14:16:45 they're just running the test script?
14:16:59 jimi|ansible: Shoot -- I'm not chaired.
14:17:02 depending on _how_ you design your jobs, you should be able to do that with zuul as well.
14:17:02 the test script(s) spin up the same containers + env that shippable uses
14:17:06 gah
14:17:07 jimi|ansible: so all my #infos haven't been recorded
14:17:08 #chair abadger1999
14:17:08 Current chairs: abadger1999 gundalow jimi|ansible jtanner mordred samdoran thaumos
14:17:22 jtanner: that model can be carried over to zuul
14:17:24 copy/paste?
14:17:26 k
14:17:30 #info design philosophy is that you should be able to configure so that only the system merges, not people.
14:17:37 #info migration proposal: Step 1: have openstack zuul trigger some tests off of ansible/ansible commits.
14:17:41 jtanner: Zuul can give you a VM that has docker in it, and your job is just executing the script
14:17:43 #info migration proposal Step 1: Jobs would be defined in ansible/ansible repo
14:17:49 #info migration proposal: Step 1.5: ansible-container can use bonnieCI (zuul v3 running for ibm) instead of travis.
14:17:50 (can't hear anything from the room)
14:17:57 it'd be nice if zodbot sent you a message when you try and do a # command and aren't chaired
14:17:58 (well, the person who asked a question)
14:18:00 #info migration proposal: Step 2: Operations: Who runs zuul instance? Where, zuul control? Where, zuul build resources? When: timeline for migration?
14:18:04 #info migration proposal: Step 2: existing repos for shared jobs which can help us get started
14:18:20 misc: sorry i missed the question too as i was typing here
14:18:33 misc: it was a clarification of the "who" bits
14:18:35 which monty restated
14:18:36 #info Who? Ansible or RH Software Factory team or partner with IBM (use Bonnie CI)?
14:21:56 misc: Can you hear now?
14:22:47 gundalow: yup, that was people not speaking in the mic
14:22:52 I can hear monty fine
14:23:02 (and yanis I guess before ?)
14:23:10 misc: ah, OK
14:25:28 lost video :)
14:25:49 so did we
14:25:54 in room
14:26:26 and sound
14:26:33 network was dropped
14:26:51 bluejeans rate limiting =P
14:27:11 presenting laptop lost network (and/or bluejeans is having issues)
14:27:21 yeah, that happen
14:28:44 bummer. the dirty hacks part was the most interesting portion
14:29:02 we are paused while we try to resolve the av issues
14:29:10 back \o/
14:29:22 magic
14:29:24 can you hear us now?
14:29:30 yes
14:29:31 yes
14:29:31 we hear
14:29:31 yes
14:29:32 I can hear you
14:29:46 we do not see
14:30:02 the screen sharing appears to be back on
14:30:20 yeah
14:30:35 .
14:30:50 Can you hear and see now?
14:31:05 #info dirty hacks (1): log streaming for command/shell tasks
14:31:11 no, but i just closed browser
14:31:35 hah
14:31:39 meeting end
14:31:39 time's up
14:31:43 what the hell...
14:31:45 because moderator crashed...
14:31:50 ha
14:31:51 so it stop after 5 minutes
14:32:05 finding Robyn
14:32:41 "How would you rate the overall quality of this meeting?" ... "meeting was affected"
14:32:50 rejoin
14:33:19 Can you hear and see now?
14:33:22 nope
14:33:29 I can see, but not hear
14:33:40 you are on mute
14:34:10 now I can hear
14:34:15 #info monty explaining that they have to fork a daemon process to stream command output
14:34:59 #info monty explaining that the command module has been forked to stream data to zuul_console (the daemon process)
14:36:21 #info monty explaining that controller side, there's a zuul-stream callback plugin that intercepts stdout from the command and spawns streaming client thread, logs lines.
14:36:50 DO NOT CROSS THE STREAMS!
14:36:57 #info explaining that the logs can then be streamed to a client via a finger protocol.
14:37:00 MT
14:37:36 "forked the command module" ... still not as bad as "we flipped the module + connection plugin relationship" =P
14:37:51 * jtanner looks at networking folks
14:37:55 hahaha
14:38:19 jtanner: also flipped action plugin
14:38:30 it's flips all the way down
14:38:53 new module: realtime_command
14:39:21 run_command_live ... used to be a thing
14:40:03 p.communicate() is the first hurdle
14:40:24 p = Popen()
14:40:31 (so, se) = p.communicate()
14:42:03 http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/ansible/library/command.py?h=feature/zuulv3
14:42:11 is the source for the current forked module
14:42:30 searching for 'zuul' should find all the places where things have been changed
14:44:47 #info log streaming brainstorming: For streaming, implement update_json
14:45:09 #info log streaming brainstorming: For the foring of comman module, add a parameter to allow run_command to use a single pip for stdout and stderr
14:45:24 #undo
14:45:24 Removing item from minutes: INFO by abadger1999 at 14:45:09 : log streaming brainstorming: For the foring of comman module, add a parameter to allow run_command to use a single pip for stdout and stderr
14:45:32 #info log streaming brainstorming: For the forking of command module, add a parameter to allow run_command to use a single pipe for stdout and stderr
14:46:09 #info log streaming brainstorming: Perhaps can implement streaming by modifying what's done with async instead of implementing update_json.
14:46:51 #info Ansilbe restricted environment
14:46:55 #undo
14:46:55 Removing item from minutes: INFO by abadger1999 at 14:46:51 : Ansilbe restricted environment
14:47:00 #info Ansible restricted environment
14:47:36 chroot -> proot -> bubblewrap
14:47:54 #info zuul uses "bubblewrap", a lightweight user-space container that can be created without needing root.
14:47:54 https://github.com/projectatomic/bubblewrap
14:48:09 #link https://github.com/projectatomic/bubblewrap
14:48:55 #chair jeblair jlk bcoca
14:48:55 Current chairs: abadger1999 bcoca gundalow jeblair jimi|ansible jlk jtanner mordred samdoran thaumos
14:48:56 someone in the room needs to turn down their speakers
14:48:57 somebody is echoing things back into BJ
14:49:11 bcoca: we muted you
14:49:28 im muted on my side?
14:49:38 you weren't on the bluejeans side at that time
14:50:53 #info Look in https://git.openstack.org/cgit/openstack-infra/zuul/zuul/ansible for some of the hacks that are being used.
14:51:15 ^ that and the 'firewall strategy' ...
14:51:53 #undo
14:51:53 Removing item from minutes: INFO by abadger1999 at 14:50:53 : Look in https://git.openstack.org/cgit/openstack-infra/zuul/zuul/ansible for some of the hacks that are being used.
14:52:08 #info Look in http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/ansible?h=feature/zuulv3 for some of the hacks that are being used.
14:52:22 (other url was lacking the v3 branch selection)
14:52:45 boaty mcboatface waits for no one
15:01:54 do untrusted jobs allow envvars/ansible.cfg?
15:05:01 #info PLEASE think of novel ways to break out of an Ansible environment, so that we can evaluate them against zuul protections.
15:05:58 * misc will do
15:06:01 get_url + shell + ansible-playbook ...
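(editor's note) On the log streaming thread from 14:40-14:45: `p.communicate()` only returns once the process has exited, which is why it was called the first hurdle, and the brainstormed fix was a single pipe for stdout and stderr. The sketch below shows that idea with the standard `subprocess` module only. It is not the forked zuul command module or Ansible's `run_command`; `run_streaming`, `on_line` and `demo` are hypothetical names used for illustration.

```python
import subprocess
import sys
from typing import Callable, List


def run_streaming(argv: List[str], on_line: Callable[[str], None]) -> int:
    """Run argv and hand each output line to on_line as soon as it is read."""
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout: one pipe to read
        text=True,
        bufsize=1,  # line-buffered on the reading side
    )
    assert proc.stdout is not None
    # Iterating the pipe yields lines while the child is still running,
    # unlike communicate(), which blocks until the child exits.
    for line in proc.stdout:
        on_line(line.rstrip("\n"))
    proc.stdout.close()
    return proc.wait()


def demo() -> List[str]:
    """Collect the output of a trivial child process, line by line."""
    collected: List[str] = []
    run_streaming([sys.executable, "-c", "print('hello')"], collected.append)
    return collected
```

In a real setup `on_line` would forward each line to whatever does the streaming (a socket, a log file) instead of appending to a list. How promptly lines arrive still depends on the child flushing its own output.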
15:07:02 I'm pretty sure we prevent you from downloading new content
15:07:12 in an untrusted run
15:07:36 right, it won't work in untrusted
15:08:00 unless you can come up with ways we aren't blocking for downloading content
15:08:01 but we need to write a test in zuul for it still
15:08:01 unarchive from tarball that is part of job
15:08:09 jlk: how are you blocking it?
15:08:15 use dns tunneling
15:08:17 i think we permit downloading content
15:08:33 but you do not get out of the containment
15:08:34 bcoca: i'm assuming that all jobs use a non-privileged user, so anything run would be the same priv level as the playbook
15:08:43 but that's a good point, testing become escalation
15:08:46 provided we can already run arbitrary content
15:08:49 http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/ansible/action/unarchive.py?h=feature/zuulv3
15:08:52 for unarchive
15:08:59 bitcoin does not need privs to run!
15:09:15 everything is timed
15:09:21 yeah, but if the job are killed after some time
15:09:25 pabelanger: yeah, that's just checking the local path i believe
15:09:31 yup
15:09:52 does a bitcoin miner need a large block of time? could just test as many hashes as possible and exit
15:09:54 now, if you get auditing of what happen, people will abuse it, but you will know how
15:09:57 we're gonna need a bigger boat
15:10:12 jlk: not really looking at breaking your security, but looking at what would be nice to provide in core for 'secure setups'
15:10:24 would something like mirai need more of resources ?
15:10:37 ^ i.e running only 'signed' plugins
15:10:40 nod
15:10:41 would a oneoff exploit that do pown a wordpress remotely be blocked ?
15:10:51 we have a list of things we allow or disallow running
15:10:53 (since that's basically "uri" )
15:11:25 jlk: once 'signed' you can avoid having cp ../url library/copy
15:11:28 i think we're proposing things that you could do on travis/shippable/et. al now
15:11:38 well
15:11:49 the difference is that on travis, all those things happen in the VM they give you
15:12:02 we're talking about the things that run ON our control plane, not in a VM we launch for you
15:12:09 well, these features would be for 'centralized/controlled' environments
15:12:18 (and on travis, that's SEP)
15:12:33 ansible-playbook runs on the control node, with the VM as the target
15:12:36 well they're in the bubblewrap environment right? and they're things that impact remotes
15:12:38 so we want in untrusted runs to prevent local execution of things
15:12:48 though honestly with the travis/shippable file you can exec commands just as easily
15:12:51 yeah they're in bubble wrap
15:13:49 so, if that run on amazon, it has access to the api ?
15:14:43 (kinda like https://media.ccc.de/v/33c3-7865-gone_in_60_milliseconds )
15:15:37 so
15:15:39 secrets
15:16:04 we would design the pipelines so that the secret is not available in the "check" pipeline (the one that automatically runs on PR open)
15:16:32 (also related: https://hackernoon.com/capturing-all-the-flags-in-bsidessf-ctf-by-pwning-our-infrastructure-3570b99b4dd0 )
15:16:32 and the secret is only available in a pipeline that requires human review before starting
15:16:39 so if a human reviews it and missed the fact that it uses the secret to eat all your $$, then sure.
15:16:51 anyway we're disconnecting to go drinking.
15:16:54 said human gets his pay docked?
15:17:05 jlk: that is always the correct reason
15:27:35 #endmeeting
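(editor's note) The secrets rule stated at 15:16:04 and 15:16:32 (no secret in the automatic "check" pipeline, secrets only where a human review comes first) can be modelled in a few lines. This is a toy model of the policy as described in the log, not zuul's configuration format or API: `Pipeline`, `secrets_for` and the `approved` flag are hypothetical, and "gate" is borrowed from the 14:06:34 message as the name of the reviewed pipeline.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Pipeline:
    name: str
    requires_human_review: bool


# "check" runs automatically when a PR is opened; "gate" is assumed here
# to be the pipeline that only starts after a human has reviewed the change.
CHECK = Pipeline("check", requires_human_review=False)
GATE = Pipeline("gate", requires_human_review=True)


def secrets_for(pipeline: Pipeline, approved: bool,
                secrets: Dict[str, str]) -> Dict[str, str]:
    """Return the secrets a job may see when run in the given pipeline."""
    # Secrets are exposed only in a review-gated pipeline, and only once
    # the review has actually happened.
    if pipeline.requires_human_review and approved:
        return dict(secrets)
    return {}
```

With this model a job in `CHECK` always receives an empty mapping, so a malicious PR cannot read a cloud credential just by being opened; the remaining risk is the one noted at 15:16:39, a reviewer approving a change without noticing how it uses the secret.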