19:00:10 <shertel> #startmeeting Ansible Core Public IRC Meeting
19:00:11 <zodbot> Meeting started Tue Nov 16 19:00:10 2021 UTC.
19:00:11 <zodbot> This meeting is logged and archived in a public location.
19:00:11 <zodbot> The chair is shertel. Information about MeetBot at https://fedoraproject.org/wiki/Zodbot#Meeting_Functions.
19:00:11 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
19:00:11 <zodbot> The meeting name has been set to 'ansible_core_public_irc_meeting'
19:00:29 * bcoca waves
19:00:30 <shertel> One item on the agenda today - type hinting in ansible-core
19:00:47 <shertel> #topic https://github.com/ansible/community/issues/635#issuecomment-966636254
19:00:47 <nitzmahone> o/
19:00:47 <bcoca> ansible turbo?
19:00:54 <shertel> Oh, and turbo
19:00:56 * nitzmahone hides
19:00:58 <shertel> heh
19:01:34 <shertel> we already have some type hinting in ansible-core, don't we? Is this just looking for a formal decision?
19:01:42 <bcoca> im ok with type hinting, i just think there are much higher priorities
19:02:18 <nitzmahone> Sort of- we just don't have a cohesive strategy around it- we need to make some technology decisions
19:02:52 <bcoca> ^ fine with defining 'howto', just not sure we should make it a project we need to do 'now'
19:03:10 <nitzmahone> ... and especially for existing APIs that weren't built with type hinting in mind (eg things that have highly polymorphic structures), we need to decide how pedantically correct vs "actionable type hints" we want to be
19:03:24 <bcoca> also, separate, it probably is easy to do for 'controller code' but not for target (module_utils/modules)
19:03:53 <nitzmahone> Yeah, I don't think anyone's advocating for a "let's type hint the entire codebase" project, but we also should figure out our plan before we start doing it piecemeal with differing philosophies
19:04:31 <bcoca> agreed, so #1 stop adding any type hinting for now #2 schedule discussion on 'which' and 'scope' of type hinting #3 remove ban
19:04:35 <shertel> okay, makes sense. I'm fine with type hinting too. "I think we can mitigate the cons by only putting type hints where they're useful. I think a good definition of "useful" in this case is where someone thought to add them or to request them." sounds reasonable
19:05:01 <nitzmahone> eg, I just asked Martin to add some type hints to a new API because it had some names that weren't necessarily clear on what was needed, so "this is supposed to be a string" was extremely helpful :D
19:05:08 <bcoca> im also fine with '#note: no type hinting here cause .. reasons'
19:05:52 <nitzmahone> Yeah- especially for older / "organically grown" parts of the codebase, adding pedantically correct type hinting will result in things that are ... scary
19:05:55 <bcoca> nitzmahone: why im for it, just not piecemeal since there are competing standards, last thing we want is 3 diff typing systems (we just got internal ones down to 2!!!)
19:06:25 <nitzmahone> Right, that's where we need to agree on the technology parts before we say eg, "all new things must have type hints"
19:07:02 <nitzmahone> There's also the sanity check aspect to it- mypy is kinda the standard, but there are implications we have to work through if we choose that.
19:07:08 <bcoca> i see issues/adv with all the current python typing systems, but prefer the latest pep as it is 'built in'
19:07:15 <nitzmahone> mattclay and I were just talking about this last night
19:07:18 <bcoca> otherwise i have no strong pref
19:08:05 <nitzmahone> eg, we'd have to pin each ansible-core major release to a specific version of mypy so the behavior is the same, and choose what deps/python stdlib we're going to validate against
19:08:31 * shertel nods
19:08:35 <bcoca> at the very least we should 'choose one' and not allow competing typing systems to be added
19:08:40 <nitzmahone> ... and doing that for external collections has a whole raft of problems, since there are multiple Ansible/Python/mypy versions in play
19:08:45 <bcoca> even if we dont want to enforce adding with types
19:09:04 <bcoca> nitzmahone: i would ignore that, just deal with 'core'
19:09:11 <bcoca> let collections sort themselves out
19:09:11 <nitzmahone> Well, it'll likely be two, unless we want to completely ignore non-controller type hints for now (which might be the answer)
19:09:28 <mattclay> Nearly all the code in ansible-test uses type hints -- but it was built from the ground up with type hints in mind. That made it significantly easier than adding them to existing code that wasn't written with type hinting in mind.
19:09:37 <bcoca> nitzmahone:  ^ mentioned that above, since 'non controller' has bigger swath of supported surface
19:10:06 <bcoca> mattclay: and that makes huge diff
19:10:15 <nitzmahone> ah sorry bcoca, I missed your list there
19:10:43 <nitzmahone> I'd vote for #2- we need to lay out the options and pick direction and scope
19:10:51 <bcoca> nitzmahone: also .. once implemented in core and tests .. i expect collections to lean that way, but i dont think we should choose based on them
19:10:58 <nitzmahone> agreed
19:11:05 <bcoca> nitzmahone: those were not options, but steps
19:11:08 <shertel> I'd also vote for #2
19:11:25 <shertel> there's a ban?
19:11:26 <mattclay> +1 on #2
19:11:30 <bcoca> #1 ban #2 discuss and decide #3 unban
19:11:37 <shertel> oh, that was 1
19:11:58 <bcoca> cause just dont want to add to problem before we decide solution
19:12:05 <nitzmahone> Yeah, I don't think we've got an explicit ban right now, but I know I'd be -1 to any effort to add type hinting on existing code until we sort out #2
19:12:06 <shertel> Yeah, that sounds reasonable. Wait to add more until we make a decision.
19:12:23 <nitzmahone> But we should also actively be working toward a decision
19:12:29 <bcoca> actually should be #1 ban #2 check existing #3 discuss #4 decide #5 unban #6 possibly enforce
19:12:58 <nitzmahone> Yeah, I think the "how" of #6 there needs to be part of discussion/decision
19:13:11 <nitzmahone> If they're not being checked, they're just gonna rot
19:13:20 <bcoca> why i added 'possibly', depends on what we decide
19:13:35 <bcoca> well, there is 'enforce on every change' or 'enforce on new'
19:13:38 <nitzmahone> It's better than comments, because those rot too- at least this has the capability of being checked/enforced
19:13:52 <felixfontein> +1 on that
19:13:59 <bcoca> well, if we use proper docstrings .. we can enforce also
19:14:17 <bcoca> and if you make it a choice, i prefer docstrings
19:14:32 <bcoca> as there will be more than 'type' info
19:14:49 <mattclay> Thankfully docs and type hints aren't mutually exclusive. We can have both.
19:14:50 <nitzmahone> Realistically it probably needs to be both
19:14:52 <nitzmahone> yep
19:15:14 <bcoca> duplication bugs me ... but this i can live with
19:15:18 <nitzmahone> (and with proper type hints, many of the relevant parts of the docstrings can also be validated)
19:15:59 <nitzmahone> So what's our next step?
19:16:01 <shertel> the next irc meeting (that isn't a US holiday) is Nov 30th. Do we want to plan on laying out the choices and making a decision then?
19:16:06 <mattclay> There's not really duplication with type hints -- they just add something extra. Only docstrings end up repeating something (parameter names).
19:16:24 <nitzmahone> shertel +1 to that from me
19:16:57 <bcoca> nitzmahone: we currently have docstrings with type also
19:17:00 <bcoca> not just name
19:17:10 <bcoca> as i said 'depends on proper docstring'
19:17:22 <bcoca> we have many w/o type .. we have more w/o docstring at all
19:18:25 <nitzmahone> I'll take that to bring a list of options w/ some pros and cons to the 30th IRC meeting
19:18:53 <bcoca> works4me, probably post on ticket a few days before meeting so everyone can read?
19:18:55 <shertel> nitzmahone: that would be awesome
19:19:07 <nitzmahone> bcoca: I was just typing exactly that :D
19:19:12 <bcoca> ;-p
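For illustration of the point that docstrings and type hints can coexist, here is a minimal sketch. The function `normalize_tags` is hypothetical and does not come from ansible-core; it only shows how an annotation can make a polymorphic parameter explicit while the docstring carries the extra, non-type information. A checker such as mypy could validate the annotations, subject to the version-pinning questions raised above.

```python
from typing import List, Sequence, Union


def normalize_tags(tags: Union[str, Sequence[str]]) -> List[str]:
    """Return tags as a list of stripped, non-empty strings.

    :param tags: either a comma-separated string or a sequence of strings
    :returns: a new list of tag names, in their original order
    """
    # The annotation documents the polymorphism; the body resolves it.
    if isinstance(tags, str):
        tags = tags.split(",")
    return [tag.strip() for tag in tags if tag.strip()]
```

The annotation states what is accepted ("this is supposed to be a string, or a sequence of them"), while the docstring explains meaning that a type alone cannot express.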
19:19:35 <shertel> Okay. Next up :)
19:19:39 <shertel> #topic https://github.com/ansible/ansible/pull/76113
19:19:49 <shertel> Goneri?
19:21:47 <bcoca> i was -0 ... gone -1 on this now with the implications on credential disclosure/reuse/invalidation problems
19:22:44 <nitzmahone> A lot of the risks go down if it's a first-class core feature, but until core supports stateful workers, it can't be done "right", so I've been -1 for adopting anything like it until we can do it right
19:23:12 <Goneri> Hi!
19:23:24 <bcoca> nitzmahone: any stateful worker would still have to avoid credential caching .. but this subsystem specifically relies on it
19:24:21 <Goneri> So, we've been maturing this turbo mode thing for a while now. And the performance benefits are pretty positive in our context.
19:24:59 <bcoca> not what worries me, mitogen for example has clear performance gain .. but feature loss and security implications
19:25:17 <bcoca> and there are many parallels with this code
19:25:30 <nitzmahone> (and with previous iterations of `accelerate`, etc)
19:25:37 <Goneri> The fact the feature is not integrated in core yet prevents us from going forward with a better integration. We have this "we're a collection" limit.
19:25:39 <bcoca> fireball was better name ....
19:26:26 <Goneri> Well, the socket system is pretty similar to what we've got with SSH. And this is the kind of thing we can hardly improve without some modification in core.
19:27:27 <Goneri> In our ideal world, ansible would spawn a python process. and run the different tasks within the same process.
19:27:41 <bcoca> which is what mitogen does
19:27:53 <nitzmahone> Most of the barriers to core being able to properly adopt this feature are still in place- IIUC it solves some of the "cold start" problems with certain things, but making something that's safely generally applicable is still not feasible without intra-task state in core.
19:28:27 <bcoca> and currently most tasks do NOT want state permanence, sometimes even within loop iterations of same task
19:28:32 <nitzmahone> Core will have intra-task state at some point, but it's a major rearchitecture
19:28:35 <bcoca> i.e delegate_to: '{{item}}'
19:28:50 <Goneri> Yes, you need to keep a session object alive on the remote side, or you lose the benefit of having a remote process.
19:29:02 <nitzmahone> hacking something together without it will lead to another instance of eg `ansible-connection`, which is ... not fun
19:29:03 <bcoca> nitzmahone my thought on that was to add a 'persistent: false|auth|full' keyword
19:29:39 <bcoca> with connection plugin disclosing its support and a-connection supplementing when connection plugin does not supply
19:29:46 <Goneri> But, in our case, this is the collection author's responsibility. They've got to handle the case themselves.
19:30:14 <bcoca> Goneri: understood, but that is 'good for us' from core perspective, we don't need to deal with many of the implications
19:30:16 <nitzmahone> Core wasn't in a position to support it a year ago, and being half the size, we're certainly not now
19:30:16 <shertel> I don't really understand the credential issues
19:31:01 <nitzmahone> Until the architectural issues can be solved, we need to limit the "blast radius" of something like this to the modules that *REALLY* need it, which IIUC is what's been happening thus far.
19:31:03 <Goneri> ahah, I dislike the turbo mode name so much.
19:31:08 <bcoca> shertel:  its not a big issue when running a simple play by hand, but when running jobs with multiple plays by diff authors it can get 'fun'
19:31:50 <bcoca> Goneri: better than 'temporary agent mode with caching'
19:32:56 <Goneri> Is there a way to track the stateful worker effort?
19:33:11 <bcoca> only if you are a telepath in nitzmahone and my heads
19:33:35 <bcoca> its something we've discussed for a few years but never had time/push to create a project to bring to fruition
19:33:41 <bcoca> why i would not count on it
19:34:18 <bcoca> also requires major reformat of core engine, how action/connection/shell/become/terminal plugins work
19:34:45 <bcoca> but first thing i would move would be 'async'
19:34:50 <nitzmahone> It's still not a scheduled project- I've prototyped a number of the necessary bits, and I've theoretically got management support for it (couched as a memory/performance project)
19:36:20 <nitzmahone> But doing it right will likely require a staged rearchitecture of the entire core worker model over several releases
19:36:21 <shertel> Okay, so it seems like there's consensus (sivel, bcoca, nitz) not to include turbo in ansible-core, at least yet
19:36:54 <Goneri> I totally understand your position. However, I would also be happy to help to implement a better solution within core itself.
19:37:22 <nitzmahone> Unless you're rejoining the core team full time... ;)
19:37:43 <bcoca> ^ and we would probably have you working on 20 other things first
19:37:52 <Goneri> bcoca: execution environment should reduce the risk here. I imagine it's possible to disable some features when the context is not "safe enough".
19:38:18 <nitzmahone> That's the controller side- the risk is on the target side
19:38:21 <Goneri> can you elaborate. I'm not sure I understand.
19:38:23 <bcoca> EE doesn't matter as much as 'expected context' of differing play authors
19:38:56 <bcoca> Goneri:  play1 and play2 get written to handle vms by authors, #1 adds explicit credentials, #2 assumes defaults from env
19:39:03 <bcoca> when run independently all is good
19:39:11 <bcoca> but ansible-playbook play1.yml play2.yml  .....
19:39:53 <Goneri> I'm not sure I want to do that.
19:40:14 <bcoca> add to that that #1 uses 'admin' credentials and #2 does some stuff it shouldn't but they know 'fails' since they dont have perms .. but now suddenly succeeds!
19:40:24 <bcoca> i.e test  infra exists by trying to delete
19:41:00 <bcoca> Goneri: think big corp with 10 depts, one asks other 2 'do this' gets the plays, tests them independently, then adds them all to 'big job'
19:41:33 <bcoca> currently ansible lends itself well to this by isolating task contexts
19:41:34 <nitzmahone> Goneri: one question- how important is inter-run persistence vs inter-task persistence? The latter will be much easier to accommodate under the imagined core stateful worker stuff in a safe and robust fashion than the latter
19:42:03 <bcoca> nitzmahone: ^ issue above is just 'inter task' ... inter run ... multiplies it
19:42:39 <nitzmahone> Right, but a lot of the inter-task risks can be mitigated with stateful workers, the inter-run stuff is *much* harder.
19:44:35 * nitzmahone just re-read my two messages ago, s/ than the latter/than the former/ ;)
19:45:25 <Goneri> yup :-)
19:46:07 <bcoca> nitzmahone: the problem stems from stateful workers ....not sure how they also solve it
19:46:34 <nitzmahone> They don't, but they at least make it more reasonable to solve it
19:46:50 <bcoca> again, w/o statefulness .. no problem
19:46:54 <nitzmahone> Where working stateless is an absolute security minefield
19:47:03 <shertel> Goneri, did you see the question? "how important is inter-run persistence vs inter-task persistence? "
19:47:09 <Goneri> ee allows us to isolate the socket when we're on localhost. Regarding the cred sharing, we already send the credentials to the remote host all the time, it's not something new.
19:47:36 <bcoca> no, stateless means each action deals with its own security/credentials, no caching/ no state, no big problem .. how we work today
19:48:01 <nitzmahone> I meant stateless controller->stateful persistent target
19:48:12 <nitzmahone> (is the security minefield)
19:48:18 <bcoca> Goneri:  EE does not matter, since the issue is relation between controller and target (controller being your laptop, tower machine or EE does not matter in that context)
19:48:27 <Goneri> This is the collection maintainer responsibility. We use a hash of the different cred key/val to identify the session in the cache.
19:48:42 <bcoca> nitzmahone: ah, you want to tie worker to remote keeping sync state?
19:48:48 <nitzmahone> yes
19:49:08 <bcoca> Goneri: little to do with collection, more of an issue to do with plays
19:49:26 <nitzmahone> Not keeping the *actual* state, but allowing session isolation and persistence to be managed as a first-class property
19:49:50 <bcoca> well, the state is in the remote, the local worker just keeps the connection to it
19:50:22 <bcoca> instead of disassociating like now and reusing external conduit (socket/a-connection) to reestablish with persistent remote
19:50:31 <Goneri> We use the password to compute our hash, but yes I understand your point.
19:50:32 <bcoca> but that also opens other implications about performance and resource usage
19:52:04 <Goneri> In general, we've got a playbook with a series of tasks to run. This is what we want to speed up. We don't really care about isolated tasks and it's acceptable to reopen a new session for those.
19:53:00 <nitzmahone> cool, that makes things easier when the time comes :D
19:53:00 <bcoca> i understand that, but i still have to look into it being used in wider contexts, especially when some parts are completely unaware of the others
19:54:27 <bcoca> aren't you just using tempfile to create the socket path?
19:55:39 <bcoca> and then you just import the module into the server to call it, this creates many issues on both the play and module author side
19:55:44 <Goneri> We did some refactoring of our modules to be compatible with the turbo mode. Why I'm mentioning this. When you speak about refactoring of core, it sounds like you want to cover all the existing modules. Actually, it's just some few collections. For instance, we can totally have something saying: If you want to speed up your collection, you need to provide a module with this extra API.
19:56:47 <bcoca> Goneri:  our refactoring would cover more than yours, yes, thinking most of cloud/api/networking, but that is independent of the issues with a semi persistent agent that caches credentials
19:57:16 <bcoca> from security perspective its 'dragon country'
19:57:30 <Goneri> shertel:  I tried to answer. But I'm not sure I understand it.
19:57:52 <shertel> Goneri: I think you answered :)
19:58:46 <shertel> I'm not sure I really understand either, but the multi-playbook example was helpful
19:58:48 <Goneri> bcoca:  If you've got a cache to share, it will be only useful for the modules of the same collection (e.g: Kubernetes or VMware).
19:58:51 <bcoca> well, intra run is much bigger can of worms, inter run is already tricky^ all of my issues above were just dealing with intra
19:59:08 <bcoca> Goneri: or diff collections that overlap, ansible.windows community.windows
20:00:50 <Goneri> If parts are unaware of the others, it means they are from another collection. And there is no session sharing.
20:02:13 <bcoca> those parts are plays, for collections you actually want the opposite
20:02:49 <Goneri> Right now, the modules declare the collection namespace to use, so two collections can share the same namespace.
20:02:53 <bcoca> Goneri: all the conflicts i brought up before were with 'same collection'
20:03:30 <bcoca> adding multiple collections just opens the surface area on programmer issues, which i've barely started on, most of mine were 'play author issues'
20:04:45 <Goneri> Ideally, if Ansible can open one remote process per execution context, this problem would be resolved.
20:04:55 <bcoca> but your turby_fail module is a good example of easy disclosure, switch module.fail_json for custom send_json that avoids our heuristics (which we are also thinking of removing) ... now you have disclosure issue
20:05:08 <bcoca> Goneri: define 'execution context'
20:05:11 <Goneri> But I think this is what nitzmahone actually stated above.
20:05:18 <bcoca> cause right now its per host/task/loop item
20:06:21 <bcoca> let me try with an example,
20:07:16 <bcoca> host1(devel) host2(staging) host3(production), but live in same cloud, only thing diff is credentials you define at host level
20:07:23 <Goneri> I understand what you mean, I was just thinking about it :-).
20:07:30 <bcoca> ok ..
20:08:13 <shertel> We are at time. Goneri, do you have any immediate unresolved questions before we wrap up?
20:09:29 <Goneri> We need something like the PID of ansible-playbook and a session ID. The session management depends on how the remote API works (e.g: login, host, pw).
20:09:42 <Goneri> Wrap up please, or this continue for hours :-)
20:09:50 <shertel> hah :)
20:09:51 <bcoca> Goneri: even if code is not included in core, i advise to have ALWAYS a timeout, loop.forever() is not good idea
20:09:54 <Goneri> * or this will continue for
20:10:52 <bcoca> flamewar.run_forever()
20:10:54 <Goneri> bcoca: there is a timeout (10s AFAIR), it's done with an asyncio routine.
20:11:24 <bcoca> i might have missed, looking
20:12:04 <bcoca> i saw a signal, but not an automatic timeout
20:13:49 <bcoca> you do setup a ttl, but cannot find where it is used
20:14:41 <bcoca> also defaults to None
20:15:03 <bcoca> but i have only skimmed code
20:15:16 <shertel> +1 Thanks for looking
20:15:36 <bcoca> i see you use it on connect but not for daemon/loop
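As a rough sketch of the "always have a timeout" advice, the following shows one way an asyncio-based daemon could stop itself after a period of inactivity instead of looping forever. It is not the turbo mode implementation; `IdleWatchdog`, `serve`, and the simulated client are hypothetical names introduced only for illustration.

```python
import asyncio
import time


class IdleWatchdog:
    """Tracks the last activity and reports when `ttl` seconds pass without any."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self.last_activity = time.monotonic()

    def touch(self) -> None:
        # Call this whenever a request is handled.
        self.last_activity = time.monotonic()

    async def wait_until_idle(self, poll: float = 0.01) -> None:
        # Returns once no activity has been recorded for at least `ttl` seconds.
        while time.monotonic() - self.last_activity < self.ttl:
            await asyncio.sleep(poll)


async def serve(ttl: float, requests: int) -> int:
    """Handle simulated requests, then exit after `ttl` seconds of idleness."""
    watchdog = IdleWatchdog(ttl)
    handled = 0

    async def fake_client() -> None:
        # Stand-in for real socket traffic (an assumption of this sketch).
        nonlocal handled
        for _ in range(requests):
            await asyncio.sleep(ttl / 4)
            watchdog.touch()
            handled += 1

    client = asyncio.create_task(fake_client())
    await watchdog.wait_until_idle()
    client.cancel()  # harmless if the client already finished
    return handled
```

Running `asyncio.run(serve(0.2, 3))` handles the three simulated requests and then returns once the idle period elapses, rather than relying on an external signal to stop the loop.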
20:16:16 <shertel> I'm going to end now, though discussion about the current impl can continue. Not sure if it's relevant to the meeting logs.
20:16:27 <shertel> thanks all for attending!
20:16:34 <shertel> #endmeeting
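For reference, the idea mentioned during the meeting of identifying a cached session by a hash of the credential key/values can be sketched as follows. This is an assumption-laden illustration, not the collection's actual code: `session_key`, `get_session`, and the in-memory `_sessions` cache are hypothetical, and `object()` stands in for a real login.

```python
import hashlib
import json
from typing import Dict

# Hypothetical in-memory cache: digest of the credentials -> session object.
_sessions: Dict[str, object] = {}


def session_key(credentials: Dict[str, str]) -> str:
    """Derive a stable cache key from credential key/value pairs."""
    # Sort keys so that dict ordering does not change the digest.
    payload = json.dumps(credentials, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def get_session(credentials: Dict[str, str]) -> object:
    """Return the cached session for these credentials, creating it if needed."""
    key = session_key(credentials)
    if key not in _sessions:
        _sessions[key] = object()  # stand-in for a real authenticated session
    return _sessions[key]
```

Keying on the credentials keeps sessions with different credentials apart, but it does not by itself address the session lifetime or disclosure concerns raised in the discussion.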