19:00:10 #startmeeting Ansible Core Public IRC Meeting 19:00:11 Meeting started Tue Nov 16 19:00:10 2021 UTC. 19:00:11 This meeting is logged and archived in a public location. 19:00:11 The chair is shertel. Information about MeetBot at https://fedoraproject.org/wiki/Zodbot#Meeting_Functions. 19:00:11 Useful Commands: #action #agreed #halp #info #idea #link #topic. 19:00:11 The meeting name has been set to 'ansible_core_public_irc_meeting' 19:00:29 * bcoca waves 19:00:30 One item on the agenda today - type hinting in ansible-core 19:00:47 #topic https://github.com/ansible/community/issues/635#issuecomment-966636254 19:00:47 o/ 19:00:47 ansible turbo? 19:00:54 Oh, and turbo 19:00:56 * nitzmahone hides 19:00:58 heh 19:01:34 we already have some type hinting in ansible-core, don't we? Is this just looking for a formal decision? 19:01:42 im ok with type hinting, i just think there are much higher priorities 19:02:18 Sort of- we just don't have a cohesive strategy around it- we need to make some technology decisions 19:02:52 ^ fine with defining 'howto', just not sure we should make it a project we need to do 'now' 19:03:10 ... and especially for existing APIs that weren't built with type hinting in mind (eg things that have highly polymorphic structures), we need to decide how pedantically correct vs "actionable type hints" we want to be 19:03:24 also, separate, it probably is easy to do for 'controller code' but not for target (module_utils/modules) 19:03:53 Yeah, I don't think anyone's advocating for a "let's type hint the entire codebase" project, but we also should figure out our plan before we start doing it piecemeal with differing philsophies 19:04:31 agreed, so #1 stop adding any type hinting for now #2 schedule discussion on 'which' and 'scope' of type hinting #3 remove ban 19:04:35 okay, makes sense. I'm fine with type hinting too. "I think we can mitigate the cons by only putting type hints where they're useful. I think a good definition of "useful" in this case is where someone thought to add them or to request them." sounds reasonable 19:05:01 eg, I just asked Martin to add some type hints to a new API because it had some names that weren't necessarily clear on what was needed, so "this is supposed to be a string" was extremely helpful :D 19:05:08 im also fine with '#note: no type hinting hear cause .. reasons' 19:05:52 Yeah- especially for older / "organically grown" parts of the codebase, adding pedantically correct type hinting will result in things that are ... scary 19:05:55 nitzmahone: why im for it, just not piecemeal since there are competing standards, last thing we want is 3 diff typing systesm (we just got internal ones down to 2!!!) 19:06:25 Right, that's where we need to agree on the technology parts before we say eg, "all new things must have type hints" 19:07:02 There's also the sanity check aspect to it- mypy is kinda the standard, but there are implications we have to work through if choose that. 19:07:08 i see issues/adv with all the curren python typing systems, but prefer the latest pep as it is 'built in' 19:07:15 mattclay and I were just talking about this last night 19:07:18 otherwise i have no strong pref 19:08:05 eg, we'd have to pin each ansible-core major release to a specific version of mypy so the behavior is the same, and choose what deps/python stdlib we're going to validate against 19:08:31 * shertel nods 19:08:35 at the very least we should 'choose one' and not allow competing typing systems to be added 19:08:40 ... and doing that for external collections has a whole raft of problems, since there are multiple Ansible/Python/mypy versions in play 19:08:45 even if we dont want to enforce adding with types 19:09:04 nitzmahone: i would ignore that, just deal with 'core' 19:09:11 let collections sort themselves out 19:09:11 Well, it'll likely be two, unless we want to completely ignore non-controller type hints for now (which might be the answer) 19:09:28 Nearly all the code in ansible-test uses type hints -- but it was built from the ground up with type hints in mind. That made it significantly easier than adding them to existing code that wasn't written with type hinting in mind. 19:09:37 nitzmahone: ^ mentioned that above, since 'non controller' has bigger swath of supported surface 19:10:06 mattclay: and that makes huge diff 19:10:15 ah sorry bcoca, I missed your list there 19:10:43 I'd vote for #2- we need to lay out the options and pick direction and scope 19:10:51 nitzmahone: also .. once implemented in core and tests .. i expect collections to lean that way, but i dont think we should choose based on them 19:10:58 agreed 19:11:05 nitzmahone: those were not options, but steps 19:11:08 I'd also vote for #2 19:11:25 there's a ban? 19:11:26 +1 on #2 19:11:30 #1 ban #2 discuss and decide #3 unban 19:11:37 oh, that was 1 19:11:58 cause just dont want to add to problem before we decide solution 19:12:05 Yeah, I don't think we've got an explicit ban right now, but I know I'd be -1 to any effort to add type hinting on existing code until we sort out #2 19:12:06 Yeah, that sounds reasonable. Wait to add more until we make a decision. 19:12:23 But we should also actively be working toward a decision 19:12:29 actually should be #1 ban #2 check exisging #3 discuss #4 decide #5 unban #6 possibly enforce 19:12:58 Yeah, I think the "how" of #6 there needs to be part of discussion/decision 19:13:11 If they're not being checked, they're just gonna rot 19:13:20 why i added 'possibly', depends on what we decide 19:13:35 well, there is 'enforce on every change' or 'enforce on new' 19:13:38 It's better than comments, because those rot too- at least this has the capability of being checked/enforced 19:13:52 +1 on that 19:13:59 well, if we use proper docstrings .. we can enforce also 19:14:17 and if you make it a choice, i prefer docstrings 19:14:32 as there will be more than 'type' info 19:14:49 Thankfully docs and type hints aren't mutually exclusive. We can have both. 19:14:50 Realistically it probably needs to be both 19:14:52 yep 19:15:14 duplication bugs me ... but this i can live with 19:15:18 (and with proper type hints, many of the relevant parts of the docstrings can also be validated) 19:15:59 So what's our next step? 19:16:01 the next irc meeting (that isn't a US holiday) is Nov 30th. Do we want to plan on laying out the choices and making a decision then? 19:16:06 There's not really duplication with type hints -- they just add something extra. Only docstrings end up repeating something (parameter names). 19:16:24 shertel +1 to that from me 19:16:57 nitzmahone: we currently have docstrings with type also 19:17:00 not just name 19:17:10 as i said 'depends on proper docstring' 19:17:22 we have many w/o type .. we have more w/o docstring at all 19:18:25 I'll take that to bring a list of options w/ some pros and cons to the 30th IRC meeting 19:18:53 works4me, probalby post on ticket a few days before meeting so everyone can read? 19:18:55 nitzmahone: that would be awesome 19:19:07 bcoca: I was just typing exactly that :D 19:19:12 ;-p 19:19:35 Okay. Next up :) 19:19:39 #topic https://github.com/ansible/ansible/pull/76113 19:19:49 Goneri? 19:21:47 i was -0 ... gone -1 on this now with the implications on credential disclosure/reuse/invalidation problems 19:22:44 A lot of the risks go down if it's a first-class core feature, but until core supports stateful workers, it can't be done "right", so I've been -1 for adopting anything like it until we can do it right 19:23:12 Hi! 19:23:24 nitzmahone: any stateful worker would still have to avoid credential caching .. but this subsystem specifically relies on it 19:24:21 So, we've been maturing this turbo mode thing for a while now. And the performance benefit are pretty positive in our context. 19:24:59 not what worries me, mitogen for example has clear performance gain .. but feature loss and security implications 19:25:17 and there are many parallels with this code 19:25:30 (and with previous iterations of `accelerate`, etc) 19:25:37 The fact the feature is not integrated in core yet prevent us from going forward with a better integration. We've this "we're a collection" limit. 19:25:39 fireball was better name .... 19:26:26 Well, the socket system is pretty similar to what we've got with SSH. And this is the kind of thing we can hardly improve without some modification in core. 19:27:27 In our ideal world, ansible would spawn a python process. and run the different tasks within the same process. 19:27:41 which is what mitogen does 19:27:53 Most of the barriers to core being able to properly adopt this feature are still in place- IIUC it solves some of the "cold start" problems with certain things, but making something that's safely generally applicable is still not feasible without intra-task state in core. 19:28:27 and currently most tasks do NOT want state permancence, sometimes even within loop iterations of same task 19:28:32 Core will have intra-task state at some point, but it's a major rearchitecture 19:28:35 i.e delegate_to: '{{item}} 19:28:50 Yes, you need to keep a session object alieve on the remove side, or you lose the benefit of having a remote process. 19:29:02 hacking something together without it will lead to another instance of eg `ansible-connection`, which is ... not fun 19:29:03 nitzmahone my thought on that was to add a 'persistent: false|auth|full' keyword 19:29:39 with connection plugin disclosing it's support and a-connection suplementing when connection plugin does not supply 19:29:46 But, in our case, this is the collection authors responsibility. They've got to handle the case themselve. 19:30:14 Goneri: understood, but that is 'good for us' from core perspective, we don't need to deal with many of the implications 19:30:16 Core wasn't in a position to support it a year ago, and being half the size, we're certainly not now 19:30:16 I don't really understand the credential issues 19:31:01 Until the architectural issues can be solved, we need to limit the "blast radius" of something like this to the modules that *REALLY* need it, which IIUC is what's been happening thus far. 19:31:03 ahah, I dislike the turbo mode name so much. 19:31:08 shertel: its not a big issue when running a simple play by hand, but when running jobs with multiple plays by diff authors it can get 'fun' 19:31:50 Goneri: better than 'temporary agent mode with caching' 19:32:56 Is there a way to track the stateful worker effort? 19:33:11 only if you are a telepath in nitzmahone and my heads 19:33:35 its something we've discussed for a few years but never had time/push to create a project to bring to fruition 19:33:41 why i would not count on it 19:34:18 also requires major reformat of core engine, how action/connection/shell/become/terminal plugins work 19:34:45 but first thing i would move would be 'async' 19:34:50 It's still not a scheduled project- I've prototyped a number of the necessary bits, and I've theoretically got management support for it (couched as a memory/performance project) 19:36:20 But doing it right will likely require a staged rearchitecture of the entire core worker model over several releases 19:36:21 Okay, so it seems like there's consensus (sivel, bcoca, nitz) not to include turbo in ansible-core, at least yet 19:36:54 I totally understand your position. However, I would also be happy to help to implement a better solution within core itself. 19:37:22 Unless you're rejoining the core team full time... ;) 19:37:43 ^ and we would probably have you working on 20 other things first 19:37:52 bcoca: execution environment should reduce the risk here. I imagine it's possible to disable some features when the context is not "safe enough". 19:38:18 That's the controller side- the risk is on the target side 19:38:21 can you elaborate. I'm not sure I understand. 19:38:23 EE doesn't matter as much as 'expected context' of differing play authors 19:38:56 Goneri: play1 and play2 get written to handle vms by authors, #1 adds explicit credentials, #2 assumes defaults from env 19:39:03 when run independantly all is good 19:39:11 but ansible-playbook play1.yml play2.yml ..... 19:39:53 I'm not sure I want to do that. 19:40:14 add to that that #1 uses 'admin' credentials and #2 does some stuff it shouldn't but they know 'fails' since they dont have perms .. but now suddenly succeeds! 19:40:24 i.e test infra exists by trying to delete 19:41:00 Goneri: think big corp with 10 depts, one asks other 2 'do this' gets the plays, tests them independantly, then adds them all to 'big job' 19:41:33 currently ansible lends itself well to this by isolating task contexts 19:41:34 Goneri: one question- how important is inter-run persistence vs inter-task persistence? The latter will be much easier to accommodate under the imagined core stateful worker stuff in a safe and robust fashion than the latter 19:42:03 nitzmahone: ^ issue above is just 'inter task' ... inter run ... multiplies it 19:42:39 Right, but a lot of the inter-task risks can be mitigated with stateful workers, the inter-run stuff is *much* harder. 19:44:35 * nitzmahone just re-read my two messages ago, s/ than the latter/than the former/ ;) 19:45:25 yup :-) 19:46:07 nitzmahone: the problem stems from stateful workers ....not sure how they also solve it 19:46:34 They don't, but they at least make it more reasonable to solve it 19:46:50 again, w/o statefulness .. no problem 19:46:54 Where working stateless is an absolute security minefield 19:47:03 Goneri, did you see the question? "how important is inter-run persistence vs inter-task persistence? " 19:47:09 ee allow to isolate the socket when we're on localhost. Regarding the cred sharing, we already send the credentials to the remote host all the time, it's not something new. 19:47:36 no, stateless means each action deals with it's own security/credentials, no caching/ no state, no big problem .. how we work today 19:48:01 I meant stateless controller->stateful persistent target 19:48:12 (is the security minefield) 19:48:18 Goneri: EE does not matter, since the issue is relation between controller and target (controller being your laptop, tower machien or EE does not matter in that context) 19:48:27 This is the collection maintainer responsibility. We use a hash of the different cred key/val to identify the session in the cache. 19:48:42 nitzmahone: ah, you want to tie worker to remote keepign sync state? 19:48:48 yes 19:49:08 Goneri: little to do with collection, more of an issue to do with plays 19:49:26 Not keeping the *actual* state, but allowing session isolation and persistence to be managed as a first-class property 19:49:50 well, the state is in the remote, the local worker just keeps the connection to it 19:50:22 instead of disassociating like now and reusing external conduit (socket/a-connection) to reesstablish with persistent remote 19:50:31 We use the password to compute our hash, but yes I understand your point. 19:50:32 but that also opens other impliciations about performance and resource usage 19:52:04 In general, we've got a playbook with a series of tasks to run. This is what we want to speed up. We don't real care about isolated task and it's acceptable to reopen a new session for those. 19:53:00 cool, that makes things easier when the time comes :D 19:53:00 i understand that, but i still have to look into it being used in wider contexts, specially when some parts are completely unware of the others 19:54:27 arent yo just using tempfile to create the socket path? 19:55:39 and then you just import the module into the server to cal it, this creates many issues on both the play and module author side 19:55:44 We did some refactoring of our modules to be compatible with the turbo mode. Why I'm mentioning this. When you speak about refactoring of core, it sounds like you want to cover all the existing modules. Actually, it's just some few collections. For instance, we can totally have something saying: If you want to speed up your collection, you need to provide a module with this extra API. 19:56:47 Goneri: our refactoring would cover more than yours, yes, thinking most of cloud/api/networking, but that is independant of the issues with a semi persistent agent that caches credentials 19:57:16 from security perspective its 'dragon country' 19:57:30 shertel: I tried to answer. But I'm not sure I understand it. 19:57:52 Goneri: I think you answered :) 19:58:46 I'm not sure I really understand either, but the multi-playbook example was helpful 19:58:48 bcoca: If you've got a cache to share, it will be only useful for the modules of the same collection (e.g: Kubernetes or VMware). 19:58:51 well, intra run is much bigger can of worms, inter run is already tricky^ all of my issues above were just dealing with intra 19:59:08 Goneri: or diff collections that overlap, ansible.windows community.windows 20:00:50 If parts are unware of the others, it means they are from another collection. And there is no session sharing. 20:02:13 those parts are plays, for collections you actually want the oposite 20:02:49 Right now, the modules declare the collection namespace to use, so two collections can share the same namespace. 20:02:53 Goneri: all the conflicts i brought up before were with 'same collection' 20:03:30 adding multiple collections just opens the surface area on programmer issues, which i've barely started on, most of mine were 'play author issues' 20:04:45 Ideally, if Ansible can open one remote process per execution context, this problem would be resolved. 20:04:55 but your turby_fail module is a good example of easy disclosure, swhich module.fail_json for custom send_json that avoids our heuristics (which are also thinking of remove) ... now you have disclosre issue 20:05:08 Goneri: define 'execution context' 20:05:11 But I think this is what nitzmahone actually stated above. 20:05:18 cause right now its per host/task/loop iterm 20:06:21 let me try with an example, 20:07:16 host1(devel) host2(staging) host3(production), but live in same cloud, only thing diff is credentials you define at host level 20:07:23 I understand what you mean, I was just thinking about it :-). 20:07:30 ok .. 20:08:13 We are at time. Goneri, do you have any immediate unresolved questions before we wrap up? 20:09:29 We need something like the PID of ansible-playbook and a session ID. The session management depends on what the remote API works (e.g: login, host, pw). 20:09:42 Wrap up please, or this continue for hours :-) 20:09:50 hah :) 20:09:51 Goneri: even if code is not included in core, i advise to have ALWAYS a timeout, loop.forever() is not good idea 20:09:54 * or this will continue for 20:10:52 flamewar.run_forever() 20:10:54 bcoca: there is a timeout (10s AFAIR), it's done with an asyncio routine. 20:11:24 i might have missed, looking 20:12:04 i saw a signal, but not a automatic timeout 20:13:49 you do setup a ttl, but cannot find wher eit is used 20:14:41 also defaults to None 20:15:03 but i have only skimmed code 20:15:16 +1 Thanks for looking 20:15:36 i see you use it on connect but not for daemon/loop 20:16:16 I'm going to end now, though discussion about the current impl can continue. Not sure if it's relevant to the meeting logs. 20:16:27 thanks all for attending! 20:16:34 #endmeeting