15:02:14 #startmeeting Public Core Meeting
15:02:14 Meeting started Thu Oct 20 15:02:14 2016 UTC. The chair is jimi|ansible. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:02:14 Useful Commands: #action #agreed #halp #info #idea #link #topic.
15:02:14 The meeting name has been set to 'public_core_meeting'
15:02:25 * gundalow waves
15:03:19 gm
15:03:29 and afternoon
15:03:31 #chair gundalow abadger1999 bcoca jtanner jtanner|t420 mattclay ryansb thaumos
15:03:31 Current chairs: abadger1999 bcoca gundalow jimi|ansible jtanner jtanner|t420 mattclay ryansb thaumos
15:03:39 hello
15:03:40 \o/
15:03:45 bloop
15:03:47 #topic Persistent Connections
15:03:54 blar
15:03:58 #chair alikins_also
15:03:58 Current chairs: abadger1999 alikins_also bcoca gundalow jimi|ansible jtanner jtanner|t420 mattclay ryansb thaumos
15:03:59 #topic Changing Agenda format
15:04:12 o_O
15:04:25 I'd like to propose that we have standing agendas, rather than one per meeting*
15:04:31 *when we remember to create them
15:04:34 gundalow: is that what you put in slack this morning?
15:04:43 *or multiple that when we forget to close them
15:04:46 jtanner: correct
15:04:51 In GH issues still?
15:04:56 yup
15:04:59 i'm fine with that
15:05:13 makes it easier ... until we get 1000 comments deep
15:05:22 create a .rst for it and modify it via PRs?
15:05:35 For example, if you look at https://github.com/ansible/community/issues/110 I just ~~strikethrough~~ and add a comment, such as "merged", or discussed and we agreed to do X
15:05:38 no more PRs, we can barely merge the ones we have
15:05:39 ola
15:05:56 use the github wiki then
15:05:59 Standing issue is prob fine so long as we also remember to update/cross off items, and probably need to cycle it when it gets too big.
15:06:11 the other option is using the 'projects' category for each meeting
15:06:17 GitHub issues mean anyone can add stuff. and people with powers can edit existing comments to strike through
15:06:37 bcoca: that might work, though non-contribs can't edit or comment on them can they?
15:06:49 I like GitHub issues as it's easy to link from PRs/Issues/Commits - and you see the link on both ends
15:06:50 anyone can comment on a ticket
15:06:59 There's no comment threading
15:07:07 was mostly thinking project == meeting type
15:07:29 then issues in project can be 'single issue' as gundalow is proposing or 'issue per actual meeting' which is what we had till now
15:07:41 The issue I think needs solving is that we aren't creating/closing tickets
15:07:59 *cough* bot job *cough*
15:08:03 agreed, i don't think either approach fixes that, its a discipline issue
15:08:07 * jtanner hides.
15:08:24 Testing Working Group; Network; New Module all work with a standing agenda ticket
15:08:28 https://github.com/ansible/community/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aopen%20label%3Ameeting_agenda%20
15:08:43 It's only the Core Meeting that's different
15:08:46 i propose that whomever manages the meeting needs to update the ticket/tickets
15:08:59 thought that was already the case
15:09:01 That in itself has been tricky
15:09:12 jtanner: probably implied but i was not aware of it
15:09:26 * gundalow would like to do better things with his life than copy & pasting ~100 tickets a year for meetings
15:10:00 Can we try having a standing agenda for a month and see how we get on?
15:10:06 gundalow: either method of organizing does not change the volume
15:10:17 gundalow: +1 seems reasonable to try.
15:10:25 wfm
15:10:41 gundalow: the real test will be the first time we have to cycle to a new ticket.
15:10:42 ^ im not saying im against it, almost any way works for me
15:10:51 abadger1999: yup
15:10:57 open month/year titled?
15:11:02 sept_2016
15:11:10 #action #gundalow to create meeting ticket
15:11:16 ^ that is probably a good medium with the approaches
15:11:20 or use milestone?
15:11:22 2016_10
15:11:25 we need to hire a person to do all this busy work
15:11:33 #topic open floor
15:11:44 NEXT
15:11:48 #topic Persistent Connections
15:11:59 ^ isn't that the NEXT meeting?
15:12:00 privateip said he couldn't make this meeting
15:12:09 but we -were- supposed to talk about it here
15:12:10 Peter *can* make the BJ meeting
15:12:10 * bcoca x2 checks calendar
15:12:24 we still said he wasn't absolutely required to attend, as this is an internal design thing
15:12:24 he's just messaged me
15:12:59 he is on vacation so he gets to slack off
15:13:22 Peter has just messaged me to say he will be running the BJ meeting we have scheduled in ~45 minutes
15:13:30 s/get to/should be/
15:13:33 i think we all agree, this is a feature we need for simple scaling purposes, just to get that confirmed
15:13:35 :-)
15:13:51 bcoca: +1
15:13:52 gundalow: i understand, however we can hash things out now or at least discuss options
15:14:05 OK :)
15:14:14 controlpersist master helps but it is not w/o issues, the connection persistence is something that is 'hidden' from ansible
15:14:14 i had no idea that the network modules had their own special connection method
15:14:15 start, and I'll record stuff in gDocs
15:14:22 Qalthos: you around?
15:14:27 yup
15:14:31 jtanner: so do the cloud modules, since they basically do api calls
15:14:34 #chair Qalthos
15:14:34 Current chairs: Qalthos abadger1999 alikins_also bcoca gundalow jimi|ansible jtanner jtanner|t420 mattclay ryansb thaumos
15:14:42 so, for those who are not on the core team, it has come up that we need some method of allowing persistent connections outside of ControlPersist for non-ssh connections
15:14:49 bcoca: but most people run those in "local" connection mode
15:15:07 Qalthos: cool, may need you to chip in with the technical detail about how networking currently works and the proposals
15:15:12 or for ssh connections that cannot use control persist (most network devices)
15:15:13 this is especially useful for winrm connections, however it is also useful for other connection types like paramiko
15:15:27 bcoca: I am so unclear about that... CP is on the master side, not the device
15:15:34 its on both
15:15:43 does winrm have a concept of a persistent open channel?
15:15:46 both client and server need to support it for it to work
15:16:03 bcoca: no, i don't think so, that's why CP works to older SSH's like on EL5 which don't support it at all
15:16:09 WinRM has two different things that we could call "persistent connections"
15:16:16 at worst, it's a KeepAlive setting that can't be modified for network devices
15:16:30 nitzmahone: and they can be managed by the client side?
15:16:39 Not really
15:16:44 jimi|ansible: they dont need to support exact same things/version, but both client and server are involved
15:17:07 as you need to be able to reissue a connection on the existing one (multiplexing iirc, is the requirement on the server side)
15:17:36 so for the paramiko side, we're basically talking about creating the channel and passing that object ref around to other tasks?
That's based on the current networking design that has paramiko baked directly into the modules though- not sure we want to design this feature around that
15:17:53 jtanner: i would say we need to make this more general
15:18:04 docker remote, for example, would also benefit
15:18:08 ansible-ssh-proxy ?
15:18:10 winrm, funcd, etc
15:18:13 (eg, let's not hamstring ourselves to that model if we could come up with a saner logical connection for networking)
15:18:24 nitzmahone++
15:18:49 i would say that connection plugins try to find 'cached connection' first and use that, if not, initiate connection and pass to 'cache'
15:19:03 ^ cache would have to handle invalidation, keepalive, limits, etc
15:19:09 privateip said something like "other projects are already doing this" ... how?
15:19:13 cache would have to live in main process
15:19:27 If we use paramiko as the lowest-common-denominator, we need to have some way to preserve a Python instance with an open socket connection between tasks.
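[Editor's note: the "cached connection" idea above (plugins ask a shared cache first; the cache owns invalidation, keepalive, and limits) could be sketched roughly as below. All names are hypothetical illustrations, not actual Ansible API.]

```python
import threading
import time

class ConnectionCache:
    """Hypothetical shared cache: hand back a live connection for a key
    if one exists, otherwise create and register a new one. The idle
    timeout stands in for the keepalive/invalidation handling discussed
    in the meeting."""

    def __init__(self, max_idle=30.0):
        self._lock = threading.Lock()
        self._conns = {}          # key -> (connection, last_used)
        self._max_idle = max_idle

    def get(self, key, factory):
        now = time.time()
        with self._lock:
            entry = self._conns.get(key)
            if entry is not None:
                conn, last_used = entry
                if now - last_used < self._max_idle:
                    # still fresh: refresh the idle timer and reuse it
                    self._conns[key] = (conn, now)
                    return conn
                # stale: invalidate and fall through to recreate
                del self._conns[key]
            conn = factory()
            self._conns[key] = (conn, now)
            return conn
```

Two calls with the same key (for example `("web01", "root", 22)`) return the same object until the idle timeout expires, which is the "shared db connection" pooling behavior bcoca compares this to later in the meeting.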
15:19:45 jtanner: his plugins do this already at the module level, we just want to push it into the 'ansible level'
15:19:46 Qalthos: Do you know what/how other projects are doing this already ( jtanner's comment above)
15:19:52 I don't think pickling/serialization out of the worker will fly in that case
15:20:00 gundalow: I do not
15:20:14 nitzmahone: nope, thinking we pass refs to the socket/connection persistent object
15:20:16 WinRM can work in the same way, and that would be the most efficient way to do persistence there as well
15:20:26 (re control persist, remote side sshd needs to support MaxSessions > 1)
15:20:27 I don't think that will work for WinRM though
15:20:50 (socket anyway, persistent object sure)
15:21:20 nitzmahone: it can work for winrm, http call goes to socket instead of remote address (but that is one implementation, that is why i used /)
15:21:41 The issue I see with that without changing anything else is that we'd have to allow our # of open forks to grow to the # of hosts (even if we only use [forks] at a time)
15:22:14 ("that" being the persistent object case)
15:22:16 yes, file handle requirements will go up, but not much more than current usage with control persist
15:22:34 ^ instead of being external to ansible, it will be internal
15:22:58 The other issue with that, at least on the winrm side, is that there has to be something to keep the connection fresh if we're off servicing other hosts
15:23:06 I suspect paramiko would have the same issue
15:23:18 nitzmahone: which is why i mentioned that the 'cache' needs to handle keepalive
15:23:26 ^ connection shared cache
15:23:41 keepalive, expiration and resource limits
15:24:00 I think the socket-based connection would require cooperation from the underlying system though, no?
15:24:09 (eg, pywinrm would need to understand it)
15:24:52 the underlying open functions should understand socket semantics, i doubt pywinrm is reimplementing those
15:25:07 ^ but that is too low a detail at this point
15:25:23 The "big picture" way I was thinking about this was to remove the worker creation/assignment from TQM and move it to where we do the connection lookup
15:25:48 why?
15:25:50 nitzmahone: i don't think that's necessary
15:25:56 bcoca: it sounds good, but I can't see how it would work
15:26:21 i would rather startup a forked process which workers can talk to directly over a MP queue
15:26:23 jimi|ansible: my other selfish reason for that would be to allow abstraction of the worker creation as a thread instead of a fork (though we could do that at TQM level too)
15:26:26 nitzmahone: dont worry about it for now, just mentally substitute 'socket' for persistent connection object every time i talk
15:26:38 * bcoca likes typing less
15:27:08 I'm not convinced it's going to work for pywinrm, is the only reason I *am* worried about it
15:27:09 nitzmahone: you cannot really do that w/o making ansible thread safe, it isn't by a lot
15:27:27 nitzmahone: again, does NOT need to be a socket, its just easier to say than the alternatives
15:27:55 and pretty sure it can work, the amount of hacking being the only question
15:28:00 persistent connection object == lives in a dedicated Ansible fork though?
15:28:08 main
15:28:10 So one fork per connection
15:28:16 yes, just like variable manager does
15:28:38 Right except we max that out at [forks] today, right?
15:28:39 one fork per task, connections are handled on their own (they can even be threads)
15:28:52 Where is the persistence though?
15:29:10 the max out is cause we apply it to the task fork, we dont need to apply it to connection forks
15:29:39 But right now the connection object lives in the task fork, yeah?
15:29:47 this is why i mention a 'connection cache', keep a reference there that can be locked and reused
15:30:06 ^ how most connection pooling works, pretty standard
15:30:07 the other idea is we create a ConnectionManager (like we have for VariableManager) and that creates the connection earlier
15:30:17 think 'shared db connection'
15:30:18 That'd be more like what I'd advocate for
15:30:25 Standard connection pooling
15:30:27 jimi|ansible++ for being more articulate
15:30:37 why can't we just try to mimic how control persist works and exec/fork something off into the background that dies on its own?
15:31:02 jtanner: either way we need a connection manager
15:31:02 rather than juggling crap within the play/tasks
15:31:04 We'd have to roll our own IPC
15:31:06 jtanner: that's what i was thinking of with a background MP process, however it will depend on whether or not things like winrm and paramiko are pickleable
15:31:14 Assume they're not
15:31:15 jtanner: we dont need to touch play/tasks
15:31:15 ^ or use shared memory
15:31:24 which is a frigging nightmare area for python
15:31:33 trust me, i looked into it when writing 2.0
15:31:44 nitzmahone: so sort of a task queue and a connection pool, but connections in connections pool are also more or less task consumers
15:31:55 kindof, we just need to pass reference to managed connection and make sure it is locked
15:32:02 write a bin/ssh like thing in paramiko that can set up its own pipes, and Popen that just like we do for ssh
15:32:11 WinRM has another way to do this where we can just snarf a couple of GUIDs- it's less efficient than keeping the persistent HTTP(S) connection open, but it lets us skip a couple round trips for each task
15:32:36 the hard part with a ConnectionManager is that right now we create connections in the TaskExecutor, which is done after all the Playbook* objects are finalized
15:32:39 We'd still have to have refresh/cleanup though, otherwise we risk hitting dead connections and/or DoSing the host.
15:32:46 which we have to wait for, because of delegate_to, variables, etc.
15:33:02 jimi|ansible: not hard, they can function mostly like fact cache, check if it exists in shared, if not create it and pass ref back
15:33:19 the 'connection creation' itself might need to be a 'callback to the main proc'
15:33:20 There'd have to be custom per-connection code that could run in the pool selection stuff so we don't keep too many connections open (real or logical)
15:33:26 ^ bcoca that relies too on the object ref being pickleable
15:33:47 not thinking about objects, but yes if we implement it that way
15:33:49 if it is, it's easier to do this with a shared process that workers talk to over a queue
15:34:07 agreed
15:34:16 that is simplest solution
15:34:17 What's the diff if that process is main ansible or a new shared child?
15:34:24 contention
15:34:28 inheritance
15:34:34 ^ both those things become major issues
15:34:35 the main thread is busy sending jobs out and reading results back
15:34:41 it's already a choke point
15:34:45 * nitzmahone curses Python concurrency
15:34:53 unless we add another background thread like we did for reading results
15:35:16 it did not buy us much with results processing ...
15:35:28 that's 50/50 with a multiprocessing.Process anyway
15:35:59 bcoca: yeah the goal there wasn't performance, it was to ensure we didn't deadlock when all workers were busy and we hit a tight while loop waiting for workers to be available
15:36:34 i think on some systems, the worker wasn't exiting until the result was read back by the main process, maybe some weird system stuff there
15:36:41 never happened on my system/OS
15:36:58 but anyway, like i said that's tomato/tomahto whether it's a thread of MP.Process
15:37:02 s/of/or/
15:37:05 i still think we should just write a new bin/ssh with paramiko.
That's one of the reasons I was hoping to have Py3.5 only + support (so we could use the new async primitives to simplify some of that kind of stuff)
15:37:24 jtanner: and use same 'external' control persist?
15:37:24 jtanner: we'd have to do so for every connection type
15:37:40 in this case, the urgency is for the network stuff
15:37:47 Yeah, that'd be nasty for error handling
15:37:52 ^ that is one shortcut implementation
15:38:00 well, i guess we'd only need a bin/whatever for each whatever that wanted persistent connections
15:38:00 nitzmahone: same as it is now with cp
15:38:13 so that's a third option
15:38:14 the way you guys are talking, every other idea is a lot of code change
15:38:16 If we're going to reach back to the main process to get a connection, have we just turned the main process into a server? (Like we were talking about but rejected for optimizing vars?)
15:38:20 Right, but now we have the same suckage with every CP instead of just ssh
15:38:37 jtanner: actually having a background process to save connections is very simple, not much code at all
15:38:38 abadger1999: that's kinda what I was thinking about that
15:38:52 jtanner: yes, it is, using 'bin/helpers' would keep the main codebase the same
15:39:25 it could actually be the same bin/, like we overload bin/ansible
15:39:33 abadger1999: this is why i keep using 'references', i was hoping the connections were not that tightly coupled
15:39:38 those "background processes" aren't useful if i'm using bin/ansible
15:39:58 a bin/ssh w/ paramiko could work for that too
15:40:05 jtanner: in that case you're executing a single task anyway, so a persistent connection doesn't help you too much
15:40:10 that being said, too bad we can't have Tower act as this server on behalf of the engine
15:40:28 thaumos: same serialization issues, and Tower itself would become a choke point
15:40:33 Let's not throw *that* kitchen sink into this pile
15:40:47 * jtanner feels compelled to waste 2 days writing his idea
15:41:04 actually, it does not need to become a chokepoint, having tower deal with persistence is not a bad idea
15:41:17 connection: tower at ansible level, it can just see 'local sockets'
15:41:23 doesn't need to be tower, we could create any process and then it doesn't become tower specific
15:41:26 ^ pushes the complexity out
15:41:47 i think pushing the complexity out is also the attraction of jtanner's idea.
15:41:48 jtanner: The problem I have with abstracting all the connections to a new command-line process is that I lose all the fidelity I have with calling an API right now- my error handling/recovery stuff is reduced to parsing error messages and nastiness
15:41:56 abadger1999: i like both
15:42:15 (just like it is with ssh today- arguably the most difficult to reason about)
15:42:19 nitzmahone: yes, as we do with all connections right now
15:42:27 ?
15:42:28 the diff is that for once, you control the emitter
15:42:39 nitzmahone: if we write the tool, we can shape the output
15:42:48 nitzmahone: although... if we control the code for the cli process some of the parsing problems go away...
15:42:52 nitzmahone: it's not really any different than using an ssh connection via the API today
15:42:53 just like we shape the outputs of modules
15:42:58 Yeah, but you said "an ssh replacement with paramiko"- that doesn't sound like a JSON emitter
15:43:07 could be
15:43:13 does not have to be, but it can
15:43:20 nitzmahone: we'd have problems on the order of module stdout parsing rather than on the order of bin/ssh output parsing
15:43:25 So we're basically writing a new module API over stdin/stdout
15:43:26 it would be paired with a new connection plugin
15:43:39 nitzmahone: welcome to unix
15:43:45 -c perssh
15:44:42 so, i think in order, we should try a background process first, and if that doesn't work the CLI option second
15:44:56 so internal bg vs external bg
15:44:59 At least for WinRM, we'd still have to have the ability to run CP-specific code in the controller/connectionmgr to ensure we don't keep too many things open for the same host
15:45:00 and if neither of those work, explore creating a ConnectionManager
15:45:04 what do we do in case of ssh?
15:45:24 external helper doesn't necessarily require talking to stdin/stdout, other ipc is an option
15:45:29 do we allow a toggle to use 'external persistence' vs internal?
15:45:32 nitzmahone: yes we would, we're going to have to do that anyway no matter what option we do
15:45:42 alikins: i would prefer stdout over python ipc
15:45:47 there's going to have to be some limit set on the # of persistent connections allowed
15:45:47 if it were an external process, we wouldn't have to modify the ssh connection plugin at all.
15:46:00 generally for SSH, that's managed on the client side
15:46:04 abadger1999: that is why that option appeals so much
15:46:19 we'd just be plugging in subprocess calls to the external process inside of individual connection plugins.
just like we do now for ssh
15:46:56 So you'd basically have a per-CP "connection broker/cache" thing that lives in the controller- the impl could be whatever makes sense for that CP
15:47:07 its not the best/most performant, but it is least invasive
15:47:08 So for ssh, use what we have
15:47:16 stdin/stdout to helper means multiplexed comms to helper
15:47:25 For WinRM, manage child processes (or threads)
15:47:48 alikins: as in "a Good Thing" or "a Bad Thing"?
15:47:55 bad
15:48:09 alikins: note -- there's two levels here...
15:48:15 the connection plugin invokes one thing.
15:48:19 alikins: that depends a lot on how you design the helper
15:48:37 and communicates back and forth over stdout
15:49:12 if designed like ssh, helper can communicate with existing process (starting it if needed)
15:49:12 but that thing also needs to setup the thing managing the persistence... that can use some other form of IPC.
15:49:23 then no need to multiplex
15:49:55 a) we all agree we need a 'connection manager' (internal or in external tool)
15:50:11 ^ so pool size, expiration and keepalive will all be dealt with in there
15:51:16 the question now is 'how do we interact with it'?
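[Editor's note: the external-helper option above, with the "controlled emitter" nitzmahone and bcoca discuss, could interact with the connection plugin roughly as sketched below: the plugin writes one JSON request per line to the helper's stdin and reads one JSON reply per line back, so error handling is structured parsing rather than scraping bin/ssh output. The helper body and `run_via_helper` are hypothetical stand-ins; a real "bin/perssh" would hold a paramiko connection open between requests.]

```python
import json
import subprocess
import sys

# Hypothetical single-file helper: reads one JSON request per line on
# stdin, emits one JSON reply per line on stdout.
HELPER_SOURCE = r'''
import json, sys
for line in sys.stdin:
    req = json.loads(line)
    # stand-in for running req["cmd"] over a persistent connection
    sys.stdout.write(json.dumps({"host": req["host"], "rc": 0,
                                 "stdout": "ok: " + req["cmd"]}) + "\n")
    sys.stdout.flush()
'''

def run_via_helper(host, cmd):
    """What a connection plugin's subprocess call might look like."""
    proc = subprocess.Popen([sys.executable, "-c", HELPER_SOURCE],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            text=True)
    out, _ = proc.communicate(json.dumps({"host": host, "cmd": cmd}) + "\n")
    return json.loads(out)
```

This is the "module stdout parsing rather than bin/ssh output parsing" trade-off: the helper's replies are machine-readable because we control both ends of the pipe.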
15:51:19 Keepalive alone makes me think it should probably be its own process- you're going to potentially have a lot of timer contention there
15:51:56 nitzmahone: not really, keepalive is timed, most contention will come from trying to lock a connection for use
15:52:07 Not in Winrm
15:52:13 ^ think db pooling, this is very close to that
15:52:22 if you have everything talking over a queue, you've already got the synchronization you need
15:52:37 So long as nobody starves
15:52:39 jimi|ansible: that would be a point in favor for internal implementation
15:52:46 and the queue doesn't fill up
15:53:04 resource starvation will always be an issue, CM needs to deal with that
15:53:41 those are issues that will arise no matter what solution we go with
15:53:48 Connection selection/creation would have to be a per-CP "key" consisting of whatever unique values are necessary for that CP (user/host/env/become state/random), which also implies that each CP needs to register its pool-key vars.
15:54:28 (which could end up changing our direction for the pseudo-connection-var fix)
15:54:38 depends, some of that can be 'reset' in the connection
15:54:46 should connections (eventually) be shared across 'ansible-*' invocations? parallel invocations?
15:54:48 loginuser/host seems to be the most basic part
15:54:50 For the connection types you know about...
15:55:03 alikins: that's a pro for having an external CLI option
15:55:06 WinRM env *should* be handled at the CP level
15:55:07 with that they could be
15:55:09 alikins: possibly, but that is an optimization that might make more sense in Tower
15:55:23 I hacked it to work like the other stuff for now, but it's *very* limited because of that
15:55:44 (you specify the remote env when you start the shell)
15:55:54 nitzmahone: can you restart a shell?
15:55:58 No
15:56:14 so you need new connection every time you want new shell?
15:56:22 ugh.
15:56:27 ^ that might be big change from how we work now, tasks won't be independent anymore
15:56:31 * abadger1999 tries to figure out how to deal with that...
15:56:47 nitzmahone: almost makes the case to avoid persistence in winrm
15:56:56 We don't *have* to do it that way- it's just more efficient if we do. What we have now works OK
15:56:56 yeah... that seems like something a CM can't deal with.
15:57:33 nitzmahone: is there a way to create a winrm connection to something without a windows host involved?
15:57:43 i'd like to start testing some ideas later
15:57:44 not sure I follow
15:57:47 which makes the portion of jtanner's idea where the persistence and pooling is handled by something called from the connection plugin rather than something calling the connection plugin seem better.
15:57:56 can i create a connection object without actually connecting to a host?
15:58:03 i want to see if i can pass it around queues at the least
15:58:27 IIRC we don't actually connect until you call CP.connect(), if that's what you mean.
15:58:33 * nitzmahone looks
15:58:44 afaik, connection gets created after task/host info is resolved, need host info to populate connection
15:58:45 although i guess if i want to tell if it really survived without being able to use it on the other side of the queue
15:58:55 bcoca: yes that's what i said earlier
15:59:05 that's why having a ConnectionManager class is tricky
15:59:29 it is, that is why i think the external options are good (either bin/helper or tower)
15:59:56 ok, so our BJ call with privateip is about to begin, do we want to accept the 3 options i said earlier and update Peter?
16:00:19 * abadger1999 is leaning more towards the external helper being the number 1 choice.
16:00:32 abadger1999: you mean the CLI ryansb brought up?
16:00:44 jimi|ansible: I thought it was jtanner
16:00:51 err yeah jtanner my bad
16:00:57 yeah.
16:01:00 k
16:01:02 that's the one
16:01:17 a ThingThatIsCapableOfRunningAnsibleCommands (a host/user/shell/module_runtime/env) has a Connection. both may need pools
16:01:38 ie, first is connection+winrm env for ex
16:01:44 (not necessarily ranked just numbered) 1) background managing fork/thread 2) external cli helper 3) ConnectionManager class
16:02:02 there is one issue, the way net modules use connections is very diff than how normal modules do
16:02:18 i don't really want a single process that's running doing all the work of sending out commands
16:02:19 they work more like cloud/api modules
16:02:23 that will serialize things too much
16:03:13 but something that does CP like ssh does (independent, may send something through a pre-existing connection) is fine
16:03:19 can someone list out the 3 options we came up with so it's easier to parse in notes?
16:03:27 ^ thaumos
16:03:39 jimi just did about 5 messages up
16:03:58 Okay, we're about to start talking with privateip so time to end this meeting.
16:04:02 jimi|ansible: not sure that will work for peter, his modules need the 'serialized version' ... but that might just require their own connection plugin and not use the standard ones
16:04:05 hmm, okay. apologies my scrollback is all messed up then
16:04:23 We can post some followup to the agenda ticket, though.
16:04:31 bcoca: i don't think that's a problem, i don't know why he'd need a serialized version, i mean serialized across hosts, not tasks
16:04:41 agreed
16:04:43 nothing should require something serialized across hosts
16:04:53 otherwise they should be doing `serial: 1` :)
16:04:55 ah, no he needs across tasks
16:05:05 not hosts
16:05:49 #endmeeting
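[Editor's note: the per-CP "pool key" registration nitzmahone raised at 15:53:48 (each connection plugin declares which variables uniquely identify a reusable connection) could be sketched as below. The registry contents and function names are hypothetical illustrations, not actual Ansible API.]

```python
# Hypothetical registry: which task variables identify a reusable
# connection for each connection-plugin type. A real implementation
# would have each plugin register its own entry.
POOL_KEY_VARS = {
    "ssh":   ("remote_addr", "remote_user", "port"),
    "winrm": ("remote_addr", "remote_user", "port", "become_state"),
}

def pool_key(plugin, task_vars):
    """Build the hashable key used to look up a pooled connection."""
    try:
        fields = POOL_KEY_VARS[plugin]
    except KeyError:
        raise ValueError("connection plugin %r never registered its "
                         "pool-key vars" % plugin)
    return (plugin,) + tuple(task_vars.get(f) for f in fields)
```

Two tasks with the same plugin and the same values for that plugin's registered vars would hash to the same key and therefore share a pooled connection; anything else (different user, become state, etc.) gets its own.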