19:00:27 #startmeeting ansible core public irc meeting
19:00:27 Meeting started Tue Jul 16 19:00:27 2019 UTC.
19:00:27 This meeting is logged and archived in a public location.
19:00:27 The chair is bcoca. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:27 Useful Commands: #action #agreed #halp #info #idea #link #topic.
19:00:27 The meeting name has been set to 'ansible_core_public_irc_meeting'
19:00:33 #topic open floor
19:00:44 o/
19:00:53 o/
19:02:46 * bcoca makes note to buy crickets
19:03:15 if nothing new, closing in 8 mins
19:04:23 \o
19:04:26 #info ~6 weeks until beta freeze
19:04:38 ^ still accurate?
19:07:18 hey
19:07:21 afaik
19:08:41 since it's quiet. Anybody know of a PR in the works along the lines of a general purpose timeout for tasks, or whether it's been tried and failed in the past.
19:08:59 oops forgot the ? at the end of my question.
19:09:06 lots of timeout discussions, mainly they now work at connection level
19:09:18 we also have 'facts timeout' but still has issues with blocking operations on target
19:09:25 for 'task timeout' you really have async tasks
19:09:51 that's good if you know it's coming.
19:10:40 I hit a case today where a 'pause' just hung. 2 minute timeout, I got bored waiting after 10 mins.
19:10:47 if we do it by default, tasks that take very long will timeout, if you dont know it's coming you would not set timeout
19:11:04 oh it would have to be opt in
19:11:15 i would say, async can deliver that as of now
19:11:34 not with action plugins like 'pause', though
19:12:06 no, but action plugins freezing is a problem on controller
19:12:17 and pause has its own timeouts already as options
19:12:47 I can see the benefit of a global task timeout option that defaults to unset
19:12:50 if I was getting fancy I'd want some kind of 'hey' your task hasn't progressed in x minutes' and then a global 'its been an _configurable period_ giving up
19:13:14 oops missed the word 'warning' before the 'and' above.
19:13:38 just curious if anyone had heard of anything similar in the works
19:14:01 i've heard 'intentions' of such a thing for a long time, but not even a hint of a PR
19:14:10 i've asked for it
19:14:21 https://github.com/ansible/ansible/pull/57818/files is the closest to it so far
19:15:17 WIP *and* janky :)
19:15:19 * jhawkesworth reads PR
19:15:22 ^ i had not seen that
19:16:08 but that still relies on the connection
19:16:40 but better than existing that relies on the 'protocol' mostly
19:16:56 that still wont affect action plugins
19:17:43 i was thinking jimi-c's process plugins pr might be one way, having a 'timed forked' one would be good way to implement this for 'all tasks'
19:18:49 we'll forever be fighting weird things at the connection layer, the network layer, the module layer, etc
19:19:03 i've been trying to advocate for an optional worker level timeout
19:19:21 true, but at worker level makes more sense 'total task time', at llexec .. we do 4-12 of those depending on the action
19:19:47 jtanner: that is what im agreeing with, and very easy to swapin/test when we have 'process plugins'
19:20:07 only supplying my reasoning
19:20:14 what *isn't* there going to be a plugin type for when bcoca is finished?
19:20:19 my first thought was something along closer to the task execution loop, but can't comment meaningfully on implementation.
19:20:26 agaffney: smart mouths
19:20:48 one issue is that we already have a `timeout` keyword and it refers to the 'protocol timeout'
19:21:15 jhawkesworth: the 'process/worker' would be at that level
19:22:17 * jhawkesworth trying to avoid temptation to bikeshed name for thing that doesn't exist
19:22:27 heh
19:22:40 timelimit
19:22:46 endofworld
19:22:52 agmaggedonclock
19:22:56 ragnarok
19:23:23 and here all I had was "task_timeout"
19:23:45 thedoctor
19:24:07 background is I'm getting asked to make certain playbooks more reliable.
19:24:08 i really dislike task_ for 'task keywords'
19:24:35 jhawkesworth: what problems are you running into where this feature would help reliability?
19:25:36 in short, automated releases to QA.
19:25:56 a hang on 'pause' sounds like a bug in ansible that you can't do much about from your playbook, or maybe you did something silly like `minutes: 120` instead of `seconds: 120` :)
19:26:36 reasons for playbooks to fail are many and varied - developer error; windows is busy doing something else. people leaving things logged on.
19:27:06 yeah that pause one is totally weird. It must have run 100 times successfully, but something upset it this time.
19:27:28 we could solve the task-level attribute name issue by just making this a global config option. as bcoca said earlier, you can already achieve this on a per-task basis with 'async'
19:27:56 definitely needs to be optional and null by default
19:28:07 +1 to that
19:28:17 agaffney: also a global named timeout ...
19:29:55 nag jimi|ansible about his process plugins, then we can easily implement that
19:30:28 are process plugins for things like threads vs. fork for workers?
19:30:44 yes, so 'timed' ones seems like a good plugin to use
19:32:24 I guess as long as it doesn't slow down task exec loop that sounds like a nice way to get what I'm after.
19:33:21 or we can just hijack current timeout, change the meaning and add 'per connection plugin timeouts (they are already there)'
19:35:20 hmm, timeout on its own is kinda vague. Could be connecting to host time out, response from host timeout.
19:35:55 he, each protocol can have N timeouts
19:36:11 auth/tcp/keepalive/total connection time/time to connect/time to socket/etc
19:36:21 jhawkesworth: all the more reason to hijack it with an overall task timeout
19:36:33 why i think 'current' timeout makes more sense as 'task timeout' .. its what most people assume anyways
19:37:06 oh all right I'll bikeshed some names...
19:37:32 `action_timeout` perhaps?
19:37:57 or does that make it sound like it only applies to actions
19:38:52 to be fair, they are always an 'action' and we've misled most to think of them as 'modules' but in this channel we know its actually a combination
19:39:13 action: /local_action: are the actual underlying keywords for a task action
19:39:28 and everything uses the 'normal' action by default
19:39:43 if no action plugin is matched
19:40:58 hmm if the 'normal' action plugin were a configurable thing, perhaps the normal action could be a timed_normal action, if you see what i mean
19:41:44 not sure it buys us anything over process plugins idea though
19:41:47 well, you can do that and override the behaviour easily, then just create your own execute_module that has a timeout, but that wont cover other actions (pause, service, etc)
19:42:09 process plugins cover ALL actions that are not hardcoded (meta, add_host, group_by ..)
19:42:11 ah yeah of course.. so .. higher level than actions really
19:43:46 Having action timeouts is great, but cancelling things isn't always possible, depending on connection and the operation that's blocking... Things get a lot more complex with threads, too, since they're not generally safe to abort.
19:44:12 So unless the timeout is fatal, recovering gracefully is often not possible
19:44:47 I was thinking it would be fatal tbh
19:45:02 I mean controller fatal, not just task fatal
19:45:36 I was thinking controller fatal.
19:46:48 task fatal would be nice actually.. mail plugin makes for 'total perspective vortex' on failures.
19:48:00 well good to chat it through. I can make more use of async and `wait_for..`
19:48:23 Yeah, without process isolation, it's nearly impossible to do reliably recoverable preemptive action timeouts.
19:48:51 Cooperative, sure, and maybe that's enough in many cases
19:49:41 That's a whole lot easier to do under py3
19:49:44 nitzmahone: why i suggested this as ONE process plugin, timed_forkes, which would allow user to choose which feature best suits them
19:50:13 you can do similar with threads but its not as easy to abandon them
19:50:29 you can still get stuck on 'cleanup'
19:51:43 Not just that either- a thread abort can leave hanging locks and other things that aren't obvious until you hang or deadlock at any arbitrary point later
19:52:04 So preemptive timeout in process isn't really possible
19:52:11 its possible, just not advisable
19:52:12 (only cooperative)
19:52:54 cooperative is reasonable when you own both sides
19:53:02 but .. plugins!
19:53:17 we can ensure the plugins we control.. but soo many we don't
19:54:06 Hmm, well I'll think more about what I actually need. First thought was pretty basic 'do_within_timeout or die' but its clear there's more to it.
19:54:39 tis why Python doesn't have a managed external thread abort- most languages that *do* have them are like "uhh, yeah, if you use this, you basically need to tear down your process"
19:54:43 there are many interpretations of 'die'
19:55:11 nitzmahone: it does have an 'abandon' .. but as i said, not really
19:55:14 * jtanner starts to wonder if this could be hacked together with a bash script around the ansible cli
19:55:27 jtanner: timout ansible-playbook ....
19:55:34 timeout
19:55:36 that kills the master
19:55:49 something to look for child pids, and kill the child if too old
19:55:55 ^ actually . ansible_ssh_executable: timeout -n 10 ssh
19:56:20 ^ not that, but wrapper that does that
19:56:43 bcoca: yeah, I actually played with using timeout in the connection plugin on that janky LLEC timeout PR
19:57:07 nitzmahone: i saw, same issue its not 'task level' .. it should still work for some cases
19:57:27 plenty of my playbooks don't use ssh but yeah shell script wrapper might get me out of this particular hole.
19:58:12 ^ playbook timeout, play timeout, role timeout, task timeout ... many timeouts .. and ive not gotten to connection nor 'module execution'
19:58:31 since perl was my first language, my definition of die is pretty much https://perldoc.perl.org/functions/die.html
19:59:07 he, that is mine too ... but others differ on what death means and how to handle funeral expenses
19:59:21 :-)
19:59:27 5.30 .. wow
19:59:28 thanks for chatting it through.
19:59:43 time for windows working group though so .. cheers
19:59:48 np, anytime, you can start these in devel also, not really what meetings are normally used for
19:59:53 glwt
19:59:56 #endmeeting
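
Note on the 'timed forked' worker idea from the discussion: the point that preemptive timeouts need process isolation can be illustrated with a rough Python sketch. This is not Ansible code and not the process plugins PR; `run_with_timeout`, `TaskTimeout` and `_worker` are hypothetical names, and the sketch assumes a POSIX host where the `fork` start method is available. The unit of work runs in a forked child, and the parent kills the child if no result arrives within an opt-in limit.

```python
import multiprocessing as mp
import queue
import time


class TaskTimeout(Exception):
    """Raised when the forked worker exceeds its time limit (hypothetical name)."""


def _worker(func, args, results):
    # Runs in the child process: send back either the result or the error.
    try:
        results.put(("ok", func(*args)))
    except Exception as exc:
        results.put(("error", repr(exc)))


def run_with_timeout(func, args=(), timeout=None):
    """Run func(*args) in a forked child; kill the child after `timeout` seconds.

    timeout=None means no limit, i.e. the behaviour is opt-in.
    """
    ctx = mp.get_context("fork")  # assumption: POSIX only
    results = ctx.Queue()
    proc = ctx.Process(target=_worker, args=(func, args, results))
    proc.start()
    try:
        status, payload = results.get(timeout=timeout)
    except queue.Empty:
        # Preemptive abort is safe here because the work is isolated in its
        # own process; nothing in the parent is left holding a lock.
        proc.terminate()
        proc.join(1)
        if proc.is_alive():
            proc.kill()
            proc.join()
        raise TaskTimeout(f"no result after {timeout} seconds")
    proc.join()
    if status == "error":
        raise RuntimeError(payload)
    return payload
```

Something like `run_with_timeout(time.sleep, (600,), timeout=120)` would raise `TaskTimeout` after two minutes. Whether that should be task fatal or controller fatal, and how cleanup on the target is handled, is left open here just as it was in the meeting.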