20:01:17 #startmeeting Ansible Windows Working Group
20:01:17 Meeting started Tue Oct 16 20:01:17 2018 UTC.
20:01:17 This meeting is logged and archived in a public location.
20:01:17 The chair is jborean93. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:01:17 Useful Commands: #action #agreed #halp #info #idea #link #topic.
20:01:17 The meeting name has been set to 'ansible_windows_working_group'
20:01:27 hey all, I won't be here for long but will see who turns up
20:01:38 o/
20:01:44 #chair dag
20:01:44 Current chairs: dag jborean93
20:01:45 hey
20:01:54 we have many topics, but nothing is really urgent :-)
20:02:01 so I don't mind to defer this meeting
20:02:11 will see if anyone else turns up
20:02:36 yo
20:02:53 #chair nitzmahone
20:02:53 Current chairs: dag jborean93 nitzmahone
20:02:55 Sorry, got sidetracked in another convo
20:03:17 hi
20:04:28 #chair itpraktyk
20:04:28 Current chairs: dag itpraktyk jborean93 nitzmahone
20:04:47 fantastic we have a few people
20:05:18 #topic https://github.com/ansible/community/issues/294#issuecomment-426865988 win_reboot waiting
20:05:22 dag I believe this is yours
20:05:44 Yup, the problem is easy, we have a default boot timeout in win_reboot
20:06:05 and in some cases (when there are outstanding updates, that may take a long time) it can cause the playbook to fail
20:06:30 the problem is easy?
20:06:31 so maybe we should have some way to mitigate these (unexpected) failures
20:06:37 easy to describe :)
20:06:45 about to say
20:07:21 so maybe we could detect (before booting) there are outstanding updates, and we have a separate boot_timeout for outstanding updates
20:07:40 We've not come up with a proper way to do that, unfortunately
20:08:14 I've poked and prodded lots of people at Microsoft, but nobody's come up with a definitive "here's where you check to see if we're applying updates"
20:08:55 the alternative is to have a module that executes Windows side that checks for known problematic solutions post reboot
20:09:06 nitzmahone: then we should add at least a note/example on how to mitigate it as-is, but I would like to keep this somewhere tracked (maybe an open issue)
20:09:14 we currently run a manual command in win_updates during the reboot phase for this but can extend it once we figure out more scenarios
20:09:49 Obviously there's *some* system object that logonui et al are checking to show the "Applying Updates, Don't Turn Off Your Computer" notice, but nobody will tell us if/how to access them
20:10:04 The IsBusy check is the best we have
20:10:06 probably no stable interface that changes between versions
20:10:21 We can't do the IsBusy check without become though
20:10:44 ehm there are APIs for this afaik.
20:11:40 https://docs.microsoft.com/en-us/windows/desktop/api/wuapi/nn-wuapi-iupdatesession
20:11:46 We're all ears if you know what. The WU client API is the best thing we've found to date, and it's not definitive, and also can't be accessed normally through WinRM; we have to create an interactive session via become to get at it, which we don't want to add as a prereq for win_reboot.
20:12:23 That's how win_updates does it, but adding that to win_reboot is a probable non-starter
20:12:37 it also doesn't expose whether updates are being installed (that we know of), the best thing we have is using `IsBusy()` but as nitzmahone says that requires become
20:12:41 we use this PS script slightly tweaked to check additional sources. I can see if I can dig up the exact one we've implemented for SMA.
20:13:00 https://gallery.technet.microsoft.com/scriptcenter/0dbfc125-b855-4058-87ec-930268f03285 we use a modified version of that.
20:13:45 that's a bit different, I think dag's talking about checking if there are updates that have been installed and will do further actions after a reboot
20:13:50 Yeah, that's a much simpler version of what win_updates is doing
20:14:11 It doesn't tell you that there are updates running, either, just that there are some available
20:14:12 cause there are at least 3 registry locations that indicate if a reboot is pending also
20:14:24 This is a different issue
20:14:29 oh in that sense ok I'm with you.
20:14:31 nvm then
20:15:23 This is the post-update "winrm is available, but system is still applying updates" thing (when it won't let you log in interactively with the "Applying Updates" spinning toilet bowl of death)
20:16:20 * jborean93 will have to use spinning toilet bowl of death more often
20:17:54 nitzmahone: it's also to avoid that we hit the 600 seconds timeout because of updates, which may break the workflow unexpectedly
20:18:25 are you coming across this often? I've yet to hit the 600 second timeout unless I was installing on a fresh Windows install
20:19:03 and even then I believe it was only for 2016 which seems to just take ages to install the cumulative update from the base install
20:19:07 * nitzmahone suspects slow hardware and magnetic disks
20:19:18 true, I do have SSDs
20:20:12 I can't remember if there are any events to react to either. Or if they are only available after sysmon is installed and configured.
20:20:56 We've not found anything that catches that case.
20:21:12 (and nobody at Microsoft has been forthcoming with the deets)
20:22:25 dag: just to be clear: given that we can't definitively tell that there are updates running from win_reboot today, is the problem that you can't increase the timeout, or something else?
20:25:31 nitzmahone: it's about predictability
20:25:57 nitzmahone: sure, I can set a high timeout on every win_reboot task, but that's an ugly workaround
20:26:14 Until Microsoft coughs up some details, I'm not sure what else we can do
20:26:22 so I think we should document it, maybe keep an issue open (for people to hit/discuss) and see over time
20:26:25 sure
20:26:46 That works for me; maybe a note on the win_reboot docs about it
20:27:27 At least if I have an active issue open, when I find a new person at Microsoft to hassle about it, I can point to it and say "look at all the people suffering because of this, tell us how to check it!"
20:27:38 ah now I remember how I do it with other automation solutions atm, we check for the state on the RPC service.
20:29:01 Hrm, I'd be surprised if that covers it
20:29:17 There (used to be anyway) updates that required that to be running and accessible
20:30:12 or whatever of the mandatory services that might be started last perhaps.
20:31:28 The SCM just does its normal thing AFAICT; Automatic start services (including WinRM) are started normally, and IIRC so are Delayed Start, but that's something we can probably verify
20:32:06 I'd be surprised if any of the "non-controllable" services aren't started by then either, because just about everything relies on them
20:34:15 #action nitzmahone to add note to win_reboot docs re: post-reboot updates
20:36:19 * jborean93 I've got to head off, have a good rest of the meeting
20:36:58 WRT auto-regathering facts after reboot, I'd prefer the manual approach rather than some sort of magic (eg, a conditional `setup` task if win_reboot registered a change), fine if someone wants to add an example to `win_reboot` examples for that
20:38:48 nitzmahone: I will do this, add notes about this
20:39:17 #action dag to add post-reboot conditional example to win_reboot
20:39:32 *conditional setup
20:39:41 thanks!
20:39:57 Is mcassiniti here to talk about https://github.com/ansible/ansible/pull/45708?
20:41:49 OK, next thing:
20:41:51 #topic https://github.com/ansible/community/issues/294#issuecomment-428202744
20:42:04 (retries in pywinrm/pypsrp)
20:43:22 I'm not opposed to these changes, but IMO they need to default to "no retry" and Ansible can allow passing through a retry configuration; if someone's having those problems, they can enable retries, but in general, automatic retry papers over real problems, so I really dislike having it on by default.
20:43:29 Yup, I think we can fine-tune this
20:43:30 (as do most others I'm aware of)
20:44:06 the intention was to make it so there is no risk
20:44:35 Sure, but on by default also makes it so you lose "fail fast" on a misconfigured connection
20:44:38 and the only downside now (because a connection problem is pretty generic) is that on SSL errors, it retries 4 times
20:45:26 we can fine-tune what happens on connection errors so we can separate a Connection Refused, and No route to host from an SSL connection error
20:45:45 like my original implementation, would retry only on connection refused errors
20:46:16 but we implemented the retry ourselves, while the Requests retry mechanism is much more sane (with a backoff period)
20:46:32 my personal need is only the Connection refused case
20:47:09 which happens sporadically (i.e. when installing specific Microsoft products)
20:47:29 my concern is that people who don't know this, will not know what is going on, causing support overhead
20:47:48 so I don't mind reducing the implementation to only Connection refused
20:48:21 but the current implementation is safe, in the sense we don't retry accepted (or possibly accepted) payload
20:48:33 I'd definitely be more comfortable with that, but still not on by default
20:48:51 ok, I think it's a mistake to not enable it by default
20:49:06 but I am not the concern here, it's the other users :-)
20:49:17 I've been writing networked systems for decades now, and automatic retry is always something that ends up biting me in the ass. I'm tired of the teeth marks. :)
20:49:24 so not being the default is fine by me
20:50:03 Yeah, so long as we document it in a "troubleshooting" section or something, I think that's a good balance
20:50:17 well, WinRM/PSRP is not always reliable, without this our workflows would fail more often than they work
20:50:30 so in an Enterprise environment that's what I would expect
20:50:54 we'll see
20:51:25 It's not to say we can't make it the default behavior at some point either, but yeah, available is better than not
20:51:59 The other nasty bit about the impl is that you can't assume urllib3 is vendored
20:52:26 (eg, in my employer's OS packages of requests, it's not)
20:52:57 What do you mean?
20:53:26 In most OS-packaged versions of requests, urllib3 is a top-level Python package, not embedded inside requests
20:53:35 Only the pip-distributed version does it that way IIRC
20:54:00 If you try that with a yum-installed version of requests, the import of requests.packages.urllib3 will fail IIRC
20:54:30 So you have to play some conditional import games to figure out where urllib3 lives
20:55:37 * nitzmahone rolls eyes and grumbles about Python packaging to self
20:56:46 So we're running short on time; anything else pressing for this week?
20:57:31 #46516 ?
20:57:40 https://github.com/ansible/ansible/pull/46516
20:58:48 At a glance, looks like great additions
20:59:18 We'll try to review this week
20:59:32 OK, thx
20:59:57 OK, that's time for today. Thanks all!
21:00:01 I'll try to take a look at https://github.com/ansible/ansible/issues/28758
21:00:24 cool thanks
21:00:34 Until next week...
21:00:39 #endmeeting
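
Editor's note on the retries topic: the two implementation points raised in the discussion (retries off unless explicitly enabled, and "conditional import games" to find urllib3) might look roughly like the sketch below. It is not the code from the pywinrm or pypsrp changes under review. The names `import_first`, `retry_options`, `build_retry`, and the `connection_retries` and `backoff_factor` parameters are hypothetical, chosen only for illustration. The sketch limits retries to connect-phase failures; whether SSL failures are also retried depends on the urllib3 version and is not addressed here.

```python
import importlib

# Candidate locations for urllib3's retry module, most common first.
# OS-packaged requests usually depends on a top-level urllib3, while the
# log mentions a vendored copy under requests.packages in pip installs.
URLLIB3_RETRY_CANDIDATES = (
    "urllib3.util.retry",
    "requests.packages.urllib3.util.retry",
)


def import_first(candidates):
    """Return the first importable module from candidates, else raise ImportError."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of these modules could be imported: %s" % ", ".join(candidates))


def retry_options(connection_retries=0, backoff_factor=2.0):
    """Hypothetical opt-in policy: return None (no retry) unless retries are requested."""
    if connection_retries <= 0:
        return None  # assumption: "no retry" is the default, so failures surface fast
    return {
        "total": connection_retries,
        "connect": connection_retries,
        # Do not retry read errors, where the server may already have the payload.
        "read": 0,
        "backoff_factor": backoff_factor,
    }


def build_retry(connection_retries=0, backoff_factor=2.0):
    """Build a urllib3 Retry object, or return None when retries are disabled."""
    options = retry_options(connection_retries, backoff_factor)
    if options is None:
        return None
    retry_module = import_first(URLLIB3_RETRY_CANDIDATES)
    return retry_module.Retry(**options)
```

A caller that opts in could pass the returned object as `max_retries` to `requests.adapters.HTTPAdapter` and mount that adapter on its session; when `build_retry` returns None, the session is left untouched.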