20:01:17 <jborean93> #startmeeting Ansible Windows Working Group
20:01:17 <zodbot> Meeting started Tue Oct 16 20:01:17 2018 UTC.
20:01:17 <zodbot> This meeting is logged and archived in a public location.
20:01:17 <zodbot> The chair is jborean93. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:01:17 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
20:01:17 <zodbot> The meeting name has been set to 'ansible_windows_working_group'
20:01:27 <jborean93> hey all, I won't be here for long but will see who turns up
20:01:38 <dag> o/
20:01:44 <jborean93> #chair dag
20:01:44 <zodbot> Current chairs: dag jborean93
20:01:45 <jborean93> hey
20:01:54 <dag> we have many topics, but nothing is really urgent :-)
20:02:01 <dag> so I don't mind deferring this meeting
20:02:11 <jborean93> will see if anyone else turns up
20:02:36 <nitzmahone> yo
20:02:53 <jborean93> #chair nitzmahone
20:02:53 <zodbot> Current chairs: dag jborean93 nitzmahone
20:02:55 <nitzmahone> Sorry, got sidetracked in another convo
20:03:17 <itpraktyk> hi
20:04:28 <jborean93> #chair itpraktyk
20:04:28 <zodbot> Current chairs: dag itpraktyk jborean93 nitzmahone
20:04:47 <jborean93> fantastic we have a few people
20:05:18 <jborean93> #topic https://github.com/ansible/community/issues/294#issuecomment-426865988 win_reboot waiting
20:05:22 <jborean93> dag I believe this is yours
20:05:44 <dag> Yup, the problem is easy, we have a default boot timeout in win_reboot
20:06:05 <dag> and in some cases (when there are outstanding updates, that may take a long time) it can cause the playbook to fail
20:06:30 <jborean93> the problem is easy?
20:06:31 <dag> so maybe we should have some way to mitigate these (unexpected) failures
20:06:37 <dag> easy to describe :)
20:06:45 <jborean93> about to say
20:07:21 <dag> so maybe we could detect (before booting) there are outstanding updates, and we have a separate boot_timeout for outstanding updates
20:07:40 <nitzmahone> We've not come up with a proper way to do that, unfortunately
20:08:14 <nitzmahone> I've poked and prodded lots of people at Microsoft, but nobody's come up with a definitive "here's where you check to see if we're applying updates"
20:08:55 <jborean93> the alternative is to have a module that executes Windows side and checks for known problematic situations post reboot
20:09:06 <dag> nitzmahone: then we should add at least a note/example on how to mitigate it as-is, but I would like to keep this somewhere tracked (maybe an open issue)
20:09:14 <jborean93> we currently run a manual command in win_updates during the reboot phase for this but can extend it once we figure out more scenarios
20:09:49 <nitzmahone> Obviously there's *some* system object that logonui et al are checking to show the "Applying Updates, Don't Turn Off Your Computer" notice, but nobody will tell us if/how to access them
20:10:04 <nitzmahone> The IsBusy check is the best we have
20:10:06 <jborean93> probably no stable interface, it likely changes between versions
20:10:21 <nitzmahone> We can't do the IsBusy check without become though
20:10:44 <weq> ehm there are APIs for this afaik.
20:11:40 <weq> https://docs.microsoft.com/en-us/windows/desktop/api/wuapi/nn-wuapi-iupdatesession
20:11:46 <nitzmahone> We're all ears if you know what. The WU client API is the best thing we've found to date, and it's not definitive, and also can't be accessed normally through WinRM- we have to create an interactive session via become to get at it, which we don't want to add as a prereq for win_reboot.
20:12:23 <nitzmahone> That's how win_updates does it, but adding that to win_reboot is a probable non-starter
20:12:37 <jborean93> it also doesn't expose whether updates are being installed (that we know of), the best thing we have is using `IsBusy()` but as nitzmahone says that requires become
20:12:41 <weq> we use this PS script slightly tweaked to check additional sources. I can see if I can dig up the exact one we've implemented for SMA.
20:13:00 <weq> https://gallery.technet.microsoft.com/scriptcenter/0dbfc125-b855-4058-87ec-930268f03285 we use a modified version of that.
20:13:45 <jborean93> that's a bit different, I think dag's talking about checking if there are updates that have been installed and will do further actions after a reboot
20:13:50 <nitzmahone> Yeah, that's a much simpler version of what win_updates is doing
20:14:11 <nitzmahone> It doesn't tell you that there are updates running, either, just that there are some available
20:14:12 <weq> cause there are at least 3 registry locations that also indicate if a reboot is pending
20:14:24 <nitzmahone> This is a different issue
20:14:29 <weq> oh in that sense ok I'm with you.
20:14:31 <weq> nvm then
20:15:23 <nitzmahone> This is the post-update "winrm is available, but system is still applying updates" thing (when it won't let you log in interactively with the "Applying Updates" spinning toilet bowl of death)
20:16:20 * jborean93 will have to use spinning toilet bowl of death more often
20:17:54 <dag> nitzmahone: it's also to avoid that we hit the 600 seconds timeout because of updates, which may break the workflow unexpectedly
20:18:25 <jborean93> are you coming across this often, I've yet to hit the 600 second timeout unless I was installing on a fresh Windows install
20:19:03 <jborean93> and even then I believe it was only for 2016 which seems to just take ages to install the cumulative update from the base install
20:19:07 * nitzmahone suspects slow hardware and magnetic disks
20:19:18 <jborean93> true, I do have SSDs
20:20:12 <weq> I can't remember if there are any events to react to either. Or if they are only available after sysmon is installed and configured.
20:20:56 <nitzmahone> We've not found anything that catches that case.
20:21:12 <nitzmahone> (and nobody at Microsoft has been forthcoming with the deets)
20:22:25 <nitzmahone> dag: just to be clear- given that we can't definitively tell that there are updates running from win_reboot today, is the problem that you can't increase the timeout, or something else?
20:25:31 <dag> nitzmahone: it's about predictability
20:25:57 <dag> nitzmahone: sure, I can set a high timeout on every win_reboot task, but that's an ugly workaround
20:26:14 <nitzmahone> Until Microsoft coughs up some details, I'm not sure what else we can do
20:26:22 <dag> so I think we should document it, maybe keep an issue open (for people to hit/discuss) and see over time
20:26:25 <dag> sure
20:26:46 <nitzmahone> That works for me- maybe a note on the win_reboot docs about it
20:27:27 <nitzmahone> At least if I have an active issue open, when I find a new person at Microsoft to hassle about it, I can point to it and say "look at all the people suffering because of this, tell us how to check it!"
20:27:38 <weq> ah now I remember how I do it with other automation solutions atm, we check the state of the RPC service.
20:29:01 <nitzmahone> Hrm, I'd be surprised if that covers it
20:29:17 <nitzmahone> There are (or used to be, anyway) updates that required that to be running and accessible
20:30:12 <weq> or whatever of the mandatory services that might be started last perhaps.
20:31:28 <nitzmahone> The SCM just does its normal thing AFAICT; Automatic start services (including WinRM) are started normally, and IIRC so are Delayed Start, but that's something we can probably verify
20:32:06 <nitzmahone> I'd be surprised if any of the "non-controllable" services aren't started by then either, because just about everything relies on them
20:34:15 <nitzmahone> #action nitzmahone to add note to win_reboot docs re: post-reboot updates
20:36:19 * jborean93 I've got to head off, have a good rest of the meeting
20:36:58 <nitzmahone> WRT auto-regathering facts after reboot, I'd prefer the manual approach rather than some sort of magic (eg, a conditional `setup` task if win_reboot registered a change), fine if someone wants to add an example to `win_reboot` examples for that
20:38:48 <dag> nitzmahone: I will do this, add notes about this
20:39:17 <nitzmahone> #action dag to add post-reboot conditional example to win_reboot
20:39:32 <nitzmahone> *conditional setup
20:39:41 <nitzmahone> thanks!
20:39:57 <nitzmahone> Is mcassiniti here to talk about https://github.com/ansible/ansible/pull/45708?
20:41:49 <nitzmahone> OK, next thing:
20:41:51 <nitzmahone> #topic https://github.com/ansible/community/issues/294#issuecomment-428202744
20:42:04 <nitzmahone> (retries in pywinrm/pypsrp)
20:43:22 <nitzmahone> I'm not opposed to these changes, but IMO they need to default to "no retry" and Ansible can allow passing through a retry configuration; if someone's having those problems, they can enable retries, but in general, automatic retry papers over real problems, so I really dislike having it on by default.
20:43:29 <dag> Yup, I think we can finetune this
20:43:30 <nitzmahone> (as do most others I'm aware of)
20:44:06 <dag> the intention was to make it so there is no risk
20:44:35 <nitzmahone> Sure, but on by default also makes it so you lose "fail fast" on a misconfigured connection
20:44:38 <dag> and the only downside now (because a connection problem is pretty generic) is that on SSL errors, it retries 4 times
20:45:26 <dag> we can finetune what happens on connection errors so we can separate a Connection Refused, and No route to host from an SSL connection error
20:45:45 <dag> like my original implementation, would retry only on connection refused errors
20:46:16 <dag> but we implemented the retry ourselves, while the Requests retry mechanism is much more sane (with a backoff period)
20:46:32 <dag> my personal need is only the Connection refused case
20:47:09 <dag> which happens sporadically (i.e. when installing specific Microsoft products)
20:47:29 <dag> my concern is that people who don't know this, will not know what is going on, causing support overhead
20:47:48 <dag> so I don't mind reducing the implementation to only Connection refused
20:48:21 <dag> but the current implementation is safe, in the sense we don't retry accepted (or possibly accepted) payload
20:48:33 <nitzmahone> I'd definitely be more comfortable with that, but still not on by default
20:48:51 <dag> ok, I think it's a mistake to not enable it by default
20:49:06 <dag> but I am not the concern here, it's the other users :-)
20:49:17 <nitzmahone> I've been writing networked systems for decades now, and automatic retry is always something that ends up biting me in the ass. I'm tired of the teeth marks. :)
20:49:24 <dag> so not being the default is fine by me
20:50:03 <nitzmahone> Yeah, so long as we document it in a "troubleshooting" section or something, I think that's a good balance
20:50:17 <dag> well, WinRM/PSRP is not always reliable, without this our workflows would fail more often than they work
20:50:30 <dag> so in an Enterprise environment that's what I would expect
20:50:54 <dag> we'll see
20:51:25 <nitzmahone> It's not to say we can't make it the default behavior at some point either, but yeah, available is better than not
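The position reached above (retry only on "Connection refused", with a backoff, and off by default) can be sketched roughly as follows. This is a minimal stdlib-only illustration, not the pywinrm/pypsrp implementation; `call_with_retry` and its parameters are hypothetical names invented for this sketch.

```python
import time


def call_with_retry(func, retries=0, backoff=0.5, sleep=time.sleep):
    """Call func(), retrying only when the connection is refused.

    Hypothetical helper for illustration. retries=0 (the default) means
    no retry at all, so a misconfigured connection still fails fast.
    """
    attempt = 0
    while True:
        try:
            return func()
        except ConnectionRefusedError:
            # Only a refused connection is caught. Other errors (TLS
            # failures, no route to host, ...) propagate immediately.
            if attempt >= retries:
                raise
            # Exponential backoff: backoff, 2*backoff, 4*backoff, ...
            sleep(backoff * (2 ** attempt))
            attempt += 1
```

A caller who hits sporadic refusals would opt in explicitly, for example `call_with_retry(connect, retries=4)`, where `connect` stands for whatever callable opens the connection. A refused connection means nothing was accepted by the remote side, which is why retrying that one case does not risk resending a payload.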
20:51:59 <nitzmahone> The other nasty bit about the impl is that you can't assume urllib3 is vendored
20:52:26 <nitzmahone> (eg, in my employer's OS packages of requests, it's not)
20:52:57 <dag> What do you mean?
20:53:26 <nitzmahone> In most OS-packaged versions of requests, urllib3 is a top-level Python package, not embedded inside requests
20:53:35 <nitzmahone> Only the pip-distributed version does it that way IIRC
20:54:00 <nitzmahone> If you try that with a yum-installed version of requests, the import of requests.packages.urllib3 will fail IIRC
20:54:30 <nitzmahone> So you have to play some conditional import games to figure out where urllib3 lives
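The "conditional import games" mentioned above might look something like the sketch below: try the top-level `urllib3` package first, then fall back to the copy exposed under `requests.packages`. The helper name `locate_urllib3` is hypothetical, and the candidate order is an assumption rather than what pywinrm or pypsrp actually do.

```python
import importlib


def locate_urllib3(candidates=("urllib3", "requests.packages.urllib3")):
    """Return the first importable urllib3 module, or None if none is found.

    Hypothetical helper for illustration only.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Not available under this name; try the next location.
            continue
    return None
```

Code that needs urllib3 (for instance to build a retry configuration) can then check for `None` and skip the feature when neither location is importable.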
20:55:37 * nitzmahone rolls eyes and grumbles about Python packaging to self
20:56:46 <nitzmahone> So we're running short on time; anything else pressing for this week?
20:57:31 <itpraktyk> #46516 ?
20:57:40 <itpraktyk> https://github.com/ansible/ansible/pull/46516
20:58:48 <nitzmahone> At a glance, looks like great additions
20:59:18 <nitzmahone> We'll try to review this week
20:59:32 <itpraktyk> OK, thx
20:59:57 <nitzmahone> OK, that's time for today. Thanks all!
21:00:01 <itpraktyk> I'll try to take a look at https://github.com/ansible/ansible/issues/28758
21:00:24 <nitzmahone> cool thanks
21:00:34 <nitzmahone> Until next week...
21:00:39 <nitzmahone> #endmeeting