15:03:16 #startmeeting Ansible Core IRC meeting 15:03:16 Meeting started Thu Sep 6 15:03:16 2018 UTC. 15:03:16 This meeting is logged and archived in a public location. 15:03:16 The chair is ryansb. Information about MeetBot at http://wiki.debian.org/MeetBot. 15:03:16 Useful Commands: #action #agreed #halp #info #idea #link #topic. 15:03:16 The meeting name has been set to 'ansible_core_irc_meeting' 15:03:29 #chair alikins mkrizek shertel sdoran 15:03:29 Current chairs: alikins mkrizek ryansb sdoran shertel 15:03:49 welcome, today's agenda: https://github.com/ansible/community/issues/347 15:04:07 There's only one issue on here, which is https://github.com/ansible/ansible/pull/35636 15:04:18 is sethp-nr around? 15:04:26 Good morning! 15:04:34 wasaboutto #startmeeting 15:04:41 #chair bcoca 15:04:41 Current chairs: alikins bcoca mkrizek ryansb sdoran shertel 15:05:05 #link https://github.com/ansible/ansible/pull/35636 15:05:14 Good old systemd. 15:05:19 sethp-nr: im aware of your PR, but also looking at several others, the problem with systemd is that it keeps changing over time 15:05:34 the installed base does not respond uniformly to state/rc/command expectations 15:05:46 also looking at big revamp with https://github.com/ansible/ansible/pull/40441 15:05:53 which might help a bit with the consistency 15:05:54 .hello2 15:05:55 maxamillion: maxamillion 'Adam Miller' 15:06:27 https://github.com/ansible/ansible/pull/35636 <= also looking how to integrate waits to get more consistent 'statusi' 15:08:41 I’m looking at 40441 now; it looks like it’s certainly in a related space but I’m not sure it addresses our issue 15:10:07 my point is that none of them address the full issue, inconsistent data returned from the tool itself 15:10:16 I’m happy to give it a try, though, and rebase whatever parts of 35636 remain important on top of that work 15:10:23 i was looking to see if directly using dbus was better .. but that ends up having the same problem 15:10:41 Ola 15:10:58 I would expect that to get worse, since now we’d be exposed to version differences in the request/response protocol 15:10:59 so i'm left with a couple of choices, 'versoin dependant response tracking' ... or try to force things to a 'known common state' 15:11:13 sethp-nr: yep, my hopes were crushed 15:12:18 ^ unless someone else has a suggestion, cause i really dont see a good solution at this point, which is also why i've been remis to merge many of these as they compound the problem and fix 'a case' while breaking others 15:13:01 What you’re describing certainly matches our experience with systemd 15:13:09 and systemd itself is not only problem, the debian implementation diverges from the expectation .. then also diverges per version ... 15:13:17 As unfortunate as it is, not trusting the tool and tracking our own state seems to be the only reliable solution. 15:13:26 that is my fear 15:13:29 Is there something we can do from the testing side to reduce the risk? 15:13:35 and i've been trying to find alternatives 15:13:58 gundalow: add more distros with more setups .. but that ends up being a bit unreasonable 15:14:03 I can do some more digging into 40441, but my goal with 35636 was to go the opposite way – push the state management down into systemd itself, since the one part of the interface that seems moderately consistent is the behavior of the `systemctl start/stop` command 15:14:21 but you’ve maybe found cases where that isn’t true 15:14:23 debian jessie (previous stable/8) is a good canary, but its not the only divergent 15:14:34 gundalow: The matrix gets more complex since different versions of systemd on the same distribution can behave differently. 15:14:53 sethp-nr: yep, specially when you have dependant/chained services .. it gets 'fun' 15:16:10 Does “fun” here imply duplicating / reversing queued jobs because dependency actions are racing with the execution of the module? 15:17:10 ^ in some cases .. in others you have to deal with triggers that are not as easy to track 15:17:12 it sounds like we have to do version checking in the end :-( 15:17:36 Fragile and inelegant but if systemd upstream doesn't give us any better options.... 15:17:42 abadger1999: not only version, version + distro .. as different 'implementation' also change behaviour, specially when systemd is managing init scripts 15:17:46 Yeah. 15:18:24 sethp-nr: its not that i think your PR is bad .. i think we need to deal with fundamental problem first and right now im not sure either path will solve it in a good way 15:18:45 i think your PR mitigates many of the issues in many cases .. but breaks others .. due to the underlying inconsistencies 15:19:02 i feel like im playing jenga with chainsaws during an earthquake 15:20:04 that said, sethp-nr im willing to try to integrate your PR and see what happens, but in devel for now as i would not want to affect stable release ... prefer 'known problems' than 'new problems' for that 15:20:29 Yeah, that makes sense 15:21:07 and if anybody has a better idea than the 2 we are contemplating ... i really really want to hear it 15:21:24 * ryansb not nearly expert enough on systemd 15:21:32 cause version/implementation tracking and 'side state management' are not good solutions, they are fragile and a ton of work with very bad returns 15:21:40 ryansb: live happier 15:22:00 im not an expert either, just dealing with CLI returns is scary enough 15:22:12 also looked at dbus messaging .. gets worse ... 15:22:27 It's maddening how inconsistent the returned data are. 15:23:06 my knuckles have bled more than once due to that ... i have 'old timey brick n plaster walls' .. not crappy sheetrock 15:23:33 Unix tools used to be nice and predictable. (I'm sure there are exceptions) 15:24:20 i would not call systemd a unix tool ... 15:24:29 Sorry, internet flubbed 15:24:30 but yeah, many tools have done weird things 15:24:35 sethp-nr: np 15:24:43 just venting at this point ... 15:25:32 That’s an important stage of “systemd”, I think 15:25:45 im still unsure of which route to take ... part of me desperatly hopes somene else figures out something i'm missing ... 15:25:58 abadger1999: you seem to lean towards version/implementaiton tracking 15:26:40 Yeah... seems the least bad (but if there's another way that's better, then I'm all for it) 15:27:02 It's just not good to wait forever for something better to appear when we have bugs that need to be fixed now. 15:27:10 the other option i can think of is 'figureing out state ourselves' 15:27:44 agreed, but most of the stalled PRs are because they break something else, i was researching alternatives .. this week i finally realized dbus was dead end 15:28:38 It's definitely a house of cards. 15:28:45 If we could figure out state in a way that was less fragile... but given the flexibility of init systems in general, I'd think that is more fragile 15:29:02 Account for another tool's inconsistencies in Ansible can be a slippery slope. 15:29:07 *in 15:29:10 *g 15:29:14 Geeze, can't type today. 15:29:53 * bcoca points at docker.py .... 15:30:16 Oh man, that's discouraging. I'm working on podman/buildah modules... 15:30:21 he 15:30:38 get ready for many 'version >= ' comparissons 15:34:29 I know in the past we have elected not to "code around" certain systemd behaviors in Ansible, electing to report a "failure" because that is what systemd reported. 15:35:14 For our case, I think we can use the command module as a workaround (which feels similar to me to building the state reconciliation bcoca’s talking about, but in particular rather than in general) 15:35:18 that is our 'current state' .. the problem is that people are submitting fixes for 'their' cases and most of those break assumptions and other cases we've already 'worked around' 15:35:59 sethp-nr: almost tempted to ask everyone for their shell/command workarounds as that will be a nice db with 'many cases' 15:36:24 So we just need to make a decision to break from that historical stance, if that is indeed the case. 15:36:43 It does stink having to "fall back" to using command/shell rather than the module... 15:36:55 im torn between both solutions, neither seems good enough to me 15:38:10 we have a +1 for the systemctl track, unsure how others lean 15:41:13 sdoran: there might be hope, I'm reasonably sure a lot of podman/libpod, buildah, and skopeo were built to address shortcomings of docker and are developed by past docker developers with lessons learned in mind 15:42:38 @webknjaz's PR was creating a state machine, but mostly relied on systemd output, i think a similar track but 'not trusting' systemd and veryfing state in other ways might be viable alternative to the 'version db and comparisson' 15:42:54 "Trust, but verify" ;) 15:43:16 both still require a 'state mapping' from systemd states to something our users will understand 'started/stopped/enabled/disabled' 15:44:00 and we need to give 'good messages' likee, 'could not disable cause other service depends on this and would require disabling that service first' 15:44:15 and that requries a lot of logic we dont currently have 15:47:12 hmm so no one else has opionons on this? 15:47:42 thoughts/ideas? 15:49:01 I think we've come to the point where Ansible needs to account for systemd inconsistencies to create a consistent user experience and prevent having to fall back to command/shell. 15:49:18 how about this, let it sink in and we can decide course of action on tuesday? since today we really only have 1 vote and i would really like more before reaching a decision 15:49:25 I am concerned about where that will lead, though. Slippery slope of crazy edge case code. 15:49:42 sdoran: we are choosing between a cliff and a very high drop 15:49:46 Yup. 15:49:59 so im not concerned about it, im resigned to it 15:50:06 I have not architected enough things like this to see if this is going down a Bad Road or not. 15:50:12 Would like others to weigh in as well. 15:50:34 he, they are both 'bad roads' .. ive done similar things many times .. and im still unsure which road is 'worst' 15:50:42 But the alternative becomes "Oh, you have to use shell/command when dealing with systemd 60% of the time". 15:50:51 that is 'current state' .. sadly 15:50:57 and that is clearly 'worst of all roads' 15:50:59 Or "Ansible really stinks at managing system services". 15:51:28 The positive outcome could be "SystemD is a pain, but Ansible make is way easier to deal with" :) 15:51:43 sounds about right 15:51:44 i've been trying to find any other alternative and im out of options, but REALLY open to anything viable 15:51:59 that is my hope, i just dont see a good road to that 15:55:05 That just means to accept the pain and complexity ourselves for the sake of making Ansible better, and making it better for users. 15:55:22 I'm ok with that so long as it is tenable. 15:55:35 exactly, just unsure of which one of the chioces is the most tenable 15:55:46 if either are tenable .. past experience says 'not really' 15:55:47 I have never written or maintained a state machine. 15:56:17 Would it be easier to buy the SystemD folks a beer and have a good long talk? :) 15:57:20 my past interactions when pointing out the inconsistencies are mostly 'deal with it' 15:57:30 'this is how it is now' 15:57:41 Ok. Just thought I'd ask. 15:57:58 I mean, to be fair ... we've done similar things over time ... there's certain playbooks that worked in 2.0 that don't work now in 2.6 15:58:08 things change 15:58:15 but i welcome anyone to try, the issue is not as much the future as 'existing' inconstencies .. which they cannot do much about at this point 15:59:01 maxamillion: yes and we have deprecations and many reasons for the same, my issue is that many of these change are also inconsistent, in some cases RC signifies erros, in other cases it does not 15:59:17 in some cases you NEEd to parse the output, in other cases that output is not reliable 15:59:37 some changes have been for the better, but many have n ot 16:00:30 and the whole issue with the debian implementation, is clearly not systemd folk's fault .. that is squarly on debian implementers 16:00:42 its jsut the 'general state of installed systemd' is a mess 16:00:57 for those trying to 'predictably and reliably' trying to get info from it 16:02:53 im not even trying to assign blame, just dealing with the existing situation, i think systemd folk can make it better going forwards, but have no influence on 'existing problem' .. cause they cannot go to every server and force an update to they 'current systemd version' 16:03:14 ^ i should not give them that idea .. last thing we need is sytemd dealing with system upgrades internally 16:03:49 sorry, i got ranty .. but this is a sore subject for me ... 16:04:53 and i've been researching alternatives this last month to hit every dead end 16:06:12 bcoca: fair 16:07:07 so if we cannot get enough votes next meeting, i'll decide what path to go and just trod onto it .. meanwhile i'll pray to FSM for a miracle 16:07:24 #endmeeting