14:30:03 #startmeeting rolekit (2015-11-10)
14:30:03 Meeting started Tue Nov 10 14:30:03 2015 UTC. The chair is sgallagh. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:30:03 Useful Commands: #action #agreed #halp #info #idea #link #topic.
14:30:03 #meetingname rolekitweekly
14:30:03 The meeting name has been set to 'rolekitweekly'
14:30:03 #chair sgallagh twoerner nilsph
14:30:03 Current chairs: nilsph sgallagh twoerner
14:30:04 #topic init process
14:30:10 .hello nphilipp
14:30:11 nilsph: nphilipp 'Nils Philippsen'
14:30:12 Hello, folks. Who do we have today?
14:30:14 .hello twoerner
14:30:15 twoerner: twoerner 'Thomas Woerner'
14:30:18 .hello sgallagh
14:30:20 sgallagh: sgallagh 'Stephen Gallagher'
14:30:25 Great, the gang's all here.
14:30:30 #topic Agenda
14:30:59 #info Agenda Item: Issue Triage
14:31:07 #info Agenda Item: Status Report
14:31:19 Any other topics?
14:31:49 nope
14:32:21 not from me either
14:32:37 ok
14:32:42 #topic Issue Triage
14:32:53 There are three issues that have been filed since the last time we did one of these.
14:33:29 Oh, and a pull request
14:33:59 #topic https://github.com/libre-server/rolekit/issues/54 - The Domain Controller role should support setting up the Vault
14:34:23 (All three of the new tickets are around the Domain Controller, BTW)
14:34:24 That looks easy enough to do.
14:34:47 Do we support differentiating between versions of underlying software?
14:35:05 Right, that's the tricky part.
14:35:10 I.e. can we optionally support that cmdline option on versions of freeipa?
14:35:19 (that do have it)
14:35:28 I think in this case, we can probably Ask Forgiveness Instead Of Permission.
14:35:38 What I thought
14:35:40 (Try to launch with --setup-kra and if that fails, retry without it)
14:35:58 Because ipa-server-install fails on unknown arguments
14:36:18 My take is: if the user specifies it, add it to the cmdline and if not, leave it out. Error out if it fails regardless.
14:36:38 Otherwise users will think they have the option when they haven't (because we caught the error and retried without it)
14:36:44 nilsph: Well, the approach we've taken with the DC so far is to install all options unless asked not to
14:36:51 Ahh.
14:36:57 That's an option, too :).
14:37:17 nilsph: I think it might be reasonable to fail if it was explicitly requested and was unavailable
14:37:26 Yeah, retrying without it seems good then.
14:37:34 But if it's just taking the defaults, go with whatever it supports.
14:37:51 I thought we didn't provide options and just add all features?
14:38:04 nilsph: We provide the option, just default it to enabled.
14:38:33 For the DNS server, for example, it's "serve_dns"
14:39:11 Technically, the internal CA can be skipped as well, if you have a chained CA, but we don't support that right now.
14:39:20 I should probably open a ticket for that, but I don't want to rush to implement it.
14:39:25 So would it then be: add the option, default to "try-enable" it (you know what I mean), but choosing does "enable-or-fail".
14:39:44 nilsph: Yeah, I think that's what I was saying.
14:39:55 yes, that sounds reasonable
14:40:23 (And in the future if other options are added, we'll have to have a staged fall back, of course)
14:40:26 Not sure whether the defaults system supports that already, but I guess I can try.
14:41:05 nilsph: I kind of fudged it with the domain_name function.
14:41:32 Then I'll have to see if I can copy that. I like fudge. ;)
14:41:32 So "None" would be "try-enable" in this case.
14:41:45 /me nods
14:41:49 That doesn't even sound like fudging to me.
14:41:53 I take it you are volunteering to implement this? :)
14:42:11 I guess so. :)
14:42:28 Excellent. I'll put it on the F24 schedule, but not as a blocker.
14:43:26 /me adds a summary comment to the issue. Please hold.
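Editor's sketch of the behaviour agreed for issue 54, for readers of the log. It is not rolekit code. The function name `install_with_optional_kra` and the injected `run` callable are hypothetical; only `ipa-server-install` and `--setup-kra` come from the discussion. The tri-state follows the log: `None` means "try-enable", `True` means "enable-or-fail", `False` means leave the vault out. The sketch assumes any failure with `--setup-kra` may be retried without it, which a real implementation would probably need to narrow down (and it ignores cleanup after a failed first attempt).

```python
from typing import Callable, List, Optional


def install_with_optional_kra(
    setup_kra: Optional[bool],
    run: Callable[[List[str]], int],
    base_cmd: Optional[List[str]] = None,
) -> bool:
    """Run the installer and return True if the vault (KRA) was set up.

    setup_kra: None = "try-enable", True = "enable-or-fail", False = skip.
    run: hypothetical callable that executes a command line and returns
         its exit status (0 on success).
    """
    cmd = list(base_cmd or ["ipa-server-install"])

    if setup_kra is False:
        # The admin opted out: never pass the option.
        if run(cmd) != 0:
            raise RuntimeError("installation failed")
        return False

    # Default (None) and explicit (True) both start by trying the option.
    if run(cmd + ["--setup-kra"]) == 0:
        return True

    if setup_kra is True:
        # Explicit request: do not silently fall back.
        raise RuntimeError("--setup-kra was requested but installation failed")

    # "try-enable": fall back to an installation without the vault.
    if run(cmd) != 0:
        raise RuntimeError("installation failed")
    return False
```

With more optional components, the single retry would turn into the staged fallback mentioned at 14:40:23.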
14:44:34 *cue hold music*
14:45:00 OK, updated
14:45:03 (and assigned)
14:45:27 OK, next ticket
14:45:42 #topic https://github.com/libre-server/rolekit/issues/55 - Ensure static IP for domain controller
14:46:06 This came as part of the fallout from the F22->F23 upgrade.
14:46:25 Some bits in the networking stack changed under the hood and my DHCP-assigned static address changed.
14:46:48 In reality, we probably don't ever want FreeIPA to rely on a DHCP address, since it is so closely tied to DNS services.
14:47:10 My proposal is that it would be useful to have us tell NetworkManager to make the configuration static.
14:47:37 That said, I'm not sure if it's feasible to do this automatically (since we'd be attempting to claim an IP from within the configured DHCP range)
14:48:11 So I think we should think about this carefully. There's no rush, because the simple workaround would be the admin setting the IP manually using nmcli or Cockpit anyway.
14:48:38 Oh, I forgot to #info the previous ticket. One moment while I tinker with the logs.
14:48:40 #undo
14:48:40 Removing item from minutes:
14:48:45 There might be (Fort Knox-like) setups where machines have to call DHCP to get their assigned IP addresses, otherwise they'll get blocked at the switches or something. Not sure if we want to cater to these.
14:48:53 #info Assigned to nilsph for Fedora 24 nice-to-have enhancement
14:48:58 #topic https://github.com/libre-server/rolekit/issues/55 - Ensure static IP for domain controller
14:49:34 nilsph: Those systems are guaranteed to be broken by changes like the device naming one that we just hit.
14:49:41 I think if we want to add that, make it a (default enabled?) option.
14:49:45 It is basically how I had my VMs deployed, in fact
14:50:17 Yes. That's why we should leave the option of "trust me, I know what I'm doing" open to the admin ;).
14:51:22 And/or telling people that they need to take care that the IP address doesn't change (if we don't pin it).
14:52:25 The more I think about this, the more I think we should just scrap the issue entirely and maybe just add some notes in the man page that the IP must be static for safe operation
14:52:33 And rely on them to make that happen however makes sense
14:52:40 +1
14:52:49 OK, I'll convert this to a doc bug and handle it myself
14:53:41 +1 for skipping this issue and adding a section to the man page of the domain controller
14:54:31 making the IP static is nothing we should and could easily do
14:54:33 #info Making this a documentation bug, assigned to sgallagh for Fedora 24
14:54:58 #topic https://github.com/libre-server/rolekit/issues/57 - Support redeploy() for Domain Controller
14:55:10 I added this one just before the meeting, and it's non-trivial.
14:55:26 wow
14:55:42 There's no urgency on this, however.
14:55:50 wouldn't this be more like an undeploy and a fresh deploy?
14:55:51 * nilsph disappears into the woodwork
14:55:53 For now, it's here for tracking.
14:55:56 ;)
14:55:56 twoerner: No, actually
14:56:02 no?
14:56:14 twoerner: IPA has tools for doing this already on a live system
14:56:22 adding and removing components might be a bit too much for redeploy
14:56:23 sgallagh: you want to be able to switch options off and on without disrupting the rest of the instance, right?
14:56:51 twoerner: This is the exact example I used when defining the redeploy() option :)
14:56:58 ok.. if there is working support in IPA, then this is ok for me..
14:57:17 nilsph: Well, short-term disruption is okay.
14:57:28 (Like, outage-window type disruption)
14:57:49 I meant, permanently, as in deleting the instance and starting over (which is redeploy AIUI)
14:58:19 the description in the issue is a bit short .. not mentioning that there is support for this in IPA with a tool .. :-)
14:58:36 nilsph: Don't use memcached as a good example of redeploy()
14:58:39 That *is* a hack
14:58:53 That particular example we only get away with because memcache has no state
14:59:29 twoerner: Yeah, like I said I was creating it just as the meeting started.
14:59:31 I was rushing :)
14:59:42 Time to explain "redeploy" to me, again, I guess :)
15:00:18 nilsph: redeploy() is meant to be an in-place modification of solution-level changes.
15:00:34 I hacked the decommission/redeploy for memcache because it was easier.
15:00:52 But this ticket is the exact feature I always meant for redeploy() to handle
15:01:01 Without the marketing buzzwords: "change options in a deployed instance"?
15:01:01 I just only now got around to adding it to the list.
15:01:14 nilsph: Well, options *of the instance*.
15:01:35 e.g. Add or remove optional components vs. add or remove users in the domain
15:01:49 Of course of the instance, we're not talking modifying the role. I'm not at least.
15:01:58 Understood.
15:02:03 nilsph: Right, I'm just trying to be extremely clear in the meeting logs :)
15:02:08 Good :)
15:02:43 In any case, this is going to be a large effort and probably not immediately urgent.
15:02:55 I think there are types of options which shouldn't be open to this, e.g. "database name" which is not an installation option of Postgres, really.
15:03:01 So I'm going to suggest we drop this in the "Future" milestone and revisit it later
15:03:07 +1
15:03:36 nilsph: Right, not all options have to be modifiable.
15:03:59 +1
15:04:19 I'm also going to do the same to any of the help-wanted items on the list
15:04:31 So we can easily just look at the "no milestone" set when figuring out what to triage.
15:04:43 +1
15:05:23 #info Domain Controller redeploy() is deferred to the Future milestone for now
15:06:47 ok, next topic
15:06:57 #topic Status Report
15:07:21 I went to the systemd conference last week. I had a chance to discuss a few topics with Lennart around how we are working with target units.
15:07:41 What's his take?
15:08:02 #link https://github.com/systemd/systemd/issues/1797
15:08:36 Lennart agreed with me that not propagating failures up to BindsTo units is a bug.
15:08:42 So we will be able to rely on that in the future.
15:09:40 What this means is that, once this is implemented, we'll be able to tell the difference between a target that is stopped because it was manually stopped or one that crashed.
15:09:47 Good. Even if we find that we don't need it in the end (pointing to our discussion of Monday last week ;))
15:10:00 Well, this is still useful *information*
15:10:20 Yeah, having targets just for book-keeping is still worthwhile.
15:10:29 I think I've revised the way I want to handle this somewhat, and I'm working on implementing it.
15:10:46 I'm going to try to describe my new plan first, then I'll EOF and you can poke holes in it, please :)
15:11:14 ok, will do :-)
15:11:47 First, we'll stop recording the state as an attribute of the role XML. Instead, we will rely entirely on the ActiveState attribute of the role target unit to represent the RUNNING vs. READY-TO-START state of the system.
15:12:12 Whenever this attribute is requested, we will query systemd for it (with a reasonable cache in rolekit)
15:12:51 Second, we will modify the role target unit so that it will attempt to auto-restart unless it fails repeatedly.
15:13:30 If the target unit fails on *startup*, it does end up in the failure state and in this case we can propagate that into rolekit as being in the ERROR state. (Since obviously something is broken that needs attention)
15:14:29 By relying on the state of the target unit, we no longer need the OnFailure units that fire d-bus messages indicating error to the rolekit daemon, since our status will always be accurate on lookup.
15:15:09 This means we will be able to reduce the set of unit files we create down to one per role (rather than the target unit, failure unit and N signaling units, where N was every Requires: entry)
15:15:28 This reduction in complexity will make for easier maintenance of the code and easier cleanup of the system.
15:15:31 EOF
15:15:35 ok, this is working if the instance at least reached the running state.. before reaching it, there is no unit - we also need to track the transitional states and the other persistent states
15:15:59 The transitional states are actually representable by ActiveState values.
15:16:01 s/unit/target unit/
15:16:13 Well, except deploying and decommissioning, of course
15:16:21 which can be tracked internally (I expect the rolekitd instance to stay around for the duration of deployment)
15:16:22 But those are active states and the rolekit daemon must be running
15:16:27 Right
15:16:51 +1
15:16:56 sounds good
15:17:04 roled is not suspended while an instance is in a transitional state
15:17:10 twoerner: Before reaching ready-to-start, it can only be one of "nascent", "deploying" or "error"
15:17:24 yes
15:17:42 twoerner: Right, as per a previous bug, if it starts up and we're in a transitional state, then it treats it as an error
15:17:52 coincides with how I think things should be working
15:17:56 /me nods
15:18:20 OK, I realize though I won't be able to remove the state from the XML, if only because we still want to know if we're stuck in a transitional state during startup.
15:18:22 less persistent state == good
15:18:33 (that we have to take care of)
15:18:33 yes
15:18:45 But we can always go to "ask systemd if we think it's supposed to be ready-to-start or running"
15:19:02 And therefore get the real value.
15:19:02 so.. if the instance is in READY_TO_START or RUNNING state, then we ask systemd
15:19:05 yes
15:19:24 (and if systemd comes back with a transitional state, that'll be an interesting situation :) )
15:19:43 is that possible?
15:20:06 Yes, a race condition where someone called `systemctl stop target.unit` just as we were trying to read it.
15:20:20 ok.. yes
15:20:23 right
15:20:30 But in that case, I think we probably want to just report the reality and expect the client to wait a bit and retry.
15:20:52 Or register with the magical job system we're going to build :)
15:21:17 ok.. how about this:
15:21:36 "magical job system"?
15:21:37 remove READY_TO_START and RUNNING and add SYSTEMD instead?
15:21:54 sounds sensible to me
15:21:58 just to make sure that this is an expected and also valid state
15:22:00 Interesting... go on
15:22:30 SYSTEMD meaning "we might not know the state, but ask systemd and cache for a reasonable time"
15:22:31 nilsph: https://github.com/libre-server/rolekit/issues/18
15:22:35 and that systemd will be consulted to get the result
15:22:55 sgallagh: ahh
15:22:57 tnx
15:24:03 twoerner: I'm not sure I see the value.
15:24:04 SYSTEMD might be our internal state, that is then replaced by the value roled is getting back from systemd
15:24:16 Ah
15:24:29 Yeah, I suppose that could be reasonable
15:24:35 logic vs interface
15:24:43 I have to think about that a bit.
15:25:11 (and we're running out of meeting)
15:25:36 #info Lots of discussion on the redesign of the unit file creation and monitoring. See full logs for details
15:25:46 but we need to think about the actions that can be done using the SYSTEMD state
15:25:55 nilsph, twoerner: With five minutes remaining, a fast update on your status? :)
15:26:01 redeploy, start/stop, ..
15:26:28 twoerner: I think SYSTEMD might just be a transitional state at startup
15:26:47 All our real operations should be done from ready-to-start or running, after we've established where we actually are
15:29:22 yes, but with READY_TO_START and RUNNING not being real states (we do not have a simple way to get notified of a stopped service that was required by a running instance) we can not distinguish between these states
15:30:04 twoerner: Sorry, I'm not following. What do you mean by "we do not have a simple way to get notified of a stopped service that was required by a running instance"?
15:30:15 therefore running will not get 'reduced' to ready-to-start or error
15:30:56 as soon as the instance has a target unit we need to rely on systemd completely to get the state
15:31:01 I think these ideas could be split into "who's responsible for XYZ" and "what state is XYZ in", and rolekitd presents an amalgamate of both to the outside world -- if we're responsible, we present the state we know, otherwise ask systemd.
15:32:08 Or even, our state has a dual meaning, either "state is ..." or "ask systemd". I guess that's what twoerner was proposing.
15:32:34 I think "ask systemd" is not good.. we should provide this to the user
15:32:35 twoerner: Sure, but we can also register for notification when that unit changes state
15:32:40 s/we/rolekit/
15:33:10 nilsph: I don't want us ever to be telling the user "ask systemd". We can do that on their behalf
15:33:24 twoerner: I meant that as "if someone asks us now, we ask systemd, translate that appropriately, and then respond with the state"
15:33:30 sgallagh: ^^
15:33:39 yes
15:33:42 OK, that was unclear.
15:34:12 just that the meaning of our internal state would drift away from "the state of the instance" and becomes a hybrid
15:34:29 nothing should change to the outside in this scheme
15:34:49 yes, therefore I want to replace the states that are hybrid to simply be SYSTEMD
15:34:49 only less (or later) work for rolekitd
15:34:55 exactky
15:35:00 -k+l
15:35:04 to make sure that this is really only one state and not several
15:35:11 +11
15:35:40 The thing I'm getting at is that I'm not sure it makes sense to have SYSTEMD as a state.
15:35:50 and to make sure that if someone is looking at the XML file, there is something we can easily explain
15:36:03 If we're not nascent or error at startup, then we get it from systemd and use the known real state
15:36:17 I don't know that having a systemd transitional state makes sense
15:36:40 this is not a transitional in my opinion
15:36:47 twoerner: OK, I guess I can *kind of* see where it might be useful in the XML file
15:36:51 it is a persistent state
15:37:07 for rolekit... to make sure to ask systemd
15:37:18 in this state
15:37:20 sgallagh: the state could even be "None" unless we want that to express something else?
15:37:33 as in "we don't know ourselves"
15:37:34 nilsph: I'd prefer to avoid None at least
15:37:39 thought so
15:37:45 we reached the state, where systemd need to be consulted to get the state
15:38:11 I don't think that the state should be on disk for a deployed instance, in this scheme.
15:38:13 yes, please do not use "None" or ""
15:38:18 Yeah, I see the value in having it and keeping it in the XML
15:38:31 But never under any circumstances exposed to end-users
15:38:39 yes
15:38:41 OK, I think we're on the same page
15:38:44 this is for rolekit
15:39:07 While deploying or decommissioning, it needs to go somewhere (hint: probably not in /etc), but while it's deployed properly there's no need for it.
15:39:11 to make sure that we can have a simple way to detect this internal state
15:40:00 How that is represented in the running rolekitd is another matter, state==SYSTEMD or whatever.
15:40:21 I think there is a need for this to make sure that we really reached at least the 'ready-to-start' state
15:40:42 ok
15:40:48 and are prepared to ask systemd
15:40:52 I think I have enough to go on, here.
15:40:56 to get a meaningful reply
15:41:05 Presence of deployment/decommissioning state info -> not ready-to-start/SYSTEMD or whatever
15:41:20 I'll work on getting a patch out this week and we can iterate on it from there
15:41:24 cool
15:41:26 ok
15:41:28 good
15:44:06 Alright, I think we've been in this meeting long enough, and I have to run the Server SIG meeting in 15 minutes.
15:44:11 Thanks for coming, folks!
15:44:18 yes, thanks
15:44:27 #endmeeting
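Editor's sketch of the state lookup discussed under the Status Report topic, offered as one possible reading and not as the patch sgallagh mentions. The assumptions: the persisted state is either one rolekit tracks itself or the internal marker `SYSTEMD`; in the latter case the target unit's ActiveState is queried, cached briefly, and translated before being reported, so `SYSTEMD` never reaches the user. The class `StateResolver`, the injected `query_active_state` callable, the cache duration, and the exact mapping (including the "starting" and "stopping" names) are hypothetical. The ActiveState values themselves are the ones systemd defines.

```python
import time
from typing import Callable, Dict, Tuple

# Assumed translation from systemd ActiveState values to role states.
ACTIVE_STATE_TO_ROLE_STATE = {
    "active": "running",
    "reloading": "running",
    "inactive": "ready-to-start",
    "activating": "starting",
    "deactivating": "stopping",
    "failed": "error",
}


class StateResolver:
    """Resolve the externally visible state of a role instance."""

    def __init__(
        self,
        query_active_state: Callable[[str], str],
        cache_seconds: float = 2.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # query_active_state is a hypothetical hook that asks systemd for
        # the ActiveState of a unit, for example over D-Bus.
        self._query = query_active_state
        self._cache_seconds = cache_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, str]] = {}

    def resolve(self, persisted_state: str, target_unit: str) -> str:
        # States such as "nascent" or "deploying" are owned by the daemon
        # itself and are reported as stored.
        if persisted_state != "SYSTEMD":
            return persisted_state

        now = self._clock()
        cached = self._cache.get(target_unit)
        if cached is not None and now - cached[0] < self._cache_seconds:
            active_state = cached[1]
        else:
            active_state = self._query(target_unit)
            self._cache[target_unit] = (now, active_state)

        # Unknown values are treated as an error in this sketch.
        return ACTIVE_STATE_TO_ROLE_STATE.get(active_state, "error")
```

A transitional answer from systemd (the race mentioned at 15:20:06) is passed through as "starting" or "stopping", leaving the client to wait and retry.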