#openlmi log

14:00:45 <sgallagh> #startmeeting OpenLMI (2013-11-11)
14:00:45 <zodbot> Meeting started Mon Nov 11 14:00:45 2013 UTC.  The chair is sgallagh. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:45 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
14:00:49 <sgallagh> #meetingname OpenLMI Public IRC Meeting
14:00:49 <zodbot> The meeting name has been set to 'openlmi_public_irc_meeting'
14:00:55 <sgallagh> #chair sgallagh tsmetana jsafrane rdoty
14:00:55 <zodbot> Current chairs: jsafrane rdoty sgallagh tsmetana
14:01:01 <sgallagh> #info Meetings are recorded and will be posted on www.openlmi.org. Opinions expressed do not necessarily reflect the views of the participant's employer.
14:01:11 <sgallagh> #topic Roll Call
14:01:24 <sgallagh> Who do we have here today?
14:01:38 <jsafrane> Jan Safranek
14:02:47 <praveen_pk> Praveen Paladugu
14:03:12 <kkaempf> Klaus Kämpf
14:03:25 <rnovacek> Radek Novacek
14:03:38 <tbzatek> Tomáš Bžatek
14:05:16 <sgallagh> That looks like most of our regulars, sans tsmetana and rdoty.
14:06:10 <sgallagh> If I recall correctly, we spent most of the previous meeting discussing access-control needs.
14:06:32 <sgallagh> Do we want to pick up from there, or does anyone have other items they'd like to put on the agenda first?
14:07:45 <sgallagh> Hmm, "auditing" was submitted as an agenda item for this week as well.
14:07:53 <sgallagh> Maybe that would be a good place to start.
14:08:00 <sgallagh> #topic Auditing Requirements
14:08:34 <sgallagh> So, first order of business: what are our auditing requirements? What granularity do we expect to need here?
14:08:50 <sgallagh> stefw: If you're around, I recall you had a bit to say on this last week
14:09:54 <stefw> i would suggest talking with mitr about the auditing requirements
14:10:05 <stefw> he has a good handle on what is necessary
14:10:15 <sgallagh> Ok, I'll see if he's available to join us.
14:10:27 <sgallagh> Hmm, he doesn't seem to be online
14:10:40 <sgallagh> We'll wing it for now.
14:10:56 <stefw> the thing is, i have certain assumptions, but someone like mitr or sgrubb would be able to respond based on firmer requirements.
14:11:52 <stefw> in theory an architecture would need to be able to audit the user taking an action.
14:12:12 <stefw> however i don't know if auditing at the CIM input level is enough?
14:12:24 <stefw> that is audit the incoming requests before handing them off to the system
14:12:26 <sgallagh> What do you mean by the "input level"?
14:12:29 <sgallagh> ah, right.
14:12:30 <stefw> at the WBEM level
14:12:38 <stefw> it seems it would be better than nothing
14:12:43 <sgallagh> I was about to say that.
14:12:50 <sgallagh> At minimum, we should be able to record all of that.
14:12:51 <stefw> but would be prone to all sorts of oddities
14:13:07 <stefw> such as the system not actually taking the action described in the CIM message
14:13:29 <sgallagh> Well, CIM requires that an appropriate error be returned in that case.
14:13:38 <kkaempf> Very few users will interact with openlmi on the cim/wbem protocol level. Most will use a management application.
14:13:40 <sgallagh> Bugs can happen, but they're bugs.
14:14:01 <jsafrane> Pegasus has some sort of audit logging, https://collaboration.opengroup.org/pegasus/pp/uploads/40/14428/PEP258_AuditLogging.htm
14:14:07 <kkaempf> So we need to capture the user name at the management application and link this to the audit log at the cimom level.
14:14:09 <stefw> so if two users make the same request simultaneously, what happens?
14:14:10 <sgallagh> stefw: So auditing "this request was made" and "the system rejected it" should be sufficient (at that level)
14:14:17 <stefw> kkaempf, this is 'user' in a security context
14:14:44 <stefw> sgallagh, returning to the broader scale, this also makes the CIMOM a trusted part of the audit system
14:14:49 <kkaempf> It's someone authenticated against the management application.
14:14:55 <stefw> there may be hard requirements about that for certain use cases
14:15:05 <sgallagh> I'm not sure that's necessarily true.
14:15:11 <sgallagh> The trusted part needs to be the kernel auditing
14:15:19 <sgallagh> I'd have to consult with mitr/sgrubb on that
14:15:24 <stefw> kkaempf, in the use case i'm examining openlmi for, the management application uses the user's credentials to connect via CIM
14:15:34 <stefw> so the user is in fact accessing CIM, albeit through software
14:15:40 * stefw notes that's always the case
14:16:22 <stefw> well the fact that software is in use :)
14:16:44 <stefw> sgallagh, yes, it's important to get real requirements here
14:16:59 <stefw> because auditing is pretty much driven by them
14:17:24 * sgallagh nods
14:17:27 <stefw> i do know that work on kdbus was partially driven by the desire to have dbus calls audited in the kernel
14:17:32 <kkaempf> The pegasus audit logging pep seems to be a good start
14:18:35 <sgallagh> Yes
14:18:40 * sgallagh reads up on it right now
14:18:43 <sgallagh> https://collaboration.opengroup.org/pegasus/pp/uploads/40/14428/PEP258_AuditLogging.htm
14:19:09 <sgallagh> stefw: This looks to be pretty much exactly the "input audit logging" we were talking about above.
14:19:20 <sgallagh> (Yay, our work here is done! :-P)
14:19:23 <stefw> right, so the question is if that is enough to satisfy requirements
14:19:30 <stefw> not whether it's useful (logging usually is)
14:20:01 <sgallagh> Certainly
14:20:20 <sgallagh> I think the answer we'd hear from sgrubb would be pretty much "there's no such thing as too much"
14:20:53 <rdoty> And Jack Rieden would note that people turn most of it off after the first week...
14:21:22 <kkaempf> Do we need to capture full requirements up front ?
14:21:34 <sgallagh> My suspicion is that if we take advantage of PolicyKit to do decision-making in the providers themselves (based on user identity) and log there, plus the kernel logging that will happen during any system call, that's probably going to paint a pretty complete picture.
14:22:18 <stefw> sgallagh, i agree
14:23:10 <stefw> sgallagh, the kernel logging is obviously less useful if we can't assume the loginuid
14:23:23 <stefw> but i don't think any of the major linux system management services do that
14:23:42 <kkaempf> sgallagh: are you proposing to add auditing/policy code to providers (rather than the cimom) ?
14:23:56 <sgallagh> kkaempf: not *exactly*
14:24:28 <stefw> hmm, i thought you were proposing that
14:24:33 <sgallagh> One of the things we discussed last week was to have the providers execute their requests in the context of the UNIX user that was performing them (either by forking a helper or using seteuid)
14:24:46 <sgallagh> And then invoking PolicyKit to authorize individual decisions
14:24:56 <sgallagh> Then the auditing would technically be done by PolicyKit.
14:25:13 <stefw> that would be a complete solution
14:25:19 <kkaempf> sgallagh: I'd rather see this in the cimom, not the provider.
14:25:35 <sgallagh> kkaempf: See which?
14:25:40 <jsafrane> sgallagh: that would be hard with current design of providers
14:26:04 <rdoty> kkaempf: how would you see that working?
14:26:10 <sgallagh> jsafrane: What's the limitation I'm missing?
14:26:34 <jsafrane> sgallagh: we don't fork stuff, we just do it :)
14:27:19 <sgallagh> jsafrane: Right, but that's not overly-difficult to change. And the alternative I mentioned would be to set the effective user ID (which doesn't require forking, but DOES necessitate locking)
14:27:40 <kkaempf> rdoty: For sfcb, that would be some code in providerDrv.c. I don't know about Pegasus.
14:28:01 <sgallagh> kkaempf: So you'd audit the request going into the provider?
14:28:50 <sgallagh> I think what we're trying to say is that it's more complete to log whenever a privileged action is actually occurring on the system (i.e. "Can I format this partition?")
14:29:02 <sgallagh> Rather than logging "The user has requested to format this partition"
14:29:25 <sgallagh> s/"Can I format this partition?"/"PolicyKit granted the user permission to format a partition"/
14:29:32 <sgallagh> (That's a more clear distinction)
14:29:34 <kkaempf> sgallagh: That's an awful lot of code in the providers and puts a lot of responsibility on provider developers.
14:30:14 <kkaempf> Plus 'format partition' is the action to be logged, not 'called mkfs'.
14:30:43 <sgallagh> kkaempf: I'm not sure I see the distinction there
14:31:21 <jsafrane> sgallagh: with some providers, it could be easy to have policy/audit check in the provider  - we just forward messages using dbus to systemd or network manager; but storage and/or software would be error prone
14:31:22 <kkaempf> Auditing at the cimom level is guaranteed to capture all provider calls. Auditing at the provider level needs properly coded providers.
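
The point about CIMOM-level auditing capturing every provider call can be illustrated with a toy dispatcher. This is only a sketch under assumptions: `AuditingDispatcher`, the logger name `cimom.audit`, and the log line layout are invented for illustration and do not reflect how Pegasus or sfcb implement audit logging.

```python
import logging

audit_log = logging.getLogger("cimom.audit")  # hypothetical logger name


class AuditingDispatcher:
    """Toy stand-in for a CIMOM: every provider call passes through invoke()."""

    def __init__(self):
        self._providers = {}

    def register(self, class_name, provider):
        self._providers[class_name] = provider

    def invoke(self, user, class_name, method, **params):
        # Record the request before handing it to the provider.
        audit_log.info("request user=%s class=%s method=%s params=%r",
                       user, class_name, method, params)
        try:
            result = getattr(self._providers[class_name], method)(**params)
        except Exception as exc:
            # Record the rejection or failure, then let the caller see it.
            audit_log.warning("failed user=%s class=%s method=%s error=%s",
                              user, class_name, method, exc)
            raise
        audit_log.info("completed user=%s class=%s method=%s",
                       user, class_name, method)
        return result
```

Because the logging lives in the dispatcher, a provider that contains no audit code at all is still covered, including providers written outside the project.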
14:32:10 <sgallagh> kkaempf: Sure, understood. I've never once said we *shouldn't* audit at the CIMOM level.
14:32:22 <kkaempf> :-)
14:32:23 <sgallagh> In fact, that's already covered with the Pegasus audit logging you mentioned.
14:32:36 <tbzatek> ...and you could audit non-OpenLMI providers as well
14:32:43 <sgallagh> We're talking about what we should do in addition
14:33:18 <kkaempf> I can't imagine requirements on top of that currently.
14:33:45 <sgallagh> kkaempf: Yes, but you're also planning to use primarily the read-only functions of OpenLMI if I remember our earlier discussions.
14:33:54 <sgallagh> monitoring and querying.
14:34:21 <sgallagh> At that level, certainly the CIMOM auditing is going to be enough, since there should not be any potential damage
14:37:14 <sgallagh> Does anyone have anything to add on the topic of auditing, or shall I take an action item to drag mitr/sgrubb into our meeting next week to discuss it further?
14:37:51 <sgallagh> I think we can at least all agree that we should be using the Pegasus audit logging, yes?
14:37:56 <rdoty> Can we invite someone from the OpenPegasus team to join the meeting?
14:38:07 <sgallagh> rdoty: Of course
14:38:33 <sgallagh> Several of them should be aware of this meeting, it was announced to the public list
14:39:06 <rdoty> It doesn't look like any of them are here; time for a specific invitation and telling them what the subject is.
14:39:41 <stefw> sgallagh, in general, the thing the provider is calling already does the access/policykit/auditing logic
14:39:56 <stefw> if we manage to call it in the right loginuid/setuid context, then it'll just work
14:40:01 <stefw> but not sure how easy that is
14:40:08 <stefw> but that's why your solution above sounded so appealing
14:40:21 <stefw> it would work, even without significant auditing changes and polkit calls in the providers
14:40:32 <stefw> it just means running the providers in the right security context
14:40:40 * sgallagh nods
14:40:55 <sgallagh> Ok, so we should probably investigate how to accomplish that
14:42:20 <sgallagh> stefw: Do you have some thoughts on where to start?
14:42:34 <sgallagh> side-note:
14:42:49 <sgallagh> #agreed CIMOM-level audit logging is important. We should enable the built-in Pegasus audit log.
14:44:19 <stefw> sgallagh, like which provider to start with?
14:44:29 <stefw> or how to separate stuff into different security contexts?
14:44:34 <sgallagh> stefw: the latter
14:44:40 <sgallagh> And how to get it right
14:45:00 <stefw> i guess for me unfamiliar with the CIMOM internals, the question i would ask is: can we run providers in a child process of the cimom?
14:45:07 <stefw> for a given user logged in
14:45:14 <stefw> would fork a child process and assume appropriate security context
14:45:18 <stefw> and then run requests through there.
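
The fork-per-user idea described above might be sketched like this in Python. It is an illustration under assumptions, not how any CIMOM works today: `run_as_user` and the JSON-over-pipe result format are hypothetical, and setting the audit loginuid or creating a real login session is left out.

```python
import json
import os


def run_as_user(uid, gid, request, handler):
    """Fork a child, drop to (uid, gid), run handler(request), return its result."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:                      # child: becomes the per-user worker
        os.close(read_fd)
        status = 1
        try:
            if os.geteuid() == 0:
                os.setgroups([])      # drop supplementary groups (root only)
            os.setgid(gid)            # change group first, while still privileged
            os.setuid(uid)            # permanent drop to the target user
            payload = json.dumps(handler(request)).encode()
            os.write(write_fd, payload)
            status = 0
        finally:
            os._exit(status)          # never fall back into the parent's code path
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        data = pipe.read()            # returns once the child closes its end
    _, wait_status = os.waitpid(pid, 0)
    if wait_status != 0 or not data:
        raise RuntimeError("per-user worker failed")
    return json.loads(data)
```

Anything the worker calls then sees the requesting user's credentials, so existing access checks in the underlying tools apply without extra code in the provider.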
14:45:26 <sgallagh> stefw: We *can*, but do not currently.
14:45:27 <stefw> obviously this is HTTP requests
14:45:37 <sgallagh> Right now they are child processes running in the CIMOM's context
14:45:45 <stefw> so mapping a login session start/end isn't as obvious as it is with other transports
14:46:07 <rdoty> kkaempf: are you familiar with the Microsoft WMI security model? How do they handle it?
14:46:12 <kkaempf> stefw: sfcb runs every provider as a separate process
14:46:13 <sgallagh> jsafrane: Please fact-check me if I'm mistaken here
14:46:21 <kkaempf> rdoty: No, I'm not.
14:46:24 <sgallagh> kkaempf: As does Pegasus
14:46:47 <sgallagh> kkaempf: But the question is which session do they run in. In our current model, they run as part of the CIMOM's session/cgroup
14:47:02 <jsafrane> Pegasus can run a provider with the UID of the logged-in user, I don't know what happens if two users use the same provider - does it run twice then?
14:47:04 <sgallagh> I think what stefw is looking for is for this fork to become a new user session
14:47:08 <kkaempf> stefw: But it's per-provider, not per-user. So if user A issues a cim request and a provider gets loaded due to that request, it would run as user A.
14:47:46 <kkaempf> stefw: if user B issues another request for the same provider, the provider would need to switch its security context.
14:48:08 <sgallagh> right
14:48:33 <kkaempf> sgallagh: sfcb providers run in the cimom's session/cgroup.
14:48:46 <sgallagh> We need to figure out if Pegasus would load a second copy as the other user, take over the first copy and change its session, or (bad) run as the first user.
14:49:15 <stefw> sgallagh, i imagine forking a provider per user would be more appropriate
14:49:20 <stefw> note that this happens after authentication
14:49:33 <sgallagh> stefw: Yes, but that doesn't mean it's what currently happens :)
14:49:45 <stefw> and the use case is to have N users be single digits
14:49:50 <kkaempf> stefw: providers are not supposed to run concurrently afaik.
14:49:59 <stefw> only one provider runs at a time?
14:50:06 <stefw> do they exit after each request?
14:50:08 <sgallagh> vcrhonek: If you're around, as our resident Pegasus maintainer, could you look into this for next week?
14:50:12 <stefw> that seems pretty intense :)
14:50:19 <kkaempf> stefw: one copy of each provider runs at any time.
14:50:25 <sgallagh> stefw: They don't exit immediately
14:50:31 <kkaempf> sfcb implements a LRU scheme
14:50:37 <stefw> ah, okay, makes sense
14:50:38 <sgallagh> They stay alive for a few (configurable) minutes IIRC
14:51:11 <kkaempf> providers can also indicate 'don't unload' and can stay loaded/running as long as the cimom runs.
14:51:25 <stefw> so is it currently only an assumption that only one copy of a provider is running on a single kernel?
14:51:25 <sgallagh> tbzatek: Or if you want to look into it, that's cool too :)
14:51:29 <stefw> or is it more of a CIMOM API limitation?
14:52:43 <kkaempf> stefw: a provider instruments a resource, you only want one copy to be running at any time.
14:52:57 <rdoty> Is there anything today that keeps two people from running fdisk at the same time from the command line?
14:52:58 <jsafrane> stefw: I need to check with Pegasus what it really does, sfcb has only one process per provider
14:53:06 <kkaempf> otherwise you'll get into all kinds of synchronization problems.
14:53:42 <sgallagh> #action jsafrane to dig into Pegasus and see how it handles per-user provider invocation
14:53:43 <kkaempf> rdoty: providers can spawn long running tasks, this is supported in cmpi.
14:53:44 <tbzatek> sgallagh: umm, integrating PK into Pegasus? I'm not aware of any of its internals at all...
14:53:45 <stefw> well if the provider is not syncing with the state of the system, we'll have more problems.
14:53:52 <sgallagh> tbzatek: Not what I was talking about
14:54:08 <sgallagh> I was talking about figuring out how Pegasus handles providers running as a user, but jsafrane just volunteered
14:54:14 <tbzatek> ah
14:54:40 <stefw> if a provider is implemented in such a way that it thinks it owns the managed resource, then that's a pretty brittle provider, and won't work well for Linux management, where you have many ways to manage a resource (for better or worse).
14:55:20 <stefw> whereas if you have a provider implemented in such a way that it represents the state of the system, then it's not really a big deal to have two of them running, as they'll synchronize eventually/naturally.
14:55:22 <sgallagh> stefw: Well, it's often fair to assume ownership while operating on it at least
14:55:27 <sgallagh> (aka locking)
14:55:28 <stefw> sgallagh, true
14:55:38 <stefw> but that's expected with multiple users anyway
14:56:07 <stefw> but that locking generally should happen in a way that other management methods also respect the lock.
14:56:15 <stefw> ie: at a lower level than the provider itself
14:56:59 <stefw> and it's totally expected that a second user performing an exclusive action on a resource already involved in such an action will get an error message.
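
Locking below the provider, with the second user getting an error, could be sketched with an advisory file lock. This is a minimal Python example under assumptions: the lock file path is chosen by the caller, `ResourceBusy`, `acquire_exclusive` and `release` are hypothetical names, and the lock only helps if every other management tool takes the same lock.

```python
import fcntl
import os


class ResourceBusy(Exception):
    """Raised when another holder already has the exclusive lock."""


def acquire_exclusive(lock_path):
    """Take a non-blocking exclusive advisory lock; return the fd that holds it."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        raise ResourceBusy(lock_path) from None  # report busy instead of waiting
    return fd


def release(fd):
    """Drop the lock and close the descriptor that held it."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

A provider would call `acquire_exclusive` before an exclusive operation and map `ResourceBusy` to an appropriate CIM error for the second caller.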
14:57:37 <sgallagh> stefw: It would be nice if every subsystem understood that.
14:58:57 <jsafrane> stefw: that's impossible, anybody can format a disk or edit network-scripts... running any of our providers twice with different UIDs should be ok, but nobody has ever tested that
15:00:17 <stefw> jsafrane, yeah it is more of a goal, rather than a hard requirement
15:00:37 <stefw> but providers should be somewhat robust in such cases
15:00:45 <stefw> and not segfault or deadlock if at all possible
15:00:59 <stefw> and i imagine they already are
15:01:59 * sgallagh notes we're over time.
15:02:23 <sgallagh> Shall we pick this up next week, hopefully with Jan providing us with details on Pegasus?
15:02:28 * stefw nods
15:03:52 <jsafrane> ok
15:04:38 <sgallagh> #action Resume discussion of process separation next week
15:04:47 <sgallagh> Thank you for participating, everyone!
15:04:50 <sgallagh> #endmeeting