On Thu, May 30, 2019 at 1:09 PM Serge E. Hallyn <serge@xxxxxxxxxx> wrote:
> On Wed, May 29, 2019 at 06:39:48PM -0400, Paul Moore wrote:
> > On Wed, May 29, 2019 at 6:28 PM Tycho Andersen <tycho@xxxxxxxx> wrote:
> > > On Wed, May 29, 2019 at 12:03:58PM -0400, Paul Moore wrote:
> > > > On Wed, May 29, 2019 at 11:34 AM Tycho Andersen <tycho@xxxxxxxx> wrote:
> > > > > On Wed, May 29, 2019 at 11:29:05AM -0400, Paul Moore wrote:
> > > > > > On Wed, May 29, 2019 at 10:57 AM Tycho Andersen <tycho@xxxxxxxx> wrote:
> > > > > > > On Mon, Apr 08, 2019 at 11:39:09PM -0400, Richard Guy Briggs wrote:

...

> > > > > > The current thinking
> > > > > > is that you would only change the audit container ID from one
> > > > > > set/inherited value to another if you were nesting containers, in
> > > > > > which case the nested container orchestrator would need to be granted
> > > > > > CAP_AUDIT_CONTROL (which everyone to date seems to agree is a workable
> > > > > > compromise).
> > >
> > > won't work in user namespaced containers, because they will never be
> > > capable(CAP_AUDIT_CONTROL); so I don't think this will work for
> > > nesting as is. But maybe nobody cares :)
> >
> > That's fun :)
> >
> > To be honest, I've never been a big fan of supporting nested
> > containers from an audit perspective, so I'm not really too upset
> > about this. The k8s/cri-o folks seem okay with this, or at least I
> > haven't heard any objections; lxc folks, what do you have to say?
>
> I actually thought the answer to this (when last I looked, "some time" ago)
> was that userspace should track an audit message saying "task X in
> container Y is changing its auditid to Z", and then decide to also track Z.
> This should be doable, but a lot of extra work in userspace.
>
> Per-userns containerids would also work.
> So task X1 is in containerid 1 on the host and creates a new task Y
> in new userns; it continues to be reported in init_user_ns as
> containerid 1 forever; but in its own userns it can request to be
> known as some other containerid. Audit socks would be per-userns,
> allowing root in a container to watch for audit events in its own
> (and descendent) namespaces.
>
> But again I'm sure we've gone over all this in the last few years.
>
> I suppose we can look at this as a "first step", and talk about
> making it user-ns-nestable later. But agreed it's not useful in a
> lot of situations as is.

[REMINDER: It is an "*audit* container ID" and not a general
"container ID" ;) Smiley aside, I'm not kidding about that part.]

I'm not interested in supporting/merging something that isn't useful;
if this doesn't work for your use case then we need to figure out what
would work. It sounds like nested containers are much more common in
the lxc world; can you elaborate a bit more on this?

As far as the possible solutions you mention above, I'm not sure I
like the per-userns audit container IDs; I'd much rather just emit the
necessary tracking information via the audit record stream and let the
log analysis tools figure it out. However, the bigger question is how
to limit (re)setting the audit container ID when you are in a non-init
userns. For reasons already mentioned, using capable() is a
non-starter for everything but the initial userns, and using
ns_capable() is equally poor as it essentially allows any userns the
ability to munge its audit container ID (obviously not good). It
appears we need a different method for controlling access to the audit
container ID.

Punting this to a LSM hook is an obvious thing to do, and something we
might want to do anyway, but currently audit doesn't rely on the LSM
for proper/safe operation and I'm not sure I want to change that now.

The next obvious thing is to create some sort of access control knob
in audit itself.
Perhaps an auditctl operation that would allow the administrator to
specify which containers, via their corresponding audit container IDs,
are allowed to change their audit container ID? The permission
granting would need to be done in the init userns, but it would allow
containers with a non-init userns the ability to change their audit
container ID. We would probably still want a
ns_capable(CAP_AUDIT_CONTROL) restriction in this case.

Does anyone else have any other ideas?

--
paul moore
www.paul-moore.com
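
For illustration only, the access control knob floated above could be
modeled in plain userspace C roughly as follows. This is a sketch
under stated assumptions, not kernel code and not an existing audit
interface: struct task_model, struct contid_allowlist, and
may_set_contid() are hypothetical names, and the has_cap_audit_control
flag merely stands in for the result of a CAP_AUDIT_CONTROL check in
the task's own userns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the task asking to change its audit container ID. */
struct task_model {
	uint64_t contid;            /* current audit container ID */
	bool in_init_userns;        /* true if the task runs in the initial userns */
	bool has_cap_audit_control; /* stand-in for a CAP_AUDIT_CONTROL check
	                               in the task's own userns */
};

/* Hypothetical allowlist, assumed to be configured from the init userns. */
struct contid_allowlist {
	const uint64_t *ids;
	size_t n;
};

/* Return true if the given audit container ID is on the allowlist. */
static bool allowlist_has(const struct contid_allowlist *al, uint64_t id)
{
	assert(al != NULL);
	for (size_t i = 0; i < al->n; i++)
		if (al->ids[i] == id)
			return true;
	return false;
}

/*
 * Model of the proposed policy:
 *  - CAP_AUDIT_CONTROL is always required;
 *  - in the init userns that alone is sufficient;
 *  - in a non-init userns the task's current audit container ID must
 *    also have been allowlisted by the administrator.
 */
static bool may_set_contid(const struct task_model *t,
			   const struct contid_allowlist *al)
{
	if (!t->has_cap_audit_control)
		return false;
	if (t->in_init_userns)
		return true;
	return allowlist_has(al, t->contid);
}
```

The point of the sketch is only that the capability check and the
administrator-granted permission are combined with a logical AND for
non-init user namespaces, so a userns cannot grant itself the ability
to change its own audit container ID.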