On Thu, May 30, 2019 at 03:29:32PM -0400, Paul Moore wrote:
>
> [REMINDER: It is an "*audit* container ID" and not a general
> "container ID" ;) Smiley aside, I'm not kidding about that part.]

This sort of seems like a distinction without a difference; presumably
audit is going to want to differentiate between everything that people
in userspace call a container. So you'll have to support all this
insanity anyway, even if it's "not a container ID".

> I'm not interested in supporting/merging something that isn't useful;
> if this doesn't work for your use case then we need to figure out what
> would work. It sounds like nested containers are much more common in
> the lxc world, can you elaborate a bit more on this?
>
> As far as the possible solutions you mention above, I'm not sure I
> like the per-userns audit container IDs, I'd much rather just emit the
> necessary tracking information via the audit record stream and let the
> log analysis tools figure it out. However, the bigger question is how
> to limit (re)setting the audit container ID when you are in a non-init
> userns. For reasons already mentioned, using capable() is a non
> starter for everything but the initial userns, and using ns_capable()
> is equally poor as it essentially allows any userns the ability to
> munge it's audit container ID (obviously not good). It appears we
> need a different method for controlling access to the audit container
> ID.

One option would be to make it a string, and have it be append only.
That should be safe with no checks.

I know there was a long thread about what type to make this thing. I
think you could accomplish the append-only-ness with a u64 if you had
some rule about only allowing setting lower order bits than those that
are already set.
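
A rough sketch of that rule, in Python only for brevity (this is not
kernel code, and the names may_update and MASK64 are made up for
illustration). It assumes that an ID of 0 means "not set yet" and may
be set to anything, and that rewriting the same value is a harmless
no-op; neither of those is specified in this thread.

```python
MASK64 = (1 << 64) - 1  # hypothetical: the ID is treated as a u64


def may_update(old: int, new: int) -> bool:
    """Return True if `new` only adds bits below the lowest set bit of `old`."""
    if not (0 <= old <= MASK64 and 0 <= new <= MASK64):
        raise ValueError("IDs must fit in an unsigned 64-bit value")
    if old == 0:
        # Assumption: an unset ID can be initialised to any value.
        return True
    lowest = old & -old   # isolate the lowest set bit of old
    free = lowest - 1     # mask of the bits strictly below it
    # Every bit at or above the lowest set bit must be unchanged;
    # only the still-free low-order bits may differ.
    return (new & ~free & MASK64) == old
```

Once the lowest-order bit is set, the mask of free bits is empty, so
the only value that still passes is the current one.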
With 4 bits for simplicity:

1100           # initial container id
1100 -> 1011   # not allowed
1100 -> 1101   # allowed, but now 1101 is set in stone since there are
               # no lower order bits left

There are probably fancier ways to do it if you actually understand
math :)

Since userns nesting is limited to 32 levels (right now, IIRC), and you
have 64 bits, this might be reasonable. You could just teach container
engines to use the first say N bits for themselves, with a 1 bit for
the barrier at the end.

Tycho