The registration is a pseudo filesystem (proc, since PID tree already
exists) write of a u8[16] UUID representing the container ID to a file
representing a process that will become the first process in a new
container. This write might place restrictions on mount namespaces
required to define a container, or at least careful checking of
namespaces in the kernel to verify permissions of the orchestrator
so it
can't change its own container ID. A bind mount of nsfs may be
necessary in the container orchestrator's mntNS.
Note: Use a 128-bit scalar rather than a string to make compares faster
and simpler.
Require a new CAP_CONTAINER_ADMIN to be able to carry out the
registration.
Wouldn't CAP_AUDIT_WRITE be sufficient? After all, this is for auditing.
No, because then any process with that capability (vsftpd) could change
its own container ID. This is discussed more in other parts of the
thread...
Not if we make the container ID append-only (to support nesting), or
write-once (the other idea thrown around). In that case, you can't move
"out" from a particular container ID, you can only go "deeper". These
semantics don't make sense for generic containers, but since the point
of this facility is *specifically* for audit I imagine that not being
able to move a process from a sub-container's ID is a benefit.
[This assumes it's CAP_AUDIT_CONTROL which is what we are discussing in
a sister thread.]
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/