On Mon, 2017-05-22 at 17:22 +0100, David Howells wrote:
> Here are a set of patches to define a container object for the kernel and
> to provide some methods to create and manipulate them.
>
> The reason I think this is necessary is that the kernel has no idea how to
> direct upcalls to what userspace considers to be a container - current
> Linux practice appears to make a "container" just an arbitrarily chosen
> junction of namespaces, control groups and files, which may be changed
> individually within the "container".
>
> The kernel upcall mechanism then needs to decide which set of namespaces,
> etc., it must exec the appropriate upcall program. Examples of this
> include:
>
>  (1) The DNS resolver. The DNS cache in the kernel should probably be
>      per-network namespace, but in userspace the program, its libraries and
>      its config data are associated with a mount tree and a user namespace
>      and it gets run in a particular pid namespace.
>
>  (2) NFS ID mapper. The NFS ID mapping cache should also probably be
>      per-network namespace.
>
>  (3) nfsdcltrack. A way for NFSD to access stable storage for tracking
>      of persistent state. Again, network-namespace dependent, but also
>      perhaps mount-namespace dependent.
>
>  (4) General request-key upcalls. Not particularly namespace dependent,
>      apart from keyrings being somewhat governed by the user namespace and
>      the upcall being configured by the mount namespace.
>
> These patches are built on top of the mount context patchset so that
> namespaces can be properly propagated over submounts/automounts.
>
> These patches implement a container object that holds the following things:
>
>  (1) Namespaces.
>
>  (2) A root directory.
>
>  (3) A set of processes, including a designated 'init' process.
>
>  (4) The creator's credentials, including ownership.
>
>  (5) A place to hang security for the container, allowing policies to be
>      set per-container.
>
> I also want to add:
>
>  (6) Control groups.
>
>  (7) A per-container keyring that can be added to from outside of the
>      container, even once the container is live, for the provision of
>      filesystem authentication/encryption keys in advance of the container
>      being started.

It's hard to decide which of these has higher priority; I think both
are essential to a container implementation.

Ian