Quoting Andrew Morton (akpm@xxxxxxxxxxxxxxxxxxxx):
> On Mon, 27 Sep 2010 12:14:10 +0200
> Daniel Lezcano <daniel.lezcano@xxxxxxx> wrote:
>
> > The ns_cgroup is a control group interacting with the namespaces.
> > When a new namespace is created, a corresponding cgroup is
> > automatically created too. The cgroup name is the pid of the process
> > who did 'unshare' or the child of 'clone'.
> >
> > This cgroup is tied with the namespace because it prevents a
> > process to escape the control group and use the post_clone callback,
> > so the child cgroup inherits the values of the parent cgroup.
> >
> > Unfortunately, the more we use this cgroup and the more we are facing
> > problems with it:
> >
> > (1) when a process unshares, the cgroup name may conflict with a previous
> > cgroup with the same pid, so unshare or clone return -EEXIST
> >
> > (2) the cgroup creation is out of control because there may have an
> > application creating several namespaces where the system will automatically
> > create several cgroups in his back and let them on the cgroupfs (eg. a vrf
> > based on the network namespace).
> >
> > (3) the mix of (1) and (2) force an administrator to regularly check and
> > clean these cgroups.
> >
> > This patchset removes the ns_cgroup by adding a new flag to the cgroup
> > and the cgroupfs mount option. It enables the copy of the parent cgroup
> > when a child cgroup is created. We can then safely remove the ns_cgroup as
> > this flag brings a compatibility. We have now to manually create and add the
> > task to a cgroup, which is consistent with the cgroup framework.
>
> So this is a non-backward-compatible userspace-visible change?

Yes, it is.

Patch 1 is needed to let lxc and libvirt both control containers with
the same cgroup setup. Patch 3, however, isn't *necessary* for that.
Daniel, what do you think about holding off on patch 3?

> What are the implications of this?

The ns cgroup does 2 things which no other cgroup does: (1) it moves
tasks into a child cgroup any time they unshare or clone a namespace.
And (2) it prevents them from moving up to a parent cgroup. The latter
in particular makes it the only way, without using an LSM, of locking
root into a cgroup, until user namespaces are further developed (*).

-serge

(*) - Maybe something to add to that new kernel todo list
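
For reference, the manual step the patchset description mentions (create a
cgroup, then add the task to it) can be sketched roughly as below. This is
only an illustration, not code from the patchset: the helper name
create_and_join and the cgroup name "container1" are made up, and it assumes
the usual cgroupfs convention of attaching a task by writing its pid to the
"tasks" file. The new copy-from-parent flag is not shown, since this mail
does not name it. The demo at the bottom uses a temporary directory as a
stand-in, so it runs without root or a mounted cgroupfs.

```python
import os
import tempfile


def create_and_join(parent_dir, name, pid):
    """Create child cgroup `name` under `parent_dir` and attach `pid` to it.

    `parent_dir` is assumed to be a directory on a mounted cgroupfs
    (hypothetical path, chosen by the caller).
    """
    child = os.path.join(parent_dir, name)
    # On cgroupfs, mkdir creates the cgroup. Reusing an existing name raises
    # FileExistsError, which is how userspace sees EEXIST.
    os.mkdir(child)
    # Writing a pid to the "tasks" file moves that task into the cgroup.
    with open(os.path.join(child, "tasks"), "w") as f:
        f.write(f"{pid}\n")
    return child


if __name__ == "__main__":
    # Stand-in directory instead of a real cgroup mount point.
    with tempfile.TemporaryDirectory() as root:
        path = create_and_join(root, "container1", os.getpid())
        with open(os.path.join(path, "tasks")) as f:
            print(path, f.read().strip())
```

Since the caller picks the name here, the pid-based name collision from
problem (1) only happens if the caller reuses a name. Note that nothing in
this sketch stops the task from later moving back up to the parent, which is
the second ns cgroup behaviour described above.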