Quoting Aditya Kali (adityakali@xxxxxxxxxx):
> On Thu, Jul 24, 2014 at 10:01 AM, Serge Hallyn <serge.hallyn@xxxxxxxxxx> wrote:
> > Quoting Aditya Kali (adityakali@xxxxxxxxxx):
> >> CLONE_NEWCGROUP will be used to create new cgroup namespace.
> >>
> >
> > This is fine and I'm not looking to bikeshed, but am wondering - did
> > you consider any other ways besides unshare (i.e. a new mount option
> > to cgroupfs)?  If so, do you have a list of the downsides of those?
> > (I mainly ask bc clone flags are still a scarce commodity)
> >
>
> I did consider a couple of other ways:
>
> (1) having a cgroup.ns_root (or something) cgroup file. If this value
> is '1', it would mean that all processes in it and its descendant
> cgroups will have their cgroup paths in /proc/self/cgroup terminated
> at this cgroup.
> For ex:
> [A] --> [B] --> C
>  |  --> [D] --> E
>
> [A], [B] and [D] have cgroup.ns_root = 1.
> * all processes in cgroup C & E will see their cgroup path as /C and
>   /E respectively
> * all processes in cgroup B & D will see their own cgroup path as /
>
> In this model, it's easy to know what to show if a process is looking
> at its own cgroup paths (/proc/self/cgroup). It gets tricky when you
> are looking at another process's /proc/<pid>/cgroup. We may be able to
> come up with some hacky way to read the correct value, but depending
> on the cgroupfs mount, it may not make sense.
> One other major drawback of this approach is that "every" process in
> the cgroup will now get a restricted view, i.e., you cannot change
> cgroups without affecting your view. And this is undesirable for
> administrative processes.
>
> (2) Another idea that I didn't pursue further (and is a bit hacky as
> above) was having cgroup.ns_procs (like cgroup.procs, but all the pids
> in cgroup.ns_procs will have their /proc/self/cgroup restricted).
> Writing a pid to cgroup.ns_procs implies that you are writing it to
> cgroup.procs too, but not vice versa. So, you could move yourself into
> another cgroup by writing your pid to cgroup.procs, but not to
> cgroup.ns_procs, thus avoiding getting "rooted". This was to solve
> the administrative process issue in the above approach. But I think
> this is very clunky too and I find the semantics of this approach to
> be non-intuitive. It almost looks like moving towards a separate "ns"
> subsystem. But as we already know, it's a path to failure.
>
> I didn't think of using a mount option. I imagine the mount option
> (something like -o root=/bathjobs/container_1) could be used to
> restrict the visibility of cgroupfs inside the container's mount
> namespace, i.e., the value you read from /proc/<pid>/cgroup now
> depends on what mount namespace you are in. It's similar to the cgroup
> namespace, except that the cgroupns_root is now stored in the
> 'struct mnt_namespace' instead of a separate 'struct
> cgroup_namespace'. But, since a mount namespace on creation inherits
> mounts from its parent, the first cgroupfs mount in a mount namespace
> is now treated specially. Also, it's not possible to restrict cgroups
> without a mount namespace now. This is interesting and may not be too
> bad. I am willing to give this a try. But I feel the cgroup namespace
> approach fits well in line with other namespaces where it does one
> thing - virtualize the view of the /proc/<pid>/cgroup file for
> processes inside the namespace. The semantics are more intuitive as
> they are similar to other namespaces.

Yeah, let's stick with what you have :)

thanks,
-serge
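
For readers following the thread, the path "rooting" described in the
[A]/[B]/[D] example can be sketched in a few lines of Python. This is
only an illustration of the semantics discussed above, not kernel code
and not the patch's implementation. The function names
(nearest_ns_root, virtualize_cgroup_path) are hypothetical, and raising
an error for a cgroup outside the root is an assumption, since the
thread does not say what such a process should see.

```python
import posixpath


def nearest_ns_root(abs_path, marked_roots):
    """Return the deepest cgroup in marked_roots that is abs_path itself
    or one of its ancestors; fall back to "/" if none is marked.

    Models option (1) from the thread: cgroups with cgroup.ns_root = 1.
    """
    path = posixpath.normpath(abs_path)
    while True:
        if path in marked_roots:
            return path
        if path == "/":
            return "/"
        path = posixpath.dirname(path)


def virtualize_cgroup_path(abs_path, ns_root):
    """Return abs_path as it would appear with the view rooted at ns_root."""
    abs_path = posixpath.normpath(abs_path)
    ns_root = posixpath.normpath(ns_root)
    if ns_root == "/":
        return abs_path          # unrestricted view
    if abs_path == ns_root:
        return "/"               # the root cgroup itself is shown as "/"
    prefix = ns_root + "/"
    if abs_path.startswith(prefix):
        return "/" + abs_path[len(prefix):]
    # Assumption: the thread leaves this case open, so the sketch refuses.
    raise ValueError(f"{abs_path} is not under {ns_root}")


if __name__ == "__main__":
    marked = {"/A", "/A/B", "/A/D"}   # [A], [B] and [D] from the example
    for cg in ("/A/B/C", "/A/D/E", "/A/B", "/A/D"):
        root = nearest_ns_root(cg, marked)
        print(cg, "->", virtualize_cgroup_path(cg, root))
```

With the hierarchy from the example, processes in C and E map to /C and
/E, and processes in B and D map to /, matching the two bullet points.
The same virtualize step applies whether the root comes from a
per-cgroup file, a mount option, or a cgroup namespace; the proposals
differ only in where the root is stored and which processes it
applies to.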