Quoting Aditya Kali (adityakali@xxxxxxxxxx):
> On Thu, Apr 14, 2016 at 8:27 AM, Serge E. Hallyn <serge@xxxxxxxxxx> wrote:
> > Quoting Eric W. Biederman (ebiederm@xxxxxxxxxxxx):
> >> "Serge E. Hallyn" <serge@xxxxxxxxxx> writes:
> >>
> >> > This is so that userspace can distinguish a mount made in a cgroup
> >> > namespace from a bind mount from a cgroup subdirectory.
> >>
> >> To do that do you need to print the path, or is an extra option that
> >> reveals nothing except that it was a cgroup mount sufficient?
> >>
> >> Is there any practical difference between a mount in a namespace and a
> >> bind mount?
> >>
> >> Given the way the conversation has been going I think it would be good
> >> to see the answers to these questions.  Perhaps I missed it but I
> >> haven't seen the answers to those questions.
> >
> > Yup, I tried to answer those in my last email, let me try again.
> >
> > Let's say I start a container using cgroup namespaces, /lxc/x1.  It mounts
> > freezer at /sys/fs/cgroup so it has field three of mountinfo as /lxc/x1,
> > and /sys/fs/cgroup/ is the path to the container's cgroup (/lxc/x1).  In
> > that container, I start another container x1, not using cgroup namespaces.
> > It also wants a cgroup mount, and a common way to handle that (to prevent
> > the container from rewriting its limits) is to mount a tmpfs at
> > /sys/fs/cgroup, create /sys/fs/cgroup/lxc/x1, and bind mount
> > /sys/fs/cgroup/lxc/x1 from the parent container onto
> > /sys/fs/cgroup/lxc/x1 in the child container.
> > Now for that bind mount, the mountinfo field 3 will show /lxc/x1/lxc/x1,
> > with mount target /sys/fs/cgroup/lxc/x1, while /proc/self/cgroup for a task
> > in that container will show '/lxc/x1'.  Unless it has been moved into
> > /lxc/x1/lxc/x1 in the container (/lxc/x1/lxc/x1/lxc/x1 on the host)...
> > Every time I've thought "maybe we can just..." I've found a case where it
> > wouldn't work.
> >
> > At first in lxc we simply said if /proc/self/ns/cgroup exists assume that
> > the cgroupfs mounts are not bind mounts.  However, old userspace (and
> > container drivers) on new kernels is certainly possible, especially an
> > older distro in a container on a newer distro on the host.  That completely
> > breaks with this approach.
> >
>
> My main concern regarding making this a new kernel API is that it's too
> generic and exposes information about all system cgroups to every
> process on the system, not just the container or the process inside it
> that needs it. Not all containers need this information and not all
> processes running inside the container need this. I haven't put too
> much thought into it, but it seems you will still need to update the
> container userspace to read this extra mount option. So it seems like a
> simpler approach where the host "cgroup manager" provides this
> information to specific container cgroup manager via other user-space
> channels (a config file, command-line args, environment vars, proper
> container mounts, etc.) may also work, right?

No, because existing legacy userspace would need to be taught about these
new channels.

I'm testing a new patch which simply fixes the root dentry field in
mountinfo, which should also serve to fix this problem without adding the
nsroot= option field.
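
For anyone following along, here is a rough Python sketch of the two values
being compared in the nested-container example above: the root field of a
cgroup mount in mountinfo (the fourth whitespace-separated field, i.e. index 3
counting from zero) and the path a task sees in /proc/self/cgroup. The helper
names are made up for illustration, and the sample lines are hypothetical: the
mount IDs, device number and hierarchy ID are invented, and only the paths
come from the example. It shows the mismatch only; it is not how lxc or the
kernel resolves it.

```python
def parse_mountinfo_line(line):
    """Pick out the fields of one mountinfo line that matter here."""
    # Fields before the lone "-" separator are positional; the fields
    # after it are: filesystem type, mount source, superblock options.
    left, _, right = line.rstrip("\n").partition(" - ")
    fields = left.split()
    tail = right.split()
    return {
        "root": fields[3],         # root of the mount within the filesystem
        "mount_point": fields[4],  # where it is mounted
        "fstype": tail[0],
        "super_opts": tail[2].split(","),
    }


def cgroup_mount_roots(mountinfo_text, controller):
    """Return (root, mount_point) for each cgroup mount of `controller`."""
    found = []
    for line in mountinfo_text.splitlines():
        if not line.strip():
            continue
        m = parse_mountinfo_line(line)
        if m["fstype"] == "cgroup" and controller in m["super_opts"]:
            found.append((m["root"], m["mount_point"]))
    return found


def parse_proc_cgroup(text):
    """Map each controller name to the cgroup path listed for the task."""
    paths = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Format: hierarchy-ID:controller-list:cgroup-path
        _hier_id, controllers, path = line.split(":", 2)
        for name in controllers.split(","):
            paths[name] = path
    return paths


# Hypothetical data for the inner container x1 in the example above.
SAMPLE_MOUNTINFO = (
    "100 90 0:25 /lxc/x1/lxc/x1 /sys/fs/cgroup/lxc/x1 rw,relatime - "
    "cgroup cgroup rw,freezer\n"
)
SAMPLE_PROC_CGROUP = "5:freezer:/lxc/x1\n"

if __name__ == "__main__":
    task_path = parse_proc_cgroup(SAMPLE_PROC_CGROUP)["freezer"]
    for root, target in cgroup_mount_roots(SAMPLE_MOUNTINFO, "freezer"):
        print("mount root:", root, "target:", target, "task cgroup:", task_path)
```

On a real system the same functions could be fed the contents of
/proc/self/mountinfo and /proc/self/cgroup instead of the sample strings.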