On 05.01.2015 at 23:48, Aditya Kali wrote:
> On Sun, Dec 14, 2014 at 3:05 PM, Richard Weinberger <richard@xxxxxx> wrote:
>> Aditya,
>>
>> I gave your patch set a try but it does not work for me.
>> Maybe you can shed some light on the issues I'm facing.
>> Sadly, I still haven't had time to dig into your code.
>>
>> On 05.12.2014 at 02:55, Aditya Kali wrote:
>>> Signed-off-by: Aditya Kali <adityakali@xxxxxxxxxx>
>>> ---
>>>  Documentation/cgroups/namespace.txt | 147 ++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 147 insertions(+)
>>>  create mode 100644 Documentation/cgroups/namespace.txt
>>>
>>> diff --git a/Documentation/cgroups/namespace.txt b/Documentation/cgroups/namespace.txt
>>> new file mode 100644
>>> index 0000000..6480379
>>> --- /dev/null
>>> +++ b/Documentation/cgroups/namespace.txt
>>> @@ -0,0 +1,147 @@
>>> +                        CGroup Namespaces
>>> +
>>> +CGroup Namespace provides a mechanism to virtualize the view of the
>>> +/proc/<pid>/cgroup file. The CLONE_NEWCGROUP clone-flag can be used with
>>> +clone() and unshare() syscalls to create a new cgroup namespace.
>>> +The process running inside the cgroup namespace will have its /proc/<pid>/cgroup
>>> +output restricted to cgroupns-root. cgroupns-root is the cgroup of the process
>>> +at the time of creation of the cgroup namespace.
>>> +
>>> +Prior to CGroup Namespace, the /proc/<pid>/cgroup file used to show complete
>>> +path of the cgroup of a process. In a container setup (where a set of cgroups
>>> +and namespaces are intended to isolate processes), the /proc/<pid>/cgroup file
>>> +may leak potential system level information to the isolated processes.
>>> +
>>> +For Example:
>>> +  $ cat /proc/self/cgroup
>>> +  0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/batchjobs/container_id1
>>> +
>>> +The path '/batchjobs/container_id1' can generally be considered as system-data
>>> +and it's desirable to not expose it to the isolated process.
>>> +
>>> +CGroup Namespaces can be used to restrict visibility of this path.
>>> +For Example:
>>> +  # Before creating cgroup namespace
>>> +  $ ls -l /proc/self/ns/cgroup
>>> +  lrwxrwxrwx 1 root root 0 2014-07-15 10:37 /proc/self/ns/cgroup -> cgroup:[4026531835]
>>> +  $ cat /proc/self/cgroup
>>> +  0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/batchjobs/container_id1
>>> +
>>> +  # unshare(CLONE_NEWCGROUP) and exec /bin/bash
>>> +  $ ~/unshare -c
>>> +  [ns]$ ls -l /proc/self/ns/cgroup
>>> +  lrwxrwxrwx 1 root root 0 2014-07-15 10:35 /proc/self/ns/cgroup -> cgroup:[4026532183]
>>> +  # From within new cgroupns, process sees that it's in the root cgroup
>>> +  [ns]$ cat /proc/self/cgroup
>>> +  0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/
>>> +
>>> +  # From global cgroupns:
>>> +  $ cat /proc/<pid>/cgroup
>>> +  0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/batchjobs/container_id1
>>> +
>>> +  # Unshare cgroupns along with userns and mountns
>>> +  # Following calls unshare(CLONE_NEWCGROUP|CLONE_NEWUSER|CLONE_NEWNS), then
>>> +  # sets up uid/gid map and execs /bin/bash
>>> +  $ ~/unshare -c -u -m
>>
>> This command does not issue CLONE_NEWUSER, -U does.
>>
> I was using a custom unshare binary. But I will update the command
> line to be similar to the one in util-linux.
>
>>> +  # Originally, we were in /batchjobs/container_id1 cgroup. Mount our own cgroup
>>> +  # hierarchy.
>>> +  [ns]$ mount -t cgroup cgroup /tmp/cgroup
>>> +  [ns]$ ls -l /tmp/cgroup
>>> +  total 0
>>> +  -r--r--r-- 1 root root 0 2014-10-13 09:32 cgroup.controllers
>>> +  -r--r--r-- 1 root root 0 2014-10-13 09:32 cgroup.populated
>>> +  -rw-r--r-- 1 root root 0 2014-10-13 09:25 cgroup.procs
>>> +  -rw-r--r-- 1 root root 0 2014-10-13 09:32 cgroup.subtree_control
>>
>> I've patched libvirt-lxc to issue CLONE_NEWCGROUP and not bind mount cgroupfs into a container.
>> But I'm unable to mount cgroupfs within the container: mount(2) is failing with EINVAL.
>> And /proc/self/cgroup still shows the cgroup from outside.
>>
>> ---cut---
>> container:/ # ls /sys/fs/cgroup/
>> container:/ # mount -t cgroup none /sys/fs/cgroup/
>
> You need to provide the "-o __DEVEL__sane_behavior" flag. Inside the
> container, only the unified hierarchy can be mounted, so for now that
> flag is needed. I will fix the documentation too.
>
>> mount: wrong fs type, bad option, bad superblock on none,
>>        missing codepage or helper program, or other error
>>
>>        In some cases useful info is found in syslog - try
>>        dmesg | tail or so.
>> container:/ # cat /proc/self/cgroup
>> 8:memory:/machine/test00.libvirt-lxc
>> 7:devices:/machine/test00.libvirt-lxc
>> 6:hugetlb:/
>> 5:cpuset:/machine/test00.libvirt-lxc
>> 4:blkio:/machine/test00.libvirt-lxc
>> 3:cpu,cpuacct:/machine/test00.libvirt-lxc
>> 2:freezer:/machine/test00.libvirt-lxc
>> 1:name=systemd:/user.slice/user-0.slice/session-c2.scope
>> container:/ # ls -la /proc/self/ns
>> total 0
>> dr-x--x--x 2 root root 0 Dec 14 23:02 .
>> dr-xr-xr-x 8 root root 0 Dec 14 23:02 ..
>> lrwxrwxrwx 1 root root 0 Dec 14 23:02 cgroup -> cgroup:[4026532240]
>> lrwxrwxrwx 1 root root 0 Dec 14 23:02 ipc -> ipc:[4026532238]
>> lrwxrwxrwx 1 root root 0 Dec 14 23:02 mnt -> mnt:[4026532235]
>> lrwxrwxrwx 1 root root 0 Dec 14 23:02 net -> net:[4026532242]
>> lrwxrwxrwx 1 root root 0 Dec 14 23:02 pid -> pid:[4026532239]
>> lrwxrwxrwx 1 root root 0 Dec 14 23:02 user -> user:[4026532234]
>> lrwxrwxrwx 1 root root 0 Dec 14 23:02 uts -> uts:[4026532236]
>> container:/ #
>>
>> #host side
>> lxc-os132:~ # ls -la /proc/self/ns
>> total 0
>> dr-x--x--x 2 root root 0 Dec 14 23:56 .
>> dr-xr-xr-x 8 root root 0 Dec 14 23:56 ..
>> lrwxrwxrwx 1 root root 0 Dec 14 23:56 cgroup -> cgroup:[4026531835]
>> lrwxrwxrwx 1 root root 0 Dec 14 23:56 ipc -> ipc:[4026531839]
>> lrwxrwxrwx 1 root root 0 Dec 14 23:56 mnt -> mnt:[4026531840]
>> lrwxrwxrwx 1 root root 0 Dec 14 23:56 net -> net:[4026531957]
>> lrwxrwxrwx 1 root root 0 Dec 14 23:56 pid -> pid:[4026531836]
>> lrwxrwxrwx 1 root root 0 Dec 14 23:56 user -> user:[4026531837]
>> lrwxrwxrwx 1 root root 0 Dec 14 23:56 uts -> uts:[4026531838]
>> ---cut---
>>
>> Any ideas?
>>
>
> Please try passing the "-o __DEVEL__sane_behavior" flag to the mount command.

Ohh, this renders the whole patch useless for me, as systemd needs the
"old/default" behavior of cgroups. :-(
I had really hoped that cgroup namespaces would help me run systemd in a
sane way within Linux containers.

Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
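The path virtualization described in the quoted documentation says a process inside a cgroup namespace sees its /proc/<pid>/cgroup path relative to cgroupns-root. A small user-space sketch in C can illustrate it. The helper name cgroupns_view() is hypothetical and is not part of the patch set. It only handles paths at or below the namespace root, assumes the root is given without a trailing slash, and does not try to model how the kernel reports cgroups outside the root.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch set): map an absolute cgroup
 * path to the path a process inside a cgroup namespace would see, given
 * the namespace root (cgroupns-root, no trailing slash).
 * Only paths at or below the root are handled.
 * Returns 0 and writes the result to out, or -1 if the path is outside
 * the root or out is too small.
 */
int cgroupns_view(const char *root, const char *path, char *out, size_t outlen)
{
    size_t rlen;
    const char *rest;

    assert(root != NULL && path != NULL && out != NULL);

    /* A root of "/" is the whole hierarchy: nothing to strip. */
    rlen = strcmp(root, "/") == 0 ? 0 : strlen(root);

    /* The path must start with the root... */
    if (strncmp(path, root, rlen) != 0)
        return -1;
    /* ...and the match must end at a component boundary, so that a root
     * of "/a/b" does not match "/a/bc". */
    if (path[rlen] != '\0' && path[rlen] != '/')
        return -1;

    /* The namespace root itself is shown as "/". */
    rest = path[rlen] == '\0' ? "/" : path + rlen;
    if (strlen(rest) + 1 > outlen)
        return -1;
    strcpy(out, rest);
    return 0;
}
```

With a root of /batchjobs/container_id1, the path /batchjobs/container_id1 maps to "/", matching the "[ns]$ cat /proc/self/cgroup" output in the documentation example, while a sibling such as /batchjobs/container_id10 is rejected rather than mis-stripped.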
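The namespace-creation step Richard describes for libvirt-lxc comes down to an unshare(2) call with CLONE_NEWCGROUP. The sketch below is an illustration built on assumptions, not code from libvirt or the patch set. enter_cgroupns() is a hypothetical helper. The fallback CLONE_NEWCGROUP value 0x02000000 is the one used by mainline Linux (4.6 and later) and may differ from the value in this RFC series. The call needs CAP_SYS_ADMIN, and the helper returns -1 when unshare() fails.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

#ifndef CLONE_NEWCGROUP
/* Assumption: mainline Linux value, used only if the headers lack it. */
#define CLONE_NEWCGROUP 0x02000000
#endif

/*
 * Hypothetical helper: move the calling process into a new cgroup
 * namespace with unshare(CLONE_NEWCGROUP), then copy the first line of
 * /proc/self/cgroup (without its newline) into buf.
 * Returns 0 on success; -1 if unshare() failed (errno is set, e.g. EPERM
 * without CAP_SYS_ADMIN, or EINVAL on kernels without cgroup namespaces);
 * -2 if /proc/self/cgroup could not be read.
 */
int enter_cgroupns(char *buf, size_t len)
{
    FILE *f;

    assert(buf != NULL && len > 0);

    if (unshare(CLONE_NEWCGROUP) == -1)
        return -1;

    f = fopen("/proc/self/cgroup", "r");
    if (f == NULL)
        return -2;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -2;
    }
    fclose(f);

    /* Drop the trailing newline, if any. */
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

The documentation states that cgroupns-root is the caller's cgroup at the time the namespace is created. On a mainline kernel, the path field should therefore read "/" right after a successful call. The helper reads only the first line, so checking the other hierarchies would need a loop over the remaining lines.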