Am 11.03.2015 um 03:30 schrieb Chen, Hanxiao: >>> @@ -826,8 +829,25 @@ static int lxcContainerMountBasicFS(bool userns_enabled) >>> bool bindOverReadonly; >>> virLXCBasicMountInfo const *mnt = &lxcBasicMounts[i]; >>> >>> + /* When enable userns but disable netns, kernel will >>> + * forbid us doing a new fresh mount for sysfs. >>> + * So we had to do a bind mount for sysfs instead. >>> + */ >>> + if (userns_enabled && netns_disabled && >>> + STREQ(mnt->src, "sysfs")) { >>> + if (VIR_STRDUP(mnt_src, "/sys") < 0) { >>> + goto cleanup; >>> + } >> >> This is clearly broken and looks very untested to me. >> > It's broken now. > But when I submitted this patch last year, it's not. Are you sure? Just built libvirt v1.2.6-222-ga86b621, head is commit a86b6215a74b1feb2667204e214fbfd2f7decc5c Author: Chen Hanxiao <chenhanxiao@xxxxxxxxxxxxxx> Date: Mon Jul 14 18:01:51 2014 +0800 LXC: create a bind mount for sysfs when enable userns but disable netns /sys is still an empty directory but as at this time (most likely due to another bug) libvirt was able to create /sys/fs/cgroup and mounted groups there. But no sysfs at all is at /sys. I mean, how is this supposed to work? You bind mount /sys over /sys... >> It will issue this mount call: >> mount("/sys", "/sys", "sysfs", MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_BIND, NULL) >> because the code runs after pivot_root(2). >> i.e, /sys will be still empty after that and no sysfs at all there. >> As libvirt will later remount /sys readonly creating a container will >> fail with the most useless error message: >> Error: internal error: guest failed to start: Unable to create >> directory /sys/fs/: Read-only file system >> or >> Error: internal error: guest failed to start: Unable to create >> directory /sys/fs/cgroup: Read-only file system >> >> Please note that changing "/sys" to "/.oldroot/sys" will not solve the >> issue as this code runs already in the new >> namespace and therefore the old mount tree is locked, thus MS_BIND is >> not allowed. >> >> This brings me to the question, why do you handle the netns_disabled >> case anyway? > > Please check the discussion at: > http://lists.linux-foundation.org/pipermail/containers/2014-July/034721.html > >> If in the XML file no network is specified just create a new and empty >> network namespace. >> Bindmounting /sys into the container is a security issue. This is why >> mounting sysfs without a netns >> was disabled to begin with. > > Yes, I tried to propose enable netns by default, > but Dan thought that we should allow containers sharing the host's network: > http://www.redhat.com/archives/libvir-list/2013-August/msg01025.html > > So we should allow user create containers without netns, > they should know what they do if they read libvirt's docs > See docs patch describe security considerations: > http://www.redhat.com/archives/libvir-list/2013-September/msg00562.html Thanks for the links! //richard -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list