While testing libvirt with user namespaces on the current Fedora rawhide kernel (3.13.0-0.rc0.git3.2.fc21.x86_64), I'm now getting an error when we attempt to mount /proc:

  # virsh -c lxc:/// start shell
  error: Failed to start domain shell
  error: internal error: guest failed to start: Failed to mount proc on /proc type proc flags=e: Operation not permitted

The syscall that fails is:

  mount("proc", "/proc", "proc", MS_NOSUID|MS_NODEV|MS_NOEXEC, NULL) = -1 EPERM (Operation not permitted)

On the host OS, the default Fedora environment has the following mounts present:

  # grep /proc /proc/mounts
  proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
  systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=41,pgrp=1,timeout=300,minproto=5,maxproto=5,direct 0 0
  binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0
  sunrpc /proc/fs/nfsd nfsd rw,relatime 0 0

  # ls /proc/fs/nfsd/
  export_features  filehandle      nfsv4gracetime  nfsv4recoverydir  pool_threads  reply_cache_stats        threads            unlock_ip
  exports          max_block_size  nfsv4leasetime  pool_stats        portlist      supported_krb5_enctypes  unlock_filesystem  versions

  # ls /proc/sys/fs/binfmt_misc/
  qemu-alpha  qemu-cris        qemu-microblazeel  qemu-mips64el  qemu-ppc64       qemu-sh4    qemu-sparc32plus  status
  qemu-arm    qemu-m68k        qemu-mips          qemu-mipsel    qemu-ppc64abi32  qemu-sh4eb  qemu-sparc64
  qemu-armeb  qemu-microblaze  qemu-mips64        qemu-ppc       qemu-s390x       qemu-sparc  register

Only if I umount both of the /proc/sys/fs/binfmt_misc/ entries am I able to get past this EPERM error.

Looking at the git history, I see this change as a likely candidate for something that has changed in this area:

  commit e51db73532955dc5eaba4235e62b74b460709d5b
  Author: Eric W. Biederman <ebiederm@xxxxxxxxxxxx>
  Date:   Sat Mar 30 19:57:41 2013 -0700

      userns: Better restrictions on when proc and sysfs can be mounted

      Rely on the fact that another flavor of the filesystem is already
      mounted and do not rely on state in the user namespace.

      Verify that the mounted filesystem is not covered in any significant
      way.

      I would love to verify that the previously mounted filesystem has no
      mounts on top but there are at least the directories
      /proc/sys/fs/binfmt_misc and /sys/fs/cgroup/ that exist explicitly
      for other filesystems to mount on top of.

      Refactor the test into a function named fs_fully_visible and call
      that function from the mount routines of proc and sysfs.  This makes
      this test local to the filesystems involved and the results current
      of when the mounts take place, removing a weird threading of the
      user namespace, the mount namespace and the filesystems themselves.

      Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>

My guess is that fs_fully_visible() is returning false, which causes the proc_mount() call to return EPERM, but I'm unclear why this would happen, or whether this hypothesis is even correct.

Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/containers