Quoting Daniel P. Berrange (berrange@xxxxxxxxxx):
> Just testing libvirt with user namespaces on current Fedora rawhide
> 3.13.0-0.rc0.git3.2.fc21.x86_64 kernel, I'm now getting an error when
> we attempt to mount /proc

Thanks, I saw the same thing with 3.12 on Friday afternoon, and decided
I must be too haggard from a week of unrelated work to think straight.

This will definitely be a problem, making user namespaces unusable for
containers.

>   # virsh -c lxc:/// start shell
>   error: Failed to start domain shell
>   error: internal error: guest failed to start: Failed to mount proc on /proc type proc flags=e: Operation not permitted
>
> The syscall failing is
>
>   mount("proc", "/proc", "proc", MS_NOSUID|MS_NODEV|MS_NOEXEC, NULL) = -1 EPERM (Operation not permitted)
>
>
> On the host OS the default Fedora environment has the following mounts
> present
>
>   # grep /proc /proc/mounts
>   proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
>   systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=41,pgrp=1,timeout=300,minproto=5,maxproto=5,direct 0 0
>   binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0
>   sunrpc /proc/fs/nfsd nfsd rw,relatime 0 0
>
>   # ls /proc/fs/nfsd/
>   export_features filehandle nfsv4gracetime nfsv4recoverydir pool_threads reply_cache_stats threads unlock_ip
>   exports max_block_size nfsv4leasetime pool_stats portlist supported_krb5_enctypes unlock_filesystem versions
>
>   # ls /proc/sys/fs/binfmt_misc/
>   qemu-alpha qemu-cris qemu-microblazeel qemu-mips64el qemu-ppc64 qemu-sh4 qemu-sparc32plus status
>   qemu-arm qemu-m68k qemu-mips qemu-mipsel qemu-ppc64abi32 qemu-sh4eb qemu-sparc64
>   qemu-armeb qemu-microblaze qemu-mips64 qemu-ppc qemu-s390x qemu-sparc register
>
>
> Only if I umount both of the /proc/sys/fs/binfmt_misc/ entries
> am I able to get past this EPERM error code.
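
For anyone comparing hosts, here is a rough sketch of a helper that lists
which mounts sit below /proc and how many are stacked on each mountpoint.
It only parses text in /proc/mounts format; the function name and the
SAMPLE constant are made up for illustration, and the sample lines are
copied from the grep output quoted above. It says nothing about what the
kernel itself checks.

```python
from collections import defaultdict


def mounts_below(mounts_text, prefix="/proc"):
    """Map each mountpoint strictly below `prefix` to the fstypes mounted on it.

    `mounts_text` is expected to be in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    base = prefix.rstrip("/") + "/"
    covered = defaultdict(list)
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mountpoint, fstype = fields[1], fields[2]
        if mountpoint.startswith(base):
            covered[mountpoint].append(fstype)
    return dict(covered)


# Sample taken from the grep output quoted in this thread.
SAMPLE = """\
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=41,pgrp=1,timeout=300,minproto=5,maxproto=5,direct 0 0
binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0
sunrpc /proc/fs/nfsd nfsd rw,relatime 0 0
"""

if __name__ == "__main__":
    for mountpoint, fstypes in mounts_below(SAMPLE).items():
        print(mountpoint, fstypes)
```

On a live system the same function can be fed open("/proc/mounts").read().
For the sample it reports two stacked mounts (autofs and binfmt_misc) on
/proc/sys/fs/binfmt_misc and one on /proc/fs/nfsd, which lines up with
having to umount both binfmt_misc entries.
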
>
> Looking at GIT history I see this change as a likely candidate for
> something which has changed in this area:
>
>   commit e51db73532955dc5eaba4235e62b74b460709d5b
>   Author: Eric W. Biederman <ebiederm@xxxxxxxxxxxx>
>   Date:   Sat Mar 30 19:57:41 2013 -0700
>
>     userns: Better restrictions on when proc and sysfs can be mounted
>
>     Rely on the fact that another flavor of the filesystem is already
>     mounted and do not rely on state in the user namespace.
>
>     Verify that the mounted filesystem is not covered in any significant
>     way. I would love to verify that the previously mounted filesystem
>     has no mounts on top but there are at least the directories
>     /proc/sys/fs/binfmt_misc and /sys/fs/cgroup/ that exist explicitly
>     for other filesystems to mount on top of.
>
>     Refactor the test into a function named fs_fully_visible and call that
>     function from the mount routines of proc and sysfs. This makes this
>     test local to the filesystems involved and the results current of when
>     the mounts take place, removing a weird threading of the user
>     namespace, the mount namespace and the filesystems themselves.
>
>     Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
>
>
> My guess is fs_fully_visible() is returning false, and thus causing the
> proc_mount() call to return EPERM, but I'm unclear why this would happen,
> or if this is indeed a correct hypothesis.
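
To retry the failing call outside of libvirt, something like the sketch
below could be used. It issues the same mount(2) call shown in the strace
line through ctypes and reports the errno. The helper names are
hypothetical; the flag values are the standard ones from <sys/mount.h>,
and together they give the flags=e seen in the libvirt error. This only
tells you whether the mount fails, not whether fs_fully_visible() is the
reason. It assumes Linux and is meant to be run inside the new user and
mount namespace against a scratch directory.

```python
import ctypes
import errno
import os

# Standard Linux mount flag values from <sys/mount.h>.
MS_NOSUID = 0x2
MS_NODEV = 0x4
MS_NOEXEC = 0x8
PROC_FLAGS = MS_NOSUID | MS_NODEV | MS_NOEXEC  # 0xe, i.e. "flags=e"


def try_mount_proc(target):
    """Attempt mount("proc", target, "proc", PROC_FLAGS, NULL).

    Returns 0 on success, otherwise the errno set by the syscall.
    Hypothetical helper: it really does mount on success, so point it
    at a scratch directory.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    rc = libc.mount(b"proc", os.fsencode(target), b"proc",
                    ctypes.c_ulong(PROC_FLAGS), None)
    return 0 if rc == 0 else ctypes.get_errno()


def describe(err):
    """Render an errno value the way strace does, e.g. 'EPERM (...)'."""
    if err == 0:
        return "ok"
    return "%s (%s)" % (errno.errorcode.get(err, err), os.strerror(err))
```

Calling describe(try_mount_proc("/some/scratch/dir")) before and after
unmounting the binfmt_misc entries on the host would show whether the
EPERM tracks those mounts.
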
>
>
> Regards,
> Daniel
> --
> |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org -o- http://virt-manager.org :|
> |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
> _______________________________________________
> Containers mailing list
> Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
> https://lists.linuxfoundation.org/mailman/listinfo/containers