On Thu, May 28, 2015 at 11:46:50PM +0200, Richard Weinberger wrote: > Am 28.05.2015 um 23:32 schrieb Eric W. Biederman: > > Richard Weinberger <richard@xxxxxx> writes: > > > >> Am 28.05.2015 um 21:57 schrieb Eric W. Biederman: > >>>> FWIW, it breaks also libvirt-lxc: > >>>> Error: internal error: guest failed to start: Failed to re-mount /proc/sys on /proc/sys flags=1021: Operation not permitted > >>> > >>> Interesting. I had not anticipated a failure there? And it is failing > >>> in remount? Oh that is interesting. > >>> > >>> That implies that there is some flag of the original mount of /proc that > >>> the remount of /proc/sys is clearing, and that previously > >>> > >>> The flags specified are current rdonly,remount,bind so I expect there > >>> are some other flags on proc that libvirt-lxc is clearing by accident > >>> and we did not fail before because the kernel was not enforcing things. > >> > >> Please see: > >> http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/lxc/lxc_container.c;h=9a9ae5c2aaf0f90ff472f24fda43c077b44998c7;hb=HEAD#l933 > >> lxcContainerMountBasicFS() > >> > >> and: > >> http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/lxc/lxc_container.c;h=9a9ae5c2aaf0f90ff472f24fda43c077b44998c7;hb=HEAD#l850 > >> lxcBasicMounts > >> > >>> What are the mount flags in a working libvirt-lxc? > >> > >> See: > >> test1:~ # cat /proc/self/mountinfo > >> 149 147 0:56 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw > >> 150 149 0:56 /sys /proc/sys ro,nodev,relatime - proc proc rw > > > >> If you need more info, please let me know. :-) > > > > Oh interesting I had not realized libvirt-lxc had grown an unprivileged > > mode using user namespaces. > > Yep. It works quite well. I've migrated all my containers from lxc > to libvirt-lxc because libvirt-lxc had a working user-namespace > implementation before lxc. > > > This does appear to be a classic remount bug, where you are not > > preserving the permissions. It appears the fact that the code > > failed to enforce locked permissions on the fresh mount of proc > > was hiding this bug until now. > > > > I expect what you actually want is the code below: > > > > diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c > > index 9a9ae5c2aaf0..f008a7484bfe 100644 > > --- a/src/lxc/lxc_container.c > > +++ b/src/lxc/lxc_container.c > > @@ -850,7 +850,7 @@ typedef struct { > > > > static const virLXCBasicMountInfo lxcBasicMounts[] = { > > { "proc", "/proc", "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false }, > > - { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, false, false, false }, > > + { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false }, > > { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL, MS_BIND, false, false, true }, > > { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL, MS_BIND, false, false, true }, > > { "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false }, > > > > Or possibly just: > > > > diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c > > index 9a9ae5c2aaf0..a60ccbd12bfc 100644 > > --- a/src/lxc/lxc_container.c > > +++ b/src/lxc/lxc_container.c > > @@ -850,7 +850,7 @@ typedef struct { > > > > static const virLXCBasicMountInfo lxcBasicMounts[] = { > > { "proc", "/proc", "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false }, > > - { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, false, false, false }, > > + { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, true, false, false }, > > { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL, MS_BIND, false, false, true }, > > { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL, MS_BIND, false, false, true }, > > { "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false }, > > I'll test your diff tomorrow with a fresh brain. > I sent a similar patch to libvirt folks some time ago, looks like it got lost. ;-\ > > > As the there is little point in making /proc/sys read-only in a > > user-namespace, as the permission checks are uid based and no-one should > > have the global uid 0 in your container. Making mounting /proc/sys > > read-only rather pointless. > > Yeah, I've been ranting about that for ages... > libvirt-lxc contains a lot of cruft to make privileged container > kind of secure. Some users still fear using the user-namespace. Yes, we've discussed this before and I'd like to simplify this. The thing that has been stopping me tackling it has been figuring out a way to ensure we don't change semantics for existing deployed users. ie when RHEL-7 rebases to newer libvirt, I don't want existing containers to suddenly change their setup, because although the existing setup is sub-optimal, some apps / users might be relying on its behaviour in ways I can't predict. I do believe I have figured out a way to allow backwards compatibility now though, so we should have able to have another stab at simplifying and removing this cruft for newly deployed containers. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html