Re: [PATCH] [RFC] mnt: add ability to clone mntns starting with the current root

Serge Hallyn <serge.hallyn@xxxxxxxxxx> · Tue, 7 Oct 2014 21:33:49 +0000



Quoting Andy Lutomirski (luto@xxxxxxxxxxxxxx):
> On Tue, Oct 7, 2014 at 1:30 PM, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
> > Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes:
> >
> >> On Tue, Oct 07, 2014 at 02:30:40PM +0100, Al Viro wrote:
> >>> On Tue, Oct 07, 2014 at 04:12:57PM +0400, Andrey Vagin wrote:
> >>> > Another problem is that rootfs can't be hidden from a container, because
> >>> > rootfs can't be moved or umounted.
> >>>
> >>> ... which is a bug in mntns_install(), AFAICS.
> >>
> >> Ability to get to exposed rootfs, that is.
> >
> > The container side of this argument is pretty bogus.  It only applies
> > if user namespaces are not used for the container.
> >
> > So it is only root (and not root in a container) who can get to the
> > exposed rootfs.
> >
> > I have a vague memory someone actually had a real use in minimal systems
> > for being able to get back to the rootfs and being able to use rootfs as
> > the rootfs.  There was even a patch at that time that Andrew Morton was
> > carrying for a time to allow unmounting root and get at rootfs, and to
> > prevent the oops on rootfs unmount in some way.
> >
> > So not only do I not think it is a bug to get back to rootfs, I think
> > it is a feature that some people have expressed at least half-way sane
> > uses for.
> >
> >>> > Here is an example how to get access to rootfs:
> >>> > fd = open("/proc/self/ns/mnt", O_RDONLY);
> >>> > umount2("/", MNT_DETACH);
> >>> > setns(fd, CLONE_NEWNS);
> >>> >
> >>> > rootfs may contain data, which should not be available in CT-s.
> >>>
> >>> Indeed.
> >>
> >> ... and it looks like the above is what your mangled reproducer in previous
> >> patch had been made of -
> >>       fd = open("/proc/self/ns/mnt", O_RDONLY);
> >>       umount2("/", MNT_DETACH);
> >>       setns(fd, CLONE_NEWNS);
> >>       umount2("/", MNT_DETACH);
> >>
> >> IMO what it shows is setns() bug.  This "switch root/cwd, no matter what"
> >> is wrong.
> >
> > IMO the bug is allowing us to unmount things that should never be unmounted.
> >
> > In a mount namespace created with just user namespace permissions we
> > can't get at rootfs because MNT_LOCKED is set on the root directory
> > and thus it can not be unmounted.
> >
> > Further if anyone has permission to call chroot and chdir on any mount
> > in a mount namespace (that isn't currently covered) they can get at all
> > of them that are not currently covered.  A mount namespace where no one
> > can get at any uncovered filesystem seems to be the definition of
> > useless and ridiculous.
> >
> >
> > Now there is a bug in that MNT_DETACH today does not currently enforce
> > MNT_LOCKED on submounts of the mount point that is detached. I am
> > currently looking at how to construct the appropriate permission check
> > to prevent that.  Unfortunately I can not disallow MNT_DETACH with
> > submounts altogether as that breaks too many legitimate uses.
> 
> Why should MNT_LOCKED on submounts be enforced?
> 
> Is it because, if you retain a reference to the detached tree, then
> you can see under the submounts?  If so, let's fix *that*.  Because
> otherwise the whole model of pivot_root + detach will break.
> 
> Also, damn it, we need change_the_ns_root instead of pivot_root.  I
> doubt that any container programs actually want to keep the old root
> attached after pivot_root.

Right, I think that'll fix the problem we were having, and I think
Andrey said the same thing in another list a few days ago.
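
For anyone following along, the "pivot_root + detach" model mentioned
above can be sketched roughly as below.  This is only an illustration,
not code from this thread or from any particular container runtime: the
function name pivot_and_detach() and its new_root parameter are made up
for the example, and it assumes the caller has the privileges needed to
unshare a mount namespace.  It uses the pivot_root(".", ".") form
described in pivot_root(2), which stacks the old root on top of the new
one so that it can then be detached.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: make new_root the root of a fresh mount
 * namespace and detach the old root.  Returns 0 on success, -1 on
 * the first failing step (errno is left set by that call).
 */
int pivot_and_detach(const char *new_root)
{
	/* Work in a private copy of the mount namespace. */
	if (unshare(CLONE_NEWNS) < 0)
		return -1;

	/* Keep our mount changes from propagating to other namespaces. */
	if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) < 0)
		return -1;

	/* pivot_root() needs new_root to be a mount point. */
	if (mount(new_root, new_root, NULL, MS_BIND | MS_REC, NULL) < 0)
		return -1;

	if (chdir(new_root) < 0)
		return -1;

	/* glibc has no wrapper for pivot_root, so use syscall(). */
	if (syscall(SYS_pivot_root, ".", ".") < 0)
		return -1;

	/* The old root is now stacked on "/"; lazily detach it. */
	if (umount2(".", MNT_DETACH) < 0)
		return -1;

	return chdir("/");
}
```

The old root only exists between the pivot_root and umount2 calls so
that it can be thrown away, which is the step a change_the_ns_root
style call would make unnecessary.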