Quoting Andy Lutomirski (luto@xxxxxxxxxxxxxx):
> On Tue, Oct 7, 2014 at 1:30 PM, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
> > Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes:
> >
> >> On Tue, Oct 07, 2014 at 02:30:40PM +0100, Al Viro wrote:
> >>> On Tue, Oct 07, 2014 at 04:12:57PM +0400, Andrey Vagin wrote:
> >>> > Another problem is that rootfs can't be hidden from a container, because
> >>> > rootfs can't be moved or unmounted.
> >>>
> >>> ... which is a bug in mntns_install(), AFAICS.
> >>
> >> Ability to get to exposed rootfs, that is.
> >
> > The container side of this argument is pretty bogus. It only applies
> > if user namespaces are not used for the container.
> >
> > So it is only root (and not root in a container) who can get to the
> > exposed rootfs.
> >
> > I have a vague memory someone actually had a real use in minimal systems
> > for being able to get back to the rootfs and being able to use rootfs as
> > the rootfs. There was even a patch at that time that Andrew Morton was
> > carrying for a time to allow unmounting root and get at rootfs, and to
> > prevent the oops on rootfs unmount in some way.
> >
> > So not only do I not think it is a bug to get back to rootfs, I think
> > it is a feature that some people have expressed at least half-way sane
> > uses for.
> >
> >>> > Here is an example of how to get access to rootfs:
> >>> > fd = open("/proc/self/ns/mnt", O_RDONLY)
> >>> > umount2("/", MNT_DETACH);
> >>> > setns(fd, CLONE_NEWNS)
> >>> >
> >>> > rootfs may contain data, which should not be available in CT-s.
> >>>
> >>> Indeed.
> >>
> >> ... and it looks like the above is what your mangled reproducer in previous
> >> patch had been made of -
> >> fd = open("/proc/self/ns/mnt", O_RDONLY)
> >> umount2("/", MNT_DETACH);
> >> setns(fd, CLONE_NEWNS)
> >> umount2("/", MNT_DETACH);
> >>
> >> IMO what it shows is a setns() bug. This "switch root/cwd, no matter what"
> >> is wrong.
> >
> > IMO the bug is allowing us to unmount things that should never be unmounted.
> >
> > In a mount namespace created with just user namespace permissions we
> > can't get at rootfs because MNT_LOCKED is set on the root directory
> > and thus it can not be unmounted.
> >
> > Further if anyone has permission to call chroot and chdir on any mount
> > in a mount namespace (that isn't currently covered) they can get at all
> > of them that are not currently covered. A mount namespace where no one
> > can get at any uncovered filesystem seems to be the definition of
> > useless and ridiculous.
> >
> >
> > Now there is a bug in that MNT_DETACH today does not currently enforce
> > MNT_LOCKED on submounts of the mount point that is detached. I am
> > currently looking at how to construct the appropriate permission check
> > to prevent that. Unfortunately I can not disallow MNT_DETACH with
> > submounts altogether as that breaks too many legitimate uses.
>
> Why should MNT_LOCKED on submounts be enforced?
>
> Is it because, if you retain a reference to the detached tree, then
> you can see under the submounts? If so, let's fix *that*. Because
> otherwise the whole model of pivot_root + detach will break.
>
> Also, damn it, we need change_the_ns_root instead of pivot_root. I
> doubt that any container programs actually want to keep the old root
> attached after pivot_root.

Right, I think that'll fix the problem we were having, and I think Andrey
said the same thing in another list a few days ago.
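
For anyone who wants to poke at the open/umount2/setns sequence quoted above, here is a rough C sketch of it. It is not the reproducer from Andrey's patch. The helper name try_reach_rootfs() and its return codes are made up for illustration. The sketch also adds two steps that are not in the quoted sequence, as an assumption about how to run it safely: it forks, and the child first unshares its mount namespace and marks every mount private, so the lazy unmount cannot propagate back to the caller's namespace. Without CAP_SYS_ADMIN it is expected to fail at the first step, and whether the later steps succeed depends on the kernel version.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for unshare() and setns() */
#endif
#include <fcntl.h>
#include <sched.h>
#include <sys/mount.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from any patch in this thread.
 *
 * Runs the quoted sequence in a forked child and reports how far it got:
 *   0  every step succeeded
 *   1  unshare(CLONE_NEWNS) failed (typically: no CAP_SYS_ADMIN)
 *   2  making the mount tree private failed
 *   3  opening /proc/self/ns/mnt failed
 *   4  umount2("/", MNT_DETACH) failed
 *   5  setns(fd, CLONE_NEWNS) failed
 *  -1  fork() or waitpid() failed, or the child died abnormally
 */
int try_reach_rootfs(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Added for safety: work on a private copy of the mount tree. */
        if (unshare(CLONE_NEWNS) != 0)
            _exit(1);
        if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) != 0)
            _exit(2);

        /* The three steps from the quoted example. */
        int fd = open("/proc/self/ns/mnt", O_RDONLY);
        if (fd < 0)
            _exit(3);
        if (umount2("/", MNT_DETACH) != 0)
            _exit(4);
        if (setns(fd, CLONE_NEWNS) != 0)
            _exit(5);

        /* On an affected kernel, "/" would now be the namespace root. */
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling try_reach_rootfs() from a small main() and printing the return value shows which step was refused. A result of 4 would mean the unmount itself was rejected, and 5 would mean setns() declined to switch the root.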