Re: [PATCH 03/34] teach move_mount(2) to work with OPEN_TREE_CLONE [ver #12]

David Howells <dhowells@xxxxxxxxxx> · Thu, 11 Oct 2018 10:17:31 +0100

Alan Jenkins <alan.christopher.jenkins@xxxxxxxxx> wrote:

> # unshare --mount=private_mnt/child_ns --propagation=shared ls -l /proc/self/ns/mnt

I think the problem is that the mount of the nsfs object done by unshare here
pins the new mount namespace - but doesn't add the namespace's contents into
the mount tree, so the mount struct cycle-detection code is bypassed.

I think it's fine for all other namespaces, just not the mount namespace.

It looks like this bug might theoretically exist upstream also, though I don't
think there's any way to actually trigger it given that mount() doesn't take a
dirfd argument.

The reason that you can do this with open_tree()/move_mount() is that it
allows you to create a mount tree (OPEN_TREE_CLONE) that has no namespace
assignment, pass it through the namespace switch and then attach it inside the
child namespace.  The cross-namespace checks in do_move_mount() are bypassed
because the root of the newly-cloned mount tree has no namespace assigned.

Unfortunately, just searching the newly-cloned mount tree for a conflicting
nsfs mount doesn't help because the potential loop could be hidden several
levels deep.

I think the simplest solution is either to reject a request for
open_tree(OPEN_TREE_CLONE) if there are any nsfs objects in the source tree,
or just not to copy said objects.
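
As a rough userspace illustration of the first option (not the kernel-side
check itself), the sketch below lists nsfs mounts at or beneath a given path
by scanning a mountinfo-format file.  The function name list_nsfs_mounts and
its optional file argument are hypothetical, added only for illustration, and
it can only see mounts visible in the current mount namespace.

```shell
# list_nsfs_mounts PREFIX [MOUNTINFO_FILE]
# Print "mountpoint root" for every nsfs mount at or below PREFIX.
# Hypothetical helper; defaults to /proc/self/mountinfo.
list_nsfs_mounts() {
    awk -v p="${1%/}" '
    {
        # Optional fields start at field 7 and end at a lone "-";
        # the filesystem type is the field right after that separator.
        for (i = 7; i <= NF; i++)
            if ($i == "-")
                break
        # Field 5 is the mount point, field 4 the root within the fs
        # (for nsfs this looks like "mnt:[<inode>]").
        if (i < NF && $(i + 1) == "nsfs" &&
            ($5 == p || index($5, p "/") == 1))
            print $5, $4
    }' "${2:-/proc/self/mountinfo}"
}
```

With the test script below, running "list_nsfs_mounts /a/private_mnt" after
the unshare step should report the child_ns bind mount.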

David
---

Test script:

	mount -t tmpfs none /a
	mount --make-shared /a
	cd /a
	mkdir private_mnt
	mount -t tmpfs xxx private_mnt
	mount --make-private private_mnt
	touch private_mnt/child_ns
	unshare --mount=private_mnt/child_ns --propagation=shared \
	    ls -l /proc/self/ns/mnt
	findmnt

	~/open_tree 3</a/private_mnt 3 \
	    nsenter --mount=/a/private_mnt/child_ns \
	    sh -c '~/move_mount 4</mnt'

	grep Shmem: /proc/meminfo
	dd if=/dev/zero of=/a/private_mnt/bigfile bs=1M count=10

	umount -l /a/private_mnt/
	grep Shmem: /proc/meminfo
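
The two grep lines bracket the write and the lazy unmount so that the Shmem
figure can be compared.  A small helper makes that comparison explicit.  The
name shmem_kb is hypothetical, and the reading of the result (Shmem failing to
drop after "umount -l" suggesting the tmpfs is still pinned) is an inference
from the script rather than something stated above.

```shell
# shmem_kb [MEMINFO_FILE]
# Print the Shmem value (in kB) from a meminfo-format file.
# Hypothetical helper; defaults to /proc/meminfo.
shmem_kb() {
    awk '$1 == "Shmem:" { print $2 }' "${1:-/proc/meminfo}"
}

# Possible use around the last steps of the test script:
#   before=$(shmem_kb)
#   umount -l /a/private_mnt/
#   after=$(shmem_kb)
#   echo "Shmem change: $((after - before)) kB"
```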