Re: [RFC] umount/__detach_mounts() race

"Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> · Mon, 21 Feb 2022 10:46:26 -0600

Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes:

> 	BTW, while looking through the vicinity - I think this 
>         if (!check_mnt(old) && old_path->dentry->d_op != &ns_dentry_operations)
> 			return mnt;
> in __do_loopback() is too permissive.  I'd prefer to replace it with
>         if (!check_mnt(old)) {
> 		// allow binding objects from internal instance of nsfs
> 		if (old->mnt_ns != MNT_NS_INTERAL ||
> 		    old_path->dentry->d_op != &ns_dentry_operations)
> 			return mnt;
> 	}
>
> Suppose we'd bound an nsfs object e.g. on /tmp/foo.  Consider a race
> between mount --bind /tmp/foo /tmp/bar and umount -l /tmp/foo.
>
> do_loopback() resolves /tmp/foo.  We have dentry from nsfs and mount
> that sits on /tmp/foo.  We'd resolved /tmp/bar.
>
> In the meanwhile, umount has resolved /tmp/foo and unmounted it.
> struct mount is still alive due to the reference held by do_loopback().
>
> do_loopback() finally grabs namespace_sem.  It verifies that mountpoint
> to be (/tmp/bar) is still OK (it is) and calls __do_loopback().  The
> check in __do_loopback() passes - old is unmounted, but dentry is
> nsfs one, so we proceed to clone old.
>
> And that's where the things go wrong - we copy the flags, including
> MNT_UMOUNT, to the new instance.  And proceed to attach it on /tmp/bar.
>
> It's really not a good thing.  E.g. __detach_mnt() will assume that
> reference to the parent mount of /tmp/bar is *not* held.  As the
> matter of fact, it is, so we'll get a leak if the mountpoint (/tmp/bar,
> that is) gets unlinked in another namespace.  That's not the only way
> to get trouble - we are really not supposed to have MNT_UMOUNT mounts
> attached to !MNT_UMOUNT ones.
>
> Eric, do you see any problems with the change above?

Such as breaking userspace code? Maybe.

Currently we exempt nsfs dentries from the same namespace restriction
when cloning them.

If I read your proposal correctly you are proposing only exempting nsfs
dentries that are internally mounted from the same namespace
restriction.

We need to keep the ordinary case of bind mounts from one nsfs dentry to
another dentry working even after it is mounted.

If my foggy brain is reasoning correctly you are proposing not allowing
bind mounts before the mount has been attached or after the mount has
been detached by umount_tree.  Not allowing already umounted mounts to
be bind mounted seems semantically fine.  Userspace should not care.

I wonder if perhaps we should have an explicit test for MNT_UMOUNT in
__do_loopback.

Eric