allowing for a completely cached umount(2) pathwalk

Jeff Layton <jlayton@xxxxxxxxxx> · Thu, 13 Apr 2023 18:00:42 -0400

David Wysochanski posted this a week ago:

    https://lore.kernel.org/linux-nfs/CALF+zOnizN1KSE=V095LV6Mug8dJirqk7eN1joX8L1-EroohPA@xxxxxxxxxxxxxx/

It describes a situation where there are nested NFS mounts on a client,
and one of the intermediate mounts ends up being unexported from the
server. In a situation like this, we end up being unable to pathwalk
down to the child mount of these unreachable dentries and can't unmount
anything, even as root.

A decade ago, we did some work to make the kernel not revalidate the
leaf dentry on a umount [1]. This helped some similar sorts of problems
but it doesn't help if the problem is an intermediate dentry.

The idea at the time was that umount(2) is a special case: We are
specifically looking to stop using the mount, so there's nothing to be
gained by revalidating its root dentry and inode.

Based on the problem Dave describes, I'd submit that umount(2) is
special in another way too: It's intended to manipulate the mount table
of the local host, so contacting the backing store (the NFS server in
this case) during a pathwalk doesn't really help anything. All we care
about is getting to the right spot in the mount tree.

A "modest" proposal:
--------------------
This is still somewhat handwavy, but what if we were to make umount(2)
an even more special case for the pathwalk? For the umount(2) pathwalk,
we could:

1/ walk down the dentry tree without calling ->d_revalidate: We don't
care about changes that might have happened remotely. All we care about
is walking down the cached dentries as they are at that moment.

2/ disallow ->lookup operations: a umount is about removing an existing
mount, so the dentries had better already be there.

3/ skip inode ->permission checks. We don't want to check with the
server about our permission to walk the path when we're looking to
unmount. We're walking down the path on the _local_ machine so we can
unuse it. The server should have no say-so in the matter. (We probably
would want to require CAP_SYS_ADMIN or CAP_DAC_READ_SEARCH for this of
course).

We might need other safety checks too that I haven't considered yet.

Is this a terrible idea? Are there potentially problems with
containerized setups if we were to do something like this? Are there
better ways to solve this problem (and others like it)? Maybe this would
be best done with a new UMOUNT_CACHED flag for umount2()?

-- 
Jeff Layton <jlayton@xxxxxxxxxx>

[1] 8033426e6bdb vfs: allow umount to handle mountpoints without
revalidating them