Re: allowing for a completely cached umount(2) pathwalk

"NeilBrown" <neilb@xxxxxxx> · Fri, 14 Apr 2023 08:41:03 +1000

On Fri, 14 Apr 2023, Jeff Layton wrote:
> David Wysochanski posted this a week ago:
> 
>     https://lore.kernel.org/linux-nfs/CALF+zOnizN1KSE=V095LV6Mug8dJirqk7eN1joX8L1-EroohPA@xxxxxxxxxxxxxx/
> 
> It describes a situation where there are nested NFS mounts on a client,
> and one of the intermediate mounts ends up being unexported from the
> server. In a situation like this, we end up being unable to pathwalk
> down to the child mount of these unreachable dentries and can't unmount
> anything, even as root.
> 
> A decade ago, we did some work to make the kernel not revalidate the
> leaf dentry on a umount [1]. This helped some similar sorts of problems
> but it doesn't help if the problem is an intermediate dentry.
> 
> The idea at the time was that umount(2) is a special case: We are
> specifically looking to stop using the mount, so there's nothing to be
> gained by revalidating its root dentry and inode.
> 
> Based on the problem Dave describes, I'd submit that umount(2) is
> special in another way too: It's intended to manipulate the mount table
> of the local host, so contacting the backing store (the NFS server in
> this case) during a pathwalk doesn't really help anything. All we care
> about is getting to the right spot in the mount tree.
> 
> A "modest" proposal:

I hope you didn't mean to reference
https://en.wikipedia.org/wiki/A_Modest_Proposal ...

> --------------------
> This is still somewhat handwavy, but what if we were to make umount(2)
> an even more special case for the pathwalk? For the umount(2) pathwalk,
> we could:
> 
> 1/ walk down the dentry tree without calling ->d_revalidate: We don't
> care about changes that might have happened remotely. All we care about
> is walking down the cached dentries as they are at that moment.
> 
> 2/ disallow ->lookup operations: a umount is about removing an existing
> mount, so the dentries had better already be there.
> 
> 3/ skip inode ->permission checks. We don't want to check with the
> server about our permission to walk the path when we're looking to
> unmount. We're walking down the path on the _local_ machine so we can
> unuse it. The server should have no say-so in the matter. (We probably
> would want to require CAP_SYS_ADMIN or CAP_DAC_READ_SEARCH for this of
> course).
> 
> We might need other safety checks too that I haven't considered yet.
> 
> Is this a terrible idea? Are there potentially problems with
> containerized setups if we were to do something like this? Are there
> better ways to solve this problem (and others like it)? Maybe this would
> be best done with a new UMOUNT_CACHED flag for umount2()?

It might be a terrible idea, but it is essentially the same idea that I
had, but hadn't got around to posting.

The path name that appears in /proc/mounts is the key that must be used
to find and unmount a filesystem.  When you do that "find"ing you are
not looking up a name in a filesystem, you are looking up a key in the
mount table.

We could, instead, create an api that is given a mount-id (first number
in /proc/self/mountinfo) and unmounts that.  Then /sbin/umount could
read /proc/self/mountinfo, find the mount-id, and unmount it - all
without ever doing path name lookup in the traditional sense.

But I prefer your suggestion.  LOOKUP_MOUNTPOINT could be renamed
LOOKUP_CACHED, and it only finds paths that are in the dcache, never
revalidates, at most performs simple permission checks based on cached
content.

NeilBrown