On Fri, 14 Apr 2023, Jeff Layton wrote: > David Wysochanski posted this a week ago: > > https://lore.kernel.org/linux-nfs/CALF+zOnizN1KSE=V095LV6Mug8dJirqk7eN1joX8L1-EroohPA@xxxxxxxxxxxxxx/ > > It describes a situation where there are nested NFS mounts on a client, > and one of the intermediate mounts ends up being unexported from the > server. In a situation like this, we end up being unable to pathwalk > down to the child mount of these unreachable dentries and can't unmount > anything, even as root. > > A decade ago, we did some work to make the kernel not revalidate the > leaf dentry on a umount [1]. This helped some similar sorts of problems > but it doesn't help if the problem is an intermediate dentry. > > The idea at the time was that umount(2) is a special case: We are > specifically looking to stop using the mount, so there's nothing to be > gained by revalidating its root dentry and inode. > > Based on the problem Dave describes, I'd submit that umount(2) is > special in another way too: It's intended to manipulate the mount table > of the local host, so contacting the backing store (the NFS server in > this case) during a pathwalk doesn't really help anything. All we care > about is getting to the right spot in the mount tree. > > A "modest" proposal: I hope you didn't mean to reference https://en.wikipedia.org/wiki/A_Modest_Proposal ... > -------------------- > This is still somewhat handwavy, but what if we were to make umount(2) > an even more special case for the pathwalk? For the umount(2) pathwalk, > we could: > > 1/ walk down the dentry tree without calling ->d_revalidate: We don't > care about changes that might have happened remotely. All we care about > is walking down the cached dentries as they are at that moment. > > 2/ disallow ->lookup operations: a umount is about removing an existing > mount, so the dentries had better already be there. > > 3/ skip inode ->permission checks. We don't want to check with the > server about our permission to walk the path when we're looking to > unmount. We're walking down the path on the _local_ machine so we can > unuse it. The server should have no say-so in the matter. (We probably > would want to require CAP_SYS_ADMIN or CAP_DAC_READ_SEARCH for this of > course). > > We might need other safety checks too that I haven't considered yet. > > Is this a terrible idea? Are there potentially problems with > containerized setups if we were to do something like this? Are there > better ways to solve this problem (and others like it)? Maybe this would > be best done with a new UMOUNT_CACHED flag for umount2()? It might be a terrible idea, but it is essentially the same idea that I had, but hadn't got around to posting. The path name that appears in /proc/mounts is the key that must be used to find and unmount a filesystem. When you do that "find"ing you are not looking up a name in a filesystem, you are looking up a key in the mount table. We could, instead, create an api that is given a mount-id (first number in /proc/self/mountinfo) and unmounts that. Then /sbin/umount could read /proc/self/mountinfo, find the mount-id, and unmount it - all without ever doing path name lookup in the traditional sense. But I prefer your suggestion. LOOKUP_MOUNTPOINT could be renamed LOOKUP_CACHED, and it only finds paths that are in the dcache, never revalidates, at most performs simple permission checks based on cached content. NeilBrown