David Wysochanski posted this a week ago: https://lore.kernel.org/linux-nfs/CALF+zOnizN1KSE=V095LV6Mug8dJirqk7eN1joX8L1-EroohPA@xxxxxxxxxxxxxx/ It describes a situation where there are nested NFS mounts on a client, and one of the intermediate mounts ends up being unexported from the server. In a situation like this, we end up being unable to pathwalk down to the child mount of these unreachable dentries and can't unmount anything, even as root. A decade ago, we did some work to make the kernel not revalidate the leaf dentry on a umount [1]. This helped some similar sorts of problems but it doesn't help if the problem is an intermediate dentry. The idea at the time was that umount(2) is a special case: We are specifically looking to stop using the mount, so there's nothing to be gained by revalidating its root dentry and inode. Based on the problem Dave describes, I'd submit that umount(2) is special in another way too: It's intended to manipulate the mount table of the local host, so contacting the backing store (the NFS server in this case) during a pathwalk doesn't really help anything. All we care about is getting to the right spot in the mount tree. A "modest" proposal: -------------------- This is still somewhat handwavy, but what if we were to make umount(2) an even more special case for the pathwalk? For the umount(2) pathwalk, we could: 1/ walk down the dentry tree without calling ->d_revalidate: We don't care about changes that might have happened remotely. All we care about is walking down the cached dentries as they are at that moment. 2/ disallow ->lookup operations: a umount is about removing an existing mount, so the dentries had better already be there. 3/ skip inode ->permission checks. We don't want to check with the server about our permission to walk the path when we're looking to unmount. We're walking down the path on the _local_ machine so we can unuse it. The server should have no say-so in the matter. (We probably would want to require CAP_SYS_ADMIN or CAP_DAC_READ_SEARCH for this of course). We might need other safety checks too that I haven't considered yet. Is this a terrible idea? Are there potentially problems with containerized setups if we were to do something like this? Are there better ways to solve this problem (and others like it)? Maybe this would be best done with a new UMOUNT_CACHED flag for umount2()? -- Jeff Layton <jlayton@xxxxxxxxxx> [1] 8033426e6bdb vfs: allow umount to handle mountpoints without revalidating them