On Fri, Apr 14, 2023 at 06:01:59AM -0400, Jeff Layton wrote:
> On Fri, 2023-04-14 at 03:32 +0100, Al Viro wrote:
> > On Thu, Apr 13, 2023 at 06:00:42PM -0400, Jeff Layton wrote:
> >
> > > It describes a situation where there are nested NFS mounts on a client,
> > > and one of the intermediate mounts ends up being unexported from the
> > > server. In a situation like this, we end up being unable to pathwalk
> > > down to the child mount of these unreachable dentries and can't unmount
> > > anything, even as root.
> >
> > So umount -l the stuck sucker. What's the problem with that?
> >
>
> You mean lazy umount the parent that is stuck? What happens to the child
> mount in that case? Is it also eventually cleaned up?

I hope it's ok I barge in to answer this but due to the mount beneath
patches I was working on I did spend even more time in that code than I
already did. So this is a good chance to get yelled at if I analyzed
these codepaths wrong.

The child mount would be unmounted in that case. umount_tree() is what
you want to be looking at.

If you perform a regular umount _without_ MNT_DETACH you can see that
umount_tree() is effectively guarded by a call to
propagate_mount_busy(). It checks whether the direct umount target has
any child mounts and if so refuses the umount with EBUSY:

        mkdir -p /mnt/a/b /mnt/c /mnt/d

        # Create parent mount a@c
        mount --bind /mnt/a /mnt/c

        # Create d@b, which is a child mount of a@c
        mount --bind /mnt/d /mnt/c/b

If you call umount /mnt/c it will fail because a@c has child mounts.

If you do a lazy umount via MNT_DETACH through umount -l /mnt/c then it
will also unmount all children of a@c. In fact it will even include
children of children...

        mkdir /mnt/c/b/e
        mount --bind /mnt/a/b/ /mnt/c/b/e
        umount -l /mnt/c

That's basically what the next_mnt() loop at the beginning of
umount_tree() is doing where it collects all direct targets to umount.

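To check from userspace whether a plain umount is likely to trip over
the child mount case described above, a rough sketch along these lines
could be used. It only reads /proc/self/mountinfo and prints the mounts
whose parent ID matches the target's mount ID. The helper name
child_mounts and the optional second argument are made up for
illustration, and child mounts are only one of the reasons the kernel
may return EBUSY.

```shell
# Sketch: list mounts stacked directly on top of a given mountpoint.
# child_mounts is a hypothetical helper, not an existing tool.
# Usage: child_mounts MOUNTPOINT [MOUNTINFO_FILE]
# Returns 1 if MOUNTPOINT is not a mountpoint in the file.
# Simplification: paths with whitespace (escaped in mountinfo) are not
# handled, and if a path appears more than once the last entry is used.
child_mounts() {
    awk -v target="$1" '
        # mountinfo fields: $1 = mount ID, $2 = parent ID, $5 = mountpoint
        { id[NR] = $1; parent[NR] = $2; mp[NR] = $5 }
        $5 == target { tid = $1 }
        END {
            if (tid == "") exit 1
            for (i = 1; i <= NR; i++)
                if (parent[i] == tid && id[i] != tid)
                    print mp[i]
        }
    ' "${2:-/proc/self/mountinfo}"
}
```

With the bind mounts from the example in place, child_mounts /mnt/c
should print /mnt/c/b; empty output means no direct children were found
in the current mount namespace.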
However, if mount propagation is in play things get a lot nastier as you
can fail a non-MNT_DETACH umount because of it as well. (Note that
umount propagation is always triggered if the parent mount of your
direct umount target is a shared mount. IOW, you can't easily opt out of
it unless you make the parent mount of your immediate umount target a
non-shared mount.)

A trivial reason that comes to mind where you would fail the umount due
to mount propagation would be where a propagated mount is kept busy and
not the original mount. So similar to above on the host do:

        mkdir -p /mnt/a/b /mnt/c /mnt/d
        mount --bind /mnt/a /mnt/c
        umount /mnt/c

and you would expect the umount /mnt/c to work. But you realize it fails
with EBUSY even though no one is referencing that mount anymore, at
least not in an obvious way.

But assume someone had a mount namespace open that receives mount
propagation from /. In that case the a@c mount would have propagated
into that mount namespace. So someone could've done a cd /mnt/c into
that propagated mount and the umount /mnt/c would fail.

In that case propagate_mount_busy() would detect the increased refcount
when it tries to check whether the umount could be propagated and give
you EBUSY. So here you also need a lazy umount to get rid of that
mount... And there are other nice scenarios where that's hard to figure
out.
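
Since umount propagation hinges on whether the parent mount of the
umount target is shared, a quick userspace check could look roughly like
the sketch below. It looks up the target's parent ID in
/proc/self/mountinfo and tests whether that parent carries a shared:N
tag in its optional fields. The helper name parent_is_shared and its
exit codes are assumptions made for illustration. It cannot tell you
which other namespace is keeping a propagated copy busy.

```shell
# Sketch: report whether the parent mount of MOUNTPOINT is shared.
# parent_is_shared is a hypothetical helper, not an existing tool.
# Usage: parent_is_shared MOUNTPOINT [MOUNTINFO_FILE]
# Exit status: 0 = parent is shared, 1 = parent is not shared,
#              2 = mountpoint or its parent not found in the file.
parent_is_shared() {
    awk -v target="$1" '
        {
            # Optional fields start at $7 and run up to the "-" separator.
            shared[$1] = 0
            for (i = 7; i <= NF && $i != "-"; i++)
                if ($i ~ /^shared:/) shared[$1] = 1
        }
        # $5 = mountpoint, $2 = parent mount ID
        $5 == target { pid = $2; found = 1 }
        END {
            if (!found || !(pid in shared)) exit 2
            exit (shared[pid] ? 0 : 1)
        }
    ' "${2:-/proc/self/mountinfo}"
}
```

For example:

        parent_is_shared /mnt/c && echo "umount of /mnt/c will propagate"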