Re: allowing for a completely cached umount(2) pathwalk

Christian Brauner <brauner@xxxxxxxxxx> · Fri, 14 Apr 2023 14:18:09 +0200

On Fri, Apr 14, 2023 at 06:01:59AM -0400, Jeff Layton wrote:
> On Fri, 2023-04-14 at 03:32 +0100, Al Viro wrote:
> > On Thu, Apr 13, 2023 at 06:00:42PM -0400, Jeff Layton wrote:
> > 
> > > It describes a situation where there are nested NFS mounts on a client,
> > > and one of the intermediate mounts ends up being unexported from the
> > > server. In a situation like this, we end up being unable to pathwalk
> > > down to the child mount of these unreachable dentries and can't unmount
> > > anything, even as root.
> > 
> > So umount -l the stuck sucker.  What's the problem with that?
> > 
> 
> You mean lazy umount the parent that is stuck? What happens to the child
> mount in that case? Is it also eventually cleaned up?

I hope it's ok that I barge in to answer this, but due to the mount
beneath patches I was working on I spent even more time in that code than
I already had. So this is a good chance to get yelled at if I analyzed
these codepaths wrong.

The child mount would be unmounted in that case. umount_tree() is what
you want to be looking at.

If you perform a regular umount _without_ MNT_DETACH you can see that
umount_tree() is effectively guarded by a call to propagate_mount_busy().
It checks whether the direct umount target has any child mounts and if so
refuses the umount with EBUSY:

        mkdir -p /mnt/a/b /mnt/c /mnt/d

        # create parent mount a@c
        mount --bind /mnt/a /mnt/c

        # create mount d@b, which is a child mount of a@c
        mount --bind /mnt/d /mnt/c/b

If you call umount /mnt/c it will fail because a@c has child mounts.
If you do a lazy umount via MNT_DETACH through umount -l /mnt/c then it
will also unmount all children of a@c. In fact it will even include
children of children...

	mkdir /mnt/c/b/e
	mount --bind /mnt/a/b/ /mnt/c/b/e
	umount -l /mnt/c

That's basically what the next_mnt() loop at the beginning of
umount_tree() is doing where it collects all direct targets to umount.
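
If you want to see up front what a lazy umount of a given mount would
take down with it, you can approximate that collection step from
userspace by following parent IDs in /proc/self/mountinfo. This is only a
rough sketch and not what the kernel does internally. list_submounts is a
made-up helper name, and it assumes the documented mountinfo layout
(mount ID in field 1, parent ID in field 2, mount point in field 5):

```shell
# Hypothetical helper: print every mount point below the mount at $1,
# including children of children. Reads /proc/self/mountinfo unless a
# different mountinfo-formatted file is given as $2.
# Mount points containing spaces are escaped in mountinfo (\040) and
# are not handled here.
list_submounts() {
    target=$1
    info=${2:-/proc/self/mountinfo}
    awk -v target="$target" '
        { id[NR] = $1; parent[NR] = $2; mp[NR] = $5 }
        END {
            # If several mounts are stacked on the same path, take the
            # last one listed as the starting point.
            for (i = 1; i <= NR; i++)
                if (mp[i] == target) root = id[i]
            if (root == "") exit 1
            seen[root] = 1
            # Repeat until no new descendant is found, so the result
            # does not depend on the order of the lines.
            do {
                changed = 0
                for (i = 1; i <= NR; i++)
                    if (!(id[i] in seen) && (parent[i] in seen)) {
                        seen[id[i]] = 1
                        changed = 1
                        print mp[i]
                    }
            } while (changed)
        }
    ' "$info"
}
```

With the mounts from the example above in place, list_submounts /mnt/c
should list /mnt/c/b and /mnt/c/b/e.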

However, if mount propagation is in play things get a lot nastier, as you
can fail a non-MNT_DETACH umount because of it as well. (Note that umount
propagation is always triggered if the parent mount of your direct
umount target is a shared mount. IOW, you can't easily opt out of it
unless you make the parent mount of your immediate umount target a
non-shared mount.)
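
To check whether you are in that situation, you can look at the optional
fields of the parent mount in /proc/self/mountinfo, where a shared mount
carries a shared:N tag. Again just a sketch under that assumption.
parent_is_shared is a made-up helper name:

```shell
# Hypothetical helper: succeed (exit 0) if the parent of the mount at $1
# is a shared mount, exit 1 if it is not, exit 2 if $1 is not a mount
# point. Reads /proc/self/mountinfo unless a file is given as $2.
parent_is_shared() {
    target=$1
    info=${2:-/proc/self/mountinfo}
    awk -v target="$target" '
        {
            id[NR] = $1; parent[NR] = $2; mp[NR] = $5
            # Optional fields run from field 7 up to the "-" separator.
            for (f = 7; f <= NF && $f != "-"; f++)
                if ($f ~ /^shared:/) shared[$1] = 1
        }
        END {
            # Take the last mount listed at that path.
            for (i = 1; i <= NR; i++)
                if (mp[i] == target) par = parent[i]
            if (par == "") exit 2
            if (par in shared) exit 0
            exit 1
        }
    ' "$info"
}
```

If parent_is_shared /mnt/c succeeds, a umount /mnt/c will go through
umount propagation.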

A trivial reason that comes to mind where you would fail the umount due
to mount propagation would be where a propagated mount is kept busy and
not the original mount. So similar to above, on the host do:

        mkdir -p /mnt/a/b /mnt/c /mnt/d
        mount --bind /mnt/a /mnt/c
        umount /mnt/c

and you would expect the umount /mnt/c to work. But you realize it fails
with EBUSY even though no one is referencing that mount anymore, at least
not in an obvious way.

But assume someone had a mount namespace open that receives mount
propagation from /. In that case the a@c mount would have propagated
into that mount namespace. So someone could've done a cd /mnt/c into
that propagated mount and the umount /mnt/c on the host would fail.

In that case propagate_mount_busy() would detect the increased refcount
when it tries to check whether the umount could be propagated and give
you EBUSY. So here you also need a lazy umount to get rid of that
mount... And there are other nice scenarios where that's hard to figure
out.
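
For the simple cd case above, one way to hunt for the culprit is to scan
the cwd links in /proc. This is a sketch with a made-up helper name
(pids_with_cwd_under). It only catches a working directory, not open
files or a chroot, and I'm assuming the cwd link of a process in another
mount namespace still renders as the same path, which may not hold:

```shell
# Hypothetical helper: print the PIDs whose current working directory is
# $1 or somewhere below it. Scans /proc unless another directory with
# the same <pid>/cwd layout is given as $2. Needs enough privilege to
# read other processes' cwd links; unreadable ones are skipped.
pids_with_cwd_under() {
    target=$1
    procroot=${2:-/proc}
    for d in "$procroot"/[0-9]*; do
        cwd=$(readlink "$d/cwd" 2>/dev/null) || continue
        case $cwd in
            "$target"|"$target"/*) echo "${d##*/}" ;;
        esac
    done
}
```

Run as root, pids_with_cwd_under /mnt/c would print the PID of the shell
that did the cd, assuming the path assumption above holds.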