Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes: > umount_tree() is very definitely not supposed to be called > on MNT_UMOUNT subtrees (== stuck-together fragments that got > unmounted, but not split into individual mount nodes). Refcounting > rules are different there and umount_tree() assumes that we start with > the normal ones. > > do_umount() appears to be checking for that: > > if (flags & MNT_DETACH) { > if (!list_empty(&mnt->mnt_list)) > umount_tree(mnt, UMOUNT_PROPAGATE); > retval = 0; > } else { > shrink_submounts(mnt); > retval = -EBUSY; > if (!propagate_mount_busy(mnt, 2)) { > if (!list_empty(&mnt->mnt_list)) > umount_tree(mnt, UMOUNT_PROPAGATE|UMOUNT_SYNC); > retval = 0; > } > } > > which would prevent umount_tree() on those - mnt_list eviction happens > for the same nodes that get MNT_UMOUNT. However, shrink_submounts() > will call umount_tree() for e.g. nfs automounts it finds on victim > mount, and if ours happens to be already unmounted, with automounts > stuck to it, we have trouble. > > It looks like something that should be impossible to hit, but... > > A: umount(2) looks the sucker up > B: rmdir(2) in another namespace (where nothing is mounted on that mountpoint) > does __detach_mounts(), which grabs namespace_sem, sees the mount A is about > to try and kill and calls umount_tree(mnt, UMOUNT_CONNECTED). Which detaches > our mount (and its children, automounts included) from the namespace it's in, > modifies their refcounts accordingly and keeps the entire thing in one > piece. > A: in do_umount() blocks on namespace_sem > B: drops namespace_sem > A: gets to the quoted code. mnt is already MNT_UMOUNT (and has empty > ->mnt_list), but it does have (equally MNT_UMOUNT) automounts under it, > etc. So shrink_submounts() finds something to umount and calls umount_tree(). > Buggered refcounts happen. > > Does anybody see a problem with the following patch? I think it looks like 2 patches. One patch to remove some list_empty checks. > - if (!list_empty(&mnt->mnt_list)) > - umount_tree(mnt, UMOUNT_PROPAGATE); > + umount_tree(mnt, UMOUNT_PROPAGATE); Another patch to add a MNT_UMOUNT condition. The MNT_UMOUNT condition should probably be checked optimistically along with MNT_FORCE in can_umount as well to be explicit. I think it is redundant with check_mnt but it would add clarity to what we are looking for, and keep people thinking about the MNT_UMOUNT races. Testing MNT_UMOUNT after the locks have been grabbed seem very reasonable. Alternately the code could call check_mnt again and lean on that test. But testing MNT_UMOUNT seems sufficient. > diff --git a/fs/namespace.c b/fs/namespace.c > index 42d4fc21263b2..1604b9d7a69d9 100644 > --- a/fs/namespace.c > +++ b/fs/namespace.c > @@ -1654,21 +1654,20 @@ static int do_umount(struct mount *mnt, int flags) > lock_mount_hash(); > > /* Recheck MNT_LOCKED with the locks held */ > + /* ... and don't go there if we'd raced and it's already unmounted */ > retval = -EINVAL; > - if (mnt->mnt.mnt_flags & MNT_LOCKED) > + if (mnt->mnt.mnt_flags & (MNT_LOCKED|MNT_UMOUNT)) > goto out; > > event++; > if (flags & MNT_DETACH) { > - if (!list_empty(&mnt->mnt_list)) > - umount_tree(mnt, UMOUNT_PROPAGATE); > + umount_tree(mnt, UMOUNT_PROPAGATE); > retval = 0; > } else { > shrink_submounts(mnt); > retval = -EBUSY; > if (!propagate_mount_busy(mnt, 2)) { > - if (!list_empty(&mnt->mnt_list)) > - umount_tree(mnt, UMOUNT_PROPAGATE|UMOUNT_SYNC); > + umount_tree(mnt, UMOUNT_PROPAGATE|UMOUNT_SYNC); > retval = 0; > } > } Eric