A bit of context: the original RCU conversion of struct mount handling had several bugs, spotted and fixed only 5 years later, in 119e1ef80ecf "fix __legitimize_mnt()/mntput() race". However, bits and pieces of old broken approach hadn't been removed. legitimize_mnt() takes a non-counting reference the caller had found after rcu_read_lock() and sampling of mount_lock seqcount and tries to turn it into counting one if mount_lock seqcount had not been changed. The hard part is dealing with the possibility of race with final mntput() - we want the successful fast path through legitimize_mnt() to be lockless, so there's a possibility of legitimize_mnt() incrementing refcount, only to recheck the mount_lock seqcount and notice that it had been disturbed by what would've been the final mntput(). In that case legitimize_mnt() needs to drop the reference it has acquired and report failure. Original approach had been to have the final mntput() to mark mount as doomed when it commits to killing it off, with legitimize_mnt() doing mntput() if it noticed mount_lock disturbed while it had been grabbing a reference and mntput checking if it has already marked the mount doomed and leaving its destruction to the thread that had done the marking. That had been racy - since this mntput() from legitimize_mnt() might end up doing real work (if what would've been the final mntput() had observed attempted increment by legitimize_mnt()) it can't be done under rcu_read_lock(), but in case if the race went the other way round and final mntput() had *not* seen an attempted increment, rcu_read_lock() we are holding might be the only thing preventing freeing of struct mount in question. Fortunately, legitimize_mnt() can tell one case from another by grabbing mount_lock and checking whether the mount had been marked doomed - if it had been marked we known that the final mntput() has already done mnt_get_count() and not observed our increment, so we can just decrement it quietly. If it hadn't been marked, we know that we need to do full mntput(), but we also know that it's safe to drop rcu_read_lock() - mount won't get freed under us. That's what the commit in question had done - in effect, it had taken the "is it already marked doomed?" check from mntput() to legitimize_mnt(). Which is the right thing to do, but we should've removed that check from mntput() itself. While we are at it, there's no reason for mntput() to hold onto rcu_read_lock() past the handling of "still mounted, we know it's not the final drop" case. Patch below takes the remnants out. It should've been done as part of the original fix; as it is, the magical mystery shite had been left behind. Folks, could you give it some beating? I realize that original reproducers might be long gone, but... Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx> --- diff --git a/fs/namespace.c b/fs/namespace.c index 68789f896f08..ad94f9e228ae 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -1224,6 +1224,7 @@ static void mntput_no_expire(struct mount *mnt) rcu_read_unlock(); return; } + rcu_read_unlock(); lock_mount_hash(); /* * make sure that if __legitimize_mnt() has not seen us grab @@ -1234,17 +1235,10 @@ static void mntput_no_expire(struct mount *mnt) count = mnt_get_count(mnt); if (count != 0) { WARN_ON(count < 0); - rcu_read_unlock(); - unlock_mount_hash(); - return; - } - if (unlikely(mnt->mnt.mnt_flags & MNT_DOOMED)) { - rcu_read_unlock(); unlock_mount_hash(); return; } mnt->mnt.mnt_flags |= MNT_DOOMED; - rcu_read_unlock(); list_del(&mnt->mnt_instance);