BTW, for the folks not familiar with the area - refcounting rules for mounts are (1) being a child of some mount contributes 1. That applies to all mounts. (2) being a part of a tree of some namespace contributes number of children; ditto for a tree that had never been in any namespace. (3) being a part of decaying fragment (something detached by umount_tree()) does not contribute anything. (1) still applies, but (2) does not. (4) being a root of some namespace contributes 1. (5) being a part of a list passing through ->mnt_umount contributes 1. Everything else is due to explicit references stored in some objects other than mnt_namespace and mount instances, including local variables, etc. Decaying fragments are distinguishable by having MNT_UMOUNT on all mounts in them (namespace_lock() stabilizes that, just as it does to everything else tree-related). There should never be a mix of MNT_UMOUNT and non-MNT_UMOUNT mounts in any tree. Everything in decaying fragment also has NULL ->mnt_ns empty ->mnt_list NULL ->mnt_master empty ->mnt_shared/->mnt_slave/->mnt_slave_list empty ->mnt_expire Decaying fragments can fall apart into smaller subtrees. If nothing else, that happens when there's no more external references to the root of fragment - its children are torn away, moved into a list that passes through their ->mnt_umount (see mntput_no_expire(), circa line 1210) and later dropped (cleanup_mnt(), circa line 1138). It also happens when mountpoint of something in a fragment gets invalidated (see __detach_mounts(), MNT_UMOUNT case). However, that's the only thing that can happen to those fragments - they can't get anything mounted/unmounted/moved. Stepping on something like NFS referral point fails, instead of automounting anything, etc. umount_tree() should never be called on those - it's about tearing some part of live trees (whether they are in a namespace or not) into decaying fragments and collecting the roots of those fragments into a list ('unmounted', passing through ->mnt_umount) for the subsequent namespace_unlock() to drop. Another thing: namespace root may go into decaying fragment (which can happen only if it's explicitly passed as argument to umount_tree()) only when the namespace is about to be freed. I think we should be OK there, but I'm not happy with the proof - it's too convoluted and smells like it might be brittle. Might be worth making that more straightforward... Another fun area is handling of internal mounts (which is what got me started on that round of code review). Both in terms of RCU delays between ->mnt_ns going NULL and in terms of sync behaviour needed in final mntput() for those. Final mntput() also involves rather delicate interactions with legitimizing references acquired in RCU mode - that's what __legitimize_mnt() and MNT_DOOMED is about and documentation is... lacking, to put it mildly. It's scattered over a bunch of commit messages, and incomplete even there. I'll post when I get the documentation of that stuff into more or less usable shape. IMO it should go into Documentation/filesystems/*, but it'll obviously need a review first...