Re: [RFC] umount/__detach_mounts() race

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



	BTW, for the folks not familiar with the area -
refcounting rules for mounts are

	(1) being a child of some mount contributes 1.  That
applies to all mounts.
	(2) being a part of a tree of some namespace contributes
number of children; ditto for a tree that had never been in any
namespace.
	(3) being a part of decaying fragment (something detached
by umount_tree()) does not contribute anything.  (1) still applies,
but (2) does not.
	(4) being a root of some namespace contributes 1.
	(5) being a part of a list passing through ->mnt_umount
contributes 1.

Everything else is due to explicit references stored in some objects
other than mnt_namespace and mount instances, including local variables,
etc.

Decaying fragments are distinguishable by having MNT_UMOUNT on all
mounts in them (namespace_lock() stabilizes that, just as it does to
everything else tree-related).  There should never be a mix of
MNT_UMOUNT and non-MNT_UMOUNT mounts in any tree.

Everything in decaying fragment also has
	NULL ->mnt_ns
	empty ->mnt_list
	NULL ->mnt_master
	empty ->mnt_shared/->mnt_slave/->mnt_slave_list
	empty ->mnt_expire

Decaying fragments can fall apart into smaller subtrees.
If nothing else, that happens when there's no more external
references to the root of fragment - its children are torn away,
moved into a list that passes through their ->mnt_umount
(see mntput_no_expire(), circa line 1210) and later dropped
(cleanup_mnt(), circa line 1138).  It also happens when
mountpoint of something in a fragment gets invalidated
(see __detach_mounts(), MNT_UMOUNT case).

However, that's the only thing that can happen to those fragments -
they can't get anything mounted/unmounted/moved.  Stepping on
something like NFS referral point fails, instead of automounting
anything, etc.

	umount_tree() should never be called on those - it's about
tearing some part of live trees (whether they are in a namespace
or not) into decaying fragments and collecting the roots of those
fragments into a list ('unmounted', passing through ->mnt_umount)
for the subsequent namespace_unlock() to drop.

Another thing: namespace root may go into decaying fragment (which can
happen only if it's explicitly passed as argument to umount_tree())
only when the namespace is about to be freed.  I think we should be
OK there, but I'm not happy with the proof - it's too convoluted and
smells like it might be brittle.  Might be worth making that more
straightforward...

Another fun area is handling of internal mounts (which is what got
me started on that round of code review).  Both in terms of
RCU delays between ->mnt_ns going NULL and in terms of sync
behaviour needed in final mntput() for those.  Final mntput()
also involves rather delicate interactions with legitimizing
references acquired in RCU mode - that's what __legitimize_mnt()
and MNT_DOOMED is about and documentation is... lacking, to
put it mildly.  It's scattered over a bunch of commit messages,
and incomplete even there.

I'll post when I get the documentation of that stuff into more or
less usable shape.  IMO it should go into Documentation/filesystems/*,
but it'll obviously need a review first...



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux