Re: [REVIEW][PATCH 1/2] mnt: In propgate_umount handle visiting mounts in any order

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Ram Pai <linuxram@xxxxxxxxxx> writes:

> On Wed, May 17, 2017 at 12:54:34AM -0500, Eric W. Biederman wrote:
>> 
>> While investigating some poor umount performance I realized that in
>> the case of overlapping mount trees where some of the mounts are locked
>> the code has been failing to unmount all of the mounts it should
>> have been unmounting.
>> 
>> This failure to unmount all of the necessary
>> mounts can be reproduced with:
>> 
>> $ cat locked_mounts_test.sh
>> 
>> mount -t tmpfs test-base /mnt
>> mount --make-shared /mnt
>> mkdir -p /mnt/b
>> 
>> mount -t tmpfs test1 /mnt/b
>> mount --make-shared /mnt/b
>> mkdir -p /mnt/b/10
>> 
>> mount -t tmpfs test2 /mnt/b/10
>> mount --make-shared /mnt/b/10
>> mkdir -p /mnt/b/10/20
>> 
>> mount --rbind /mnt/b /mnt/b/10/20
>> 
>> unshare -Urm --propagation unchaged /bin/sh -c 'sleep 5; if [ $(grep test /proc/self/mountinfo | wc -l) -eq 1 ] ; then echo SUCCESS ; else echo FAILURE ; fi'
>> sleep 1
>> umount -l /mnt/b
>> wait %%
>> 
>> $ unshare -Urm ./locked_mounts_test.sh
>> 
>> This failure is corrected by removing the prepass that marks mounts
>> that may be umounted.
>> 
>> A first pass is added that umounts mounts if possible and if not sets
>> mount mark if they could be unmounted if they weren't locked and adds
>> them to a list to umount possibilities.  This first pass reconsiders
>> the mounts parent if it is on the list of umount possibilities, ensuring
>> that information of umoutability will pass from child to mount parent.
>> 
>> A second pass then walks through all mounts that are umounted and processes
>> their children unmounting them or marking them for reparenting.
>> 
>> A last pass cleans up the state on the mounts that could not be umounted
>> and if applicable reparents them to their first parent that remained
>> mounted.
>> 
>> While a bit longer than the old code this code is much more robust
>> as it allows information to flow up from the leaves and down
>> from the trunk making the order in which mounts are encountered
>> in the umount propgation tree irrelevant.
>
> Eric,
> 	I think we can accomplish what you want in a much simpler way.
>        	Would the patch below; UNTESTED BUT COMPILED, resolve your
> 	issue?

The reason I came up with an algorithm where the information flows
both directions is that especially in the case of umount -l
but even in some rare cases of a simple umount, the ordering
of the mount propagation tree can result in parent mounts being
visited before the child mounts.

This case shows in in the case of a mount or a set of mounts
being mounted below itself.

So no.   Irregardless of tucked mount state we can't do this.

I see this also doesn't have the change to move mnt_change_mountpoint
into another pass.  That one is quite important from a practical
point of view as that means the way the mount tree changes in umount
is the same irrespective of the number of times a mount shows
up in the mount propagation trees.  Which is a very important
property to have for optimizing umount -l.  Which in
the worst case allows reduces umount from O(N^2+) to roughly O(N).

All of what I am doing should have not effect on an implementation of
MNT_TUCKED.

That said your code to deal with MNT_TUCKED seems reasonable.

Eric

>
> 	Its a two pass unmount. First pass marks mounts that can
> 	be unmounted, and second pass does the neccessary unlinks.
> 	It does mark TUCKED mounts, and uses that information
> 	to peel off the correct mounts. Key points are
>
> 	a) a tucked mount never entertain any unmount propagation
> 	 	on its root dentry.
>
> 	b) when the child on the root dentry of a tucked mount is
> 	   unmounted, the tucked mount is not a tucked mount anymore.
>
> 	c) if the child is a tucked mount, than its child is reparented
> 	   to the parent.
>
>
> Signed-off-by: "Ram Pai" <linuxram@xxxxxxxxxx>
>
> fs/namespace.c        |    4 ++-
> fs/pnode.c            |   53 +++++++++++++++++++++++++++++++++++++-------------
> fs/pnode.h            |    3 ++
> include/linux/mount.h |    1 
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index cc1375ef..ff3ec90 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2050,8 +2050,10 @@ static int attach_recursive_mnt(struct mount *source_mnt,
>  		hlist_del_init(&child->mnt_hash);
>  		q = __lookup_mnt(&child->mnt_parent->mnt,
>  				 child->mnt_mountpoint);
> -		if (q)
> +		if (q) {
>  			mnt_change_mountpoint(child, smp, q);
> +			SET_MNT_TUCKED(child);
> +		}
>  		commit_tree(child);
>  	}
>  	put_mountpoint(smp);
> diff --git a/fs/pnode.c b/fs/pnode.c
> index 5bc7896..b44a544 100644
> --- a/fs/pnode.c
> +++ b/fs/pnode.c
> @@ -448,31 +448,58 @@ static void __propagate_umount(struct mount *mnt)
>  
>  	for (m = propagation_next(parent, parent); m;
>  			m = propagation_next(m, parent)) {
> -		struct mount *topper;
> -		struct mount *child = __lookup_mnt(&m->mnt,
> -						mnt->mnt_mountpoint);
> -		/*
> -		 * umount the child only if the child has no children
> -		 * and the child is marked safe to unmount.
> +		struct mount *topper, *child;
> +
> +		/* Tucked mount must drop umount propagation events on
> +		 * its **root dentry**.
> +		 * The tucked mount did not exist when that child came
> +		 * into existence. It never received that mount propagation.
> +		 * Hence it should never entertain the umount propagation
> +		 * aswell.
>  		 */
> +		if (IS_MNT_TUCKED(m) && list_is_singular(&mnt->mnt_mounts))
> +			continue;
> +
> +
> +		child = __lookup_mnt(&m->mnt, mnt->mnt_mountpoint);
> +
>  		if (!child || !IS_MNT_MARKED(child))
>  			continue;
> +
>  		CLEAR_MNT_MARK(child);
>  
> -		/* If there is exactly one mount covering all of child
> -		 * replace child with that mount.
> -		 */
> -		topper = find_topper(child);
> -		if (topper)
> -			mnt_change_mountpoint(child->mnt_parent, child->mnt_mp,
> -					      topper);
> +		if (IS_MNT_TUCKED(child) &&
> +			(list_is_singular(&child->mnt_mounts))) {
> +			topper = find_topper(child);
> +			if (topper) {
> +				mnt_change_mountpoint(child->mnt_parent,
> +					child->mnt_mp, topper);
> +				CLEAR_MNT_TUCKED(child); /*lets be precise*/
> +			}
> +		}
>  
>  		if (list_empty(&child->mnt_mounts)) {
>  			list_del_init(&child->mnt_child);
>  			child->mnt.mnt_flags |= MNT_UMOUNT;
>  			list_move_tail(&child->mnt_list, &mnt->mnt_list);
>  		}
> +#if 0
> +	       	else {
> +			mntput(child); /* mark it for deletion. It will
> +				       	  be deleted whenever it looses
> +					  all its remaining references.
> +					  TODO: some more thought
> +					  needed, please validate */
> +		}
> +#endif
>  	}
> +
> +	/*
> +	 * This explicit umount operation is exposing the parent.
> +	 * In case the parent was a 'tucked' mount, it cannot be so
> +	 * anymore.
> +	 */
> +	CLEAR_MNT_TUCKED(parent);
>  }
>  
>  /*
> diff --git a/fs/pnode.h b/fs/pnode.h
> index dc87e65..9ebd1a8 100644
> --- a/fs/pnode.h
> +++ b/fs/pnode.h
> @@ -18,8 +18,11 @@
>  #define IS_MNT_UNBINDABLE(m) ((m)->mnt.mnt_flags & MNT_UNBINDABLE)
>  #define IS_MNT_MARKED(m) ((m)->mnt.mnt_flags & MNT_MARKED)
>  #define SET_MNT_MARK(m) ((m)->mnt.mnt_flags |= MNT_MARKED)
> +#define SET_MNT_TUCKED(m) ((m)->mnt.mnt_flags |= MNT_TUCKED)
>  #define CLEAR_MNT_MARK(m) ((m)->mnt.mnt_flags &= ~MNT_MARKED)
> +#define CLEAR_MNT_TUCKED(m) ((m)->mnt.mnt_flags &= ~MNT_TUCKED)
>  #define IS_MNT_LOCKED(m) ((m)->mnt.mnt_flags & MNT_LOCKED)
> +#define IS_MNT_TUCKED(m) ((m)->mnt.mnt_flags & MNT_TUCKED)
>  
>  #define CL_EXPIRE    		0x01
>  #define CL_SLAVE     		0x02
> diff --git a/include/linux/mount.h b/include/linux/mount.h
> index 8e0352a..41674e7 100644
> --- a/include/linux/mount.h
> +++ b/include/linux/mount.h
> @@ -62,6 +62,7 @@
>  #define MNT_SYNC_UMOUNT		0x2000000
>  #define MNT_MARKED		0x4000000
>  #define MNT_UMOUNT		0x8000000
> +#define MNT_TUCKED		0x10000000
>  
>  struct vfsmount {
>  	struct dentry *mnt_root;	/* root of the mounted tree */



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux