Re: [PATCH v5] mnt: Tuck mounts under others instead of creating shadow/side mounts.

Ram Pai <linuxram@xxxxxxxxxx> · Fri, 3 Feb 2017 12:28:14 -0800

On Sat, Feb 04, 2017 at 07:26:20AM +1300, Eric W. Biederman wrote:
> Ram Pai <linuxram@xxxxxxxxxx> writes:
> 
> > On Fri, Feb 03, 2017 at 11:54:21PM +1300, Eric W. Biederman wrote:
> >> ebiederm@xxxxxxxxxxxx (Eric W. Biederman) writes:
> >> 
> >> > Ram Pai <linuxram@xxxxxxxxxx> writes:
> >> >
> >> >> On Sat, Jan 21, 2017 at 05:15:29PM +1300, Eric W. Biederman wrote:
> >> >>> Ram Pai <linuxram@xxxxxxxxxx> writes:
> >> >>> 
> >> >>> >> @@ -359,12 +373,24 @@ int propagate_mount_busy(struct mount *mnt, int refcnt)
> >> >>> >> 
> >> >>> >>  	for (m = propagation_next(parent, parent); m;
> >> >>> >>  	     		m = propagation_next(m, parent)) {
> >> >>> >> -		child = __lookup_mnt_last(&m->mnt, mnt->mnt_mountpoint);
> >> >>> >> -		if (child && list_empty(&child->mnt_mounts) &&
> >> >>> >> -		    (ret = do_refcount_check(child, 1)))
> >> >>> >> -			break;
> >> >>> >> +		int count = 1;
> >> >>> >> +		child = __lookup_mnt(&m->mnt, mnt->mnt_mountpoint);
> >> >>> >> +		if (!child)
> >> >>> >> +			continue;
> >> >>> >> +
> >> >>> >> +		/* Is there exactly one mount on the child that covers
> >> >>> >> +		 * it completely whose reference should be ignored?
> >> >>> >> +		 */
> >> >>> >> +		topper = find_topper(child);
> >> >>> >
> >> >>> > This is tricky. I understand it is trying to identify the case where a
> >> >>> > mount got tucked-in because of propagation.  But this will not
> >> >>> > distinguish the case where a mount got over-mounted genuinely, not because of
> >> >>> > propagation, but because of explicit user action.
> >> >>> >
> >> >>> >
> >> >>> > example:
> >> >>> >
> >> >>> > case 1: (explicit user action)
> >> >>> > 	B is a slave of A
> >> >>> > 	mount something on A/a , it will propagate to B/a
> >> >>> > 	and than mount something on B/a
> >> >>> >
> >> >>> > case 2: (tucked mount)
> >> >>> > 	B is a slave of A
> >> >>> > 	mount something on B/a
> >> >>> > 	and than mount something on A/a
> >> >>> >
> >> >>> > Both case 1 and case 2 lead to the same mount configuration.
> >> >>> >
> >> >>> >
> >> >>> > 	  however 'umount A/a' in case 1 should fail.
> >> >>> > 	  and 'umount A/a' in case 2 should pass.
> >> >>> >
> >> >>> > Right? in other words, umounts of 'tucked mounts' should pass(case 2).
> >> >>> > 	whereas umounts of mounts on which overmounts exist should
> >> >>> > 		fail.(case 1)
> >> >>> 
> >> >>> Looking at your example.  I agree that case 1 will fail today.
> >> >>
> >> >> And should continue to fail. right? Your semantics change will pass it.
> >> >
> >> > I don't see why it should continue to fail.
> >> >
> >> >>> However my actual expectation would be for both mount configurations
> >> >>> to behave the same.  In both cases something has been explicitly mounted
> >> >>> on B/a and something has propagated to B/a.  In both cases the mount
> >> >>> on top is what was explicitly mounted, and the mount below is what was
> >> >>> propagated to B/a.
> >> >>> 
> >> >>> I don't see why the order of operations should matter.
> >> >>
> >> >> One of the subtle expectation is reversibility.
> >> >>
> >> >> Mount followed immediately by unmount has always passed and that is the
> >> >> standard expectation always. Your proposed code will ensure that.
> >> >>
> >> >> However there is one other subtle expectaton.
> >> >>
> >> >> A mount cannot disappear if a user has explicitly mounted on top of it.
> >> >>
> >> >> your proposed code will not meet that expectation. 
> >> >>
> >> >> In other words, these two expectations make it behave differently even
> >> >> when; arguably, they feel like the same configuration.
> >> >
> >> > I am not seeing that.
> >> >
> >> >
> >> >
> >> >>> 
> >> >>> > maybe we need a flag to identify tucked mounts?
> >> >>> 
> >> >>> To preserve our exact current semantics yes.
> >> >>> 
> >> >>> The mount configurations that are delibearately constructed that I am
> >> >>> aware of are comparatively simple.  I don't think anyone has even taken
> >> >>> advantage of the shadow/side mounts at this point.  I made a reasonable
> >> >>> effort to find out and no one was even aware they existed.  Much less
> >> >>> what they were.  And certainly no one I talked to could find code that
> >> >>> used them.
> >> >>
> >> >> But someday; even if its after a decade, someone ;) will
> >> >> stumble into this semantics and wonder 'why?'. Its better to get it right
> >> >> sooner. Sorry, I am blaming myself; for keeping some of the problems
> >> >> open thinking no one will bump into them.
> >> >
> >> > Oh definitely.  If we have people ready to talk it through I am happy to
> >> > dot as many i's and cross as many t's as we productively can.
> >> >
> >> > I was just pointing out that I don't have any reason to expect that any
> >> > one depends on the subtle details of the implementation today so we
> >> > still have some wiggle room to fix them.  Even if they are visible to
> >> > user space.
> >> 
> >> So I haven't seen a reply, and we are getting awfully close to the merge
> >> window.  Is there anything concrete we can do to ease concerns?
> >> 
> >> Right now I am thinking my last version of the patch is the likely the
> >> best we have time and energy to manage and it would be good to merge it
> >> before the code bit rots.
> >
> > I was waiting for some other opinions on the behavior, since I
> > continue to think that 'one should not be able to unmount mounts on
> > which a user has explicitly mounted upon'. I am happy to be overruled,
> > since your patch significantly improves the rest of the semantics.
> >
> > Viro?
> 
> Ram Pai, just to be clear you were hoping to add the logic below to my patch?

Yes. the behavior of your patch below is what I was proposing.

> 
> My objections to the snippet below are:
> 
> - It makes it hard for the CRIU folks (yet more state they have to find
>   and restore).

true. unfortunately one more subtle detail to be aware off.

> 
> - It feels subjectively worse to me.
> 
> - We already have cases where mounts are unmounted transparently (umount on rmdir).

sorry. i am not aware of this case. some details will help.

> 
> - Al Viro claims that the side/shadow mounts are ordinary mounts and
>   maintaining this extra logic that remembers if we tucked one mount
>   under another seems to make this them less ordinary.

I tend to argue that they are a bit more than ordinary, for they have the
ability to tuck.

> 
> - The symmetry for unmounting exists for a tucked mount.  We can unmount
>   it via propagation or we can unmount the mount above it, and then we
>   can unmount the new underlying mount.

this is fine with me.

>   So I don't see why we don't
>   want symmetry in the other case just because we mounted on top of
>   the mount and rather than had the mount tucked under us.

A tucked mount should be un-tuckable. I agree.  But a non-tucked mount
cannot pretend to be tucked and this is where I disagree.

> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 8bfad42c1ccf..8b00e0548438 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2047,8 +2047,10 @@ static int attach_recursive_mnt(struct mount *source_mnt,
>  		hlist_del_init(&child->mnt_hash);
>  		q = __lookup_mnt(&child->mnt_parent->mnt,
>  				 child->mnt_mountpoint);
> -		if (q)
> +		if (q) {
>  			mnt_change_mountpoint(child, smp, q);
> +			child->mnt.mnt_flags |= MNT_TUCKED;
> +		}
>  		commit_tree(child);
>  	}
>  	put_mountpoint(smp);
> diff --git a/fs/pnode.c b/fs/pnode.c
> index 5bc7896d122a..e2a6ac68feb9 100644
> --- a/fs/pnode.c
> +++ b/fs/pnode.c
> @@ -327,6 +327,9 @@ static struct mount *find_topper(struct mount *mnt)
>  	/* If there is exactly one mount covering mnt completely return it. */
>  	struct mount *child;
> 
> +	if (!(mnt->mnt.mnt_flags & MNT_TUCKED))
> +		return NULL;
> +	
>  	if (!list_is_singular(&mnt->mnt_mounts))
>  		return NULL;
> 
> diff --git a/include/linux/mount.h b/include/linux/mount.h
> index 8e0352af06b7..25ca398b19b3 100644
> --- a/include/linux/mount.h
> +++ b/include/linux/mount.h
> @@ -52,6 +52,7 @@ struct mnt_namespace;
> 
>  #define MNT_INTERNAL	0x4000
> 
> +#define MNT_TUCKED		0x020000
>  #define MNT_LOCK_ATIME		0x040000
>  #define MNT_LOCK_NOEXEC		0x080000
>  #define MNT_LOCK_NOSUID		0x100000
> 
> Eric

-- 
Ram Pai