Re: [PATCH v5] mnt: Tuck mounts under others instead of creating shadow/side mounts.

ebiederm@xxxxxxxxxxxx (Eric W. Biederman) · Tue, 24 Jan 2017 13:16:01 +1300

Ram Pai <linuxram@xxxxxxxxxx> writes:

> On Sat, Jan 21, 2017 at 05:15:29PM +1300, Eric W. Biederman wrote:
>> Ram Pai <linuxram@xxxxxxxxxx> writes:
>> 
>> >> @@ -359,12 +373,24 @@ int propagate_mount_busy(struct mount *mnt, int refcnt)
>> >> 
>> >>  	for (m = propagation_next(parent, parent); m;
>> >>  	     		m = propagation_next(m, parent)) {
>> >> -		child = __lookup_mnt_last(&m->mnt, mnt->mnt_mountpoint);
>> >> -		if (child && list_empty(&child->mnt_mounts) &&
>> >> -		    (ret = do_refcount_check(child, 1)))
>> >> -			break;
>> >> +		int count = 1;
>> >> +		child = __lookup_mnt(&m->mnt, mnt->mnt_mountpoint);
>> >> +		if (!child)
>> >> +			continue;
>> >> +
>> >> +		/* Is there exactly one mount on the child that covers
>> >> +		 * it completely whose reference should be ignored?
>> >> +		 */
>> >> +		topper = find_topper(child);
>> >
>> > This is tricky. I understand it is trying to identify the case where a
>> > mount got tucked-in because of propagation.  But this will not
>> > distinguish the case where a mount got over-mounted genuinely, not because of
>> > propagation, but because of explicit user action.
>> >
>> >
>> > example:
>> >
>> > case 1: (explicit user action)
>> > 	B is a slave of A
>> > 	mount something on A/a , it will propagate to B/a
>> > 	and than mount something on B/a
>> >
>> > case 2: (tucked mount)
>> > 	B is a slave of A
>> > 	mount something on B/a
>> > 	and than mount something on A/a
>> >
>> > Both case 1 and case 2 lead to the same mount configuration.
>> >
>> >
>> > 	  however 'umount A/a' in case 1 should fail.
>> > 	  and 'umount A/a' in case 2 should pass.
>> >
>> > Right? in other words, umounts of 'tucked mounts' should pass(case 2).
>> > 	whereas umounts of mounts on which overmounts exist should
>> > 		fail.(case 1)
>> 
>> Looking at your example.  I agree that case 1 will fail today.
>
> And should continue to fail. right? Your semantics change will pass it.

I don't see why it should continue to fail.

>> However my actual expectation would be for both mount configurations
>> to behave the same.  In both cases something has been explicitly mounted
>> on B/a and something has propagated to B/a.  In both cases the mount
>> on top is what was explicitly mounted, and the mount below is what was
>> propagated to B/a.
>> 
>> I don't see why the order of operations should matter.
>
> One of the subtle expectation is reversibility.
>
> Mount followed immediately by unmount has always passed and that is the
> standard expectation always. Your proposed code will ensure that.
>
> However there is one other subtle expectaton.
>
> A mount cannot disappear if a user has explicitly mounted on top of it.
>
> your proposed code will not meet that expectation. 
>
> In other words, these two expectations make it behave differently even
> when; arguably, they feel like the same configuration.

I am not seeing that.

>> 
>> > maybe we need a flag to identify tucked mounts?
>> 
>> To preserve our exact current semantics yes.
>> 
>> The mount configurations that are delibearately constructed that I am
>> aware of are comparatively simple.  I don't think anyone has even taken
>> advantage of the shadow/side mounts at this point.  I made a reasonable
>> effort to find out and no one was even aware they existed.  Much less
>> what they were.  And certainly no one I talked to could find code that
>> used them.
>
> But someday; even if its after a decade, someone ;) will
> stumble into this semantics and wonder 'why?'. Its better to get it right
> sooner. Sorry, I am blaming myself; for keeping some of the problems
> open thinking no one will bump into them.

Oh definitely.  If we have people ready to talk it through I am happy to
dot as many i's and cross as many t's as we productively can.

I was just pointing out that I don't have any reason to expect that any
one depends on the subtle details of the implementation today so we
still have some wiggle room to fix them.  Even if they are visible to
user space.

Then I see Andrei Vagin's patch for checkpoint/restore and the mount
namespace and I start suspecting that will be the point where all of the
subtle details get locked in stone because checkpont/restore will have
to preserve every possible configuration of mount namespaces.

My main concern at this point is to get the code to a point where a
malicious user in a user namespace can not cause problems for root
in the primary mount namespace.  Even if root did open himself for all
kinds of trouble by running "mount --make-rshared /".  As that is
essentially required to use mount propagation at all.

Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html