Re: [REVIEW][PATCH] mnt: Tuck mounts under others instead of creating shadow/side mounts.

Al Viro <viro@xxxxxxxxxxxxxxxxxx> · Sat, 7 Jan 2017 05:06:44 +0000

On Thu, Jan 05, 2017 at 10:04:14AM +1300, Eric W. Biederman wrote:

>  - attach_shadows to be renamed __attach_mnt and the it's shadow
>    handling to be removed.

Er...  s/the it's/its/, presumably?  Or am I misparsing that?

> v2: Updated to mnt_change_mountpoint to not call dput or mntput
> and instead to decrement the counts directly.  It is guaranteed
> that there will be other references when mnt_change_mountpoint is
> called so this is safe.

> +void mnt_change_mountpoint(struct mount *parent, struct mountpoint *mp, struct mount *mnt)
>  {

Too generic name, IMO, and I really wonder if "mount" (== interpose) and
"umount" (== excise?) cases would be better off separately.

> +	struct mountpoint *old_mp = mnt->mnt_mp;
> +	struct dentry *old_mountpoint = mnt->mnt_mountpoint;
> +	struct mount *old_parent = mnt->mnt_parent;
> +
> +	list_del_init(&mnt->mnt_child);
> +	hlist_del_init(&mnt->mnt_mp_list);
> +	hlist_del_init_rcu(&mnt->mnt_hash);
> +
> +	attach_mnt(mnt, parent, mp);
> +
> +	put_mountpoint(old_mp);

> +	 *
> +	 * During mounting, another mount will continue to use the old
> +	 * mountpoint and during unmounting, the old mountpoint will
> +	 * continue to exist until namespace_unlock which happens well
> +	 * after mnt_change_mountpoint.
> +	 */

Umm...  AFAICS, in the former case "another mount" is simply parent, right?

> +	spin_lock(&old_mountpoint->d_lock);
> +	old_mountpoint->d_lockref.count--;
> +	spin_unlock(&old_mountpoint->d_lock);
> +	mnt_add_count(old_parent, -1);

> +		if (child->mnt.mnt_root == smp->m_dentry) {

Explain, please.  In which case is that condition _not_ satisfied, and
what should happen i

> +			struct mount *q;
> +			q = __lookup_mnt(&child->mnt_parent->mnt,
> +					 child->mnt_mountpoint);
> +			if (q)
> +				mnt_change_mountpoint(child, smp, q);

>  	unlock_mount_hash();
> +	put_mountpoint(smp);

Wrong order...

>  	ns->pending_mounts = 0;
> +	put_mountpoint(smp);

... and worse yet here.

>  static inline int do_refcount_check(struct mount *mnt, int count)
>  {
> +	struct mount *topper = __lookup_mnt(&mnt->mnt, mnt->mnt.mnt_root);
> +	if (topper)
> +		count++;
>  	return mnt_get_count(mnt) > count;
>  }

> @@ -359,7 +362,7 @@ int propagate_mount_busy(struct mount *mnt, int refcnt)
>  
>  	for (m = propagation_next(parent, parent); m;
>  	     		m = propagation_next(m, parent)) {
> -		child = __lookup_mnt_last(&m->mnt, mnt->mnt_mountpoint);
> +		child = __lookup_mnt(&m->mnt, mnt->mnt_mountpoint);
>  		if (child && list_empty(&child->mnt_mounts) &&
>  		    (ret = do_refcount_check(child, 1)))
>  			break;

Er...  You do realize that you can end up with more that one such
propagation, right?  IOW, there might be more than one thing slipped in.

> +		if (!IS_MNT_LOCKED(child) || IS_MNT_MARKED(m)) {
>  			SET_MNT_MARK(child);

Reread the condition, please...  And yes, I realize that original is
also rather odd; at a guess it was meant to be !(, not (!, but that's
just a guess - it's your code, IIRC.

> @@ -420,8 +425,8 @@ static void __propagate_umount(struct mount *mnt)
>  
>  	for (m = propagation_next(parent, parent); m;
>  			m = propagation_next(m, parent)) {
> -
> -		struct mount *child = __lookup_mnt_last(&m->mnt,
> +		struct mount *topper;
> +		struct mount *child = __lookup_mnt(&m->mnt,
>  						mnt->mnt_mountpoint);
>  		/*
>  		 * umount the child only if the child has no children
> @@ -430,6 +435,16 @@ static void __propagate_umount(struct mount *mnt)
>  		if (!child || !IS_MNT_MARKED(child))
>  			continue;
>  		CLEAR_MNT_MARK(child);
> +
> +		/* If there is exactly one mount covering all of child
> +		 * replace child with that mount.
> +		 */
> +		topper = __lookup_mnt(&child->mnt, child->mnt.mnt_root);
> +		if (topper &&

> +		    (child->mnt_mounts.next == &topper->mnt_child) &&
> +		    (topper->mnt_child.next == &child->mnt_mounts))

Weird way to spell child->mnt_mounts.next == child->mnt_mounts.prev, that...
Or, perhaps, the entire thing ought to be
		if (list_is_singular(&child->mnt_mounts)) {
			topper = list_first_entry(&child->mnt_mounts,
						  struct mount, mnt_child);
			if (topper->mnt_parent == child &&
			    topped->mnt_mountpoint == child->mnt.mnt_root)

to avoid hash lookups.

> +			mnt_change_mountpoint(child->mnt_parent, child->mnt_mp, topper);

FWIW, the my main worry here is your handling of the umount.  For
example, what happens if
	* something is mounted on A (m1)
	* something else is mounted on A/bar (m2)
	* D is a slave of C
	* something has been mounted on D/foo (n)
	* you do mount --rbind A C/foo (m1' on C/foo, m2' on m1/bar,
					m1'' interposed on D/foo under n,
					m2'' on m1''/bar,
					m1'' slave of m1', m2'' slave of m2)
	* you make C/foo and C/foo/bar private (m1'' and m2'' are not getting
					propagation from m1' and m2' anymore)
	* you umount C/foo/bar		(m2' is unmounted)
	* you umount C/foo
m1' gets unmounted, all right, but what of m1''?  D is a slave of C, so we
get propagation of umount from C/foo to D/foo; m1'' still has m2'' attached
to it.  AFAICS, your logics will happily slip m1'' from under n (assuming
that n itself is not busy), and leak both m1'' and m2''.

OTOH, the case of multiple slip-under (Z is slave of Y, which is a slave of X,
mount on Z, then mount of Y, then mount on X) the check for being busy would
do very odd things.

Something's fishy on the umount side...
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html