On Tue, Oct 25, 2016 at 04:45:44PM -0500, Eric W. Biederman wrote:
> Andrei Vagin <avagin@xxxxxxxxxxxxx> writes:
> >
> > From 8e0f45c0272aa1f789d1657a0acc98c58919dcc3 Mon Sep 17 00:00:00 2001
> > From: Andrei Vagin <avagin@xxxxxxxxxx>
> > Date: Tue, 25 Oct 2016 13:57:31 -0700
> > Subject: [PATCH] mount: skip all mounts from a shared group if one is marked
> >
> > If we meet a marked mount, it means that all mounts from
> > its group have been already revised.
> >
> > Signed-off-by: Andrei Vagin <avagin@xxxxxxxxxx>
> > ---
> >  fs/pnode.c | 18 +++++++++++++++---
> >  1 file changed, 15 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/pnode.c b/fs/pnode.c
> > index 8fd1a3f..ebb7134 100644
> > --- a/fs/pnode.c
> > +++ b/fs/pnode.c
> > @@ -426,10 +426,16 @@ static struct mount *propagation_visit_child(struct mount *last_child,
> >  		if (child && !IS_MNT_MARKED(child))
> >  			return child;
> >
> > -		if (!child)
> > +		if (!child) {
> >  			m = propagation_next(m, origin);
> > -		else
> > +		} else {
> > +			if (IS_MNT_MARKED(child)) {
> > +				if (m->mnt_group_id == origin->mnt_group_id)
> > +					return NULL;
> > +				m = m->mnt_master;
> > +			}
> >  			m = propagation_next_sib(m, origin);
> > +		}
> >  	}
> >  	return NULL;
> >  }
> > @@ -456,8 +462,14 @@ static struct mount *propagation_revisit_child(struct mount *last_child,
> >
> >  		if (!child)
> >  			m = propagation_next(m, origin);
> > -		else
> > +		else {
> > +			if (!IS_MNT_MARKED(child)) {
> > +				if (m->mnt_group_id == origin->mnt_group_id)
> > +					return NULL;
> > +				m = m->mnt_master;
> > +			}
> >  			m = propagation_next_sib(m, origin);
> > +		}
> >  	}
> >  	return NULL;
> >  }
>
> That is certainly interesting.  The problem is that the reason we were
> going slow is that there were in fact mounts that had not been traversed
> in the share group.
>
> And in fact the entire idea of visiting a vfsmount mountpoint pair
> exactly once is wrong in the face of shadow mounts.  For a vfsmount
> mountpoint pair that has shadow mounts, the number of shadow mounts needs
> to be decreased by one each time the propagation tree is traversed
> during unmount.  Which means that as far as I can see we have to kill
> shadow mounts to correctly optimize this code.  Once shadow mounts are
> gone I don't know of a case where we need your optimization.

I am not sure that shadow mounts currently work the way you describe
here.  start_umount_propagation() doesn't remove a mount from mnt_hash,
so the second time around we will look up the same mount again.

Look at this script:

[root@fc24 mounts]# cat ./opus02.sh
set -e
mkdir -p /mnt
mount -t tmpfs zdtm /mnt
mkdir -p /mnt/A/a
mkdir -p /mnt/B/a
mount --bind --make-shared /mnt/A /mnt/A
mount --bind /mnt/A /mnt/B
mount --bind /mnt/A/a /mnt/A/a
mount --bind /mnt/A/a /mnt/A/a
umount -l /mnt/A
cat /proc/self/mountinfo | grep zdtm

[root@fc24 mounts]# unshare --propagation private -m ./opus02.sh
159 121 0:46 / /mnt rw,relatime - tmpfs zdtm rw
162 159 0:46 /A /mnt/B rw,relatime shared:67 - tmpfs zdtm rw
167 162 0:46 /A/a /mnt/B/a rw,relatime shared:67 - tmpfs zdtm rw

We mount nothing into /mnt/B, but after we umount everything from A, we
still have something left in B.

Thanks,
Andrei

>
> I am busily verifying my patch to kill shadow mounts but the following
> patch is the minimal version.  As far as I can see propagate_one
> is the only place we create shadow mounts, and holding the
> namespace_lock over attach_recursive_mnt, propagate_mnt, and
> propagate_one is sufficient for that __lookup_mnt to be completely safe.
>
> diff --git a/fs/pnode.c b/fs/pnode.c
> index 234a9ac49958..b14119b370d4 100644
> --- a/fs/pnode.c
> +++ b/fs/pnode.c
> @@ -217,6 +217,9 @@ static int propagate_one(struct mount *m)
>  	/* skip if mountpoint isn't covered by it */
>  	if (!is_subdir(mp->m_dentry, m->mnt.mnt_root))
>  		return 0;
> +	/* skip if mountpoint already has a mount on it */
> +	if (__lookup_mnt(&m->mnt, mp->m_dentry))
> +		return 0;
>  	if (peers(m, last_dest)) {
>  		type = CL_MAKE_SHARED;
>  	} else {
>
> If you run with that patch you will see that there are go faster stripes.
>
> Eric
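
For anyone reproducing this by hand, a small helper along these lines can
make stacked (shadow) mounts easier to spot by counting how many mountinfo
entries share the same mount point.  This is purely illustrative and not
part of either patch; the script name and the choice of field 5 as the
mount point column are assumptions of this sketch.

#!/bin/sh
# count_stacked.sh -- hypothetical helper, not part of the patches above.
# Field 5 of /proc/self/mountinfo is the mount point; a count greater
# than 1 means several mounts are stacked on that mount point.
awk '{ seen[$5]++ }
     END { for (mp in seen) if (seen[mp] > 1) print seen[mp], mp }' \
    /proc/self/mountinfo

Running it inside the same "unshare --propagation private -m" session as
opus02.sh, for example by inserting it just before the "umount -l /mnt/A"
line, prints the mount points that have more than one mount stacked on
them.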