On Tue, Oct 25, 2016 at 04:45:44PM -0500, Eric W. Biederman wrote:
> Andrei Vagin <avagin@xxxxxxxxxxxxx> writes:
> >
> > From 8e0f45c0272aa1f789d1657a0acc98c58919dcc3 Mon Sep 17 00:00:00 2001
> > From: Andrei Vagin <avagin@xxxxxxxxxx>
> > Date: Tue, 25 Oct 2016 13:57:31 -0700
> > Subject: [PATCH] mount: skip all mounts from a shared group if one is marked
> >
> > If we meet a marked mount, it means that all mounts from
> > its group have been already revised.
> >
> > Signed-off-by: Andrei Vagin <avagin@xxxxxxxxxx>
> > ---
> >  fs/pnode.c | 18 +++++++++++++++---
> >  1 file changed, 15 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/pnode.c b/fs/pnode.c
> > index 8fd1a3f..ebb7134 100644
> > --- a/fs/pnode.c
> > +++ b/fs/pnode.c
> > @@ -426,10 +426,16 @@ static struct mount *propagation_visit_child(struct mount *last_child,
> >  		if (child && !IS_MNT_MARKED(child))
> >  			return child;
> >
> > -		if (!child)
> > +		if (!child) {
> >  			m = propagation_next(m, origin);
> > -		else
> > +		} else {
> > +			if (IS_MNT_MARKED(child)) {
> > +				if (m->mnt_group_id == origin->mnt_group_id)
> > +					return NULL;
> > +				m = m->mnt_master;
> > +			}
> >  			m = propagation_next_sib(m, origin);
> > +		}
> >  	}
> >  	return NULL;
> >  }
> > @@ -456,8 +462,14 @@ static struct mount *propagation_revisit_child(struct mount *last_child,
> >
> >  		if (!child)
> >  			m = propagation_next(m, origin);
> > -		else
> > +		else {
> > +			if (!IS_MNT_MARKED(child)) {
> > +				if (m->mnt_group_id == origin->mnt_group_id)
> > +					return NULL;
> > +				m = m->mnt_master;
> > +			}
> >  			m = propagation_next_sib(m, origin);
> > +		}
> >  	}
> >  	return NULL;
> >  }
>
> That is certainly interesting.  The problem is that the reason we were
> going slow is that there were in fact mounts that had not been traversed
> in the share group.
>
> And in fact the entire idea of visiting a vfsmount mountpoint pair
> exactly once is wrong in the face of shadow mounts.  For a vfsmount
> mountpoint pair that has shadow mounts, the number of shadow mounts needs
> to be decreased by one each time the propagation tree is traversed
> during unmount.  Which means that as far as I can see we have to kill
> shadow mounts to correctly optimize this code.  Once shadow mounts are
> gone I don't know of a case where we need your optimization.

I am not sure that shadow mounts currently work the way you describe
here.  start_umount_propagation() doesn't remove a mount from mnt_hash,
so the second time around we will look up the same mount again.

Look at this script:

[root@fc24 mounts]# cat ./opus02.sh
set -e
mkdir -p /mnt
mount -t tmpfs zdtm /mnt
mkdir -p /mnt/A/a
mkdir -p /mnt/B/a
mount --bind --make-shared /mnt/A /mnt/A
mount --bind /mnt/A /mnt/B
mount --bind /mnt/A/a /mnt/A/a
mount --bind /mnt/A/a /mnt/A/a
umount -l /mnt/A
cat /proc/self/mountinfo | grep zdtm

[root@fc24 mounts]# unshare --propagation private -m ./opus02.sh
159 121 0:46 / /mnt rw,relatime - tmpfs zdtm rw
162 159 0:46 /A /mnt/B rw,relatime shared:67 - tmpfs zdtm rw
167 162 0:46 /A/a /mnt/B/a rw,relatime shared:67 - tmpfs zdtm rw

We mount nothing into /mnt/B, but after we umount everything from A, we
still have something left in B.

Thanks,
Andrei

>
> I am busily verifying my patch to kill shadow mounts but the following
> patch is the minimal version.  As far as I can see propagate_one
> is the only place we create shadow mounts, and holding the
> namespace_lock over attach_recursive_mnt, propagate_mnt, and
> propagate_one is sufficient for that __lookup_mnt to be completely safe.
>
> diff --git a/fs/pnode.c b/fs/pnode.c
> index 234a9ac49958..b14119b370d4 100644
> --- a/fs/pnode.c
> +++ b/fs/pnode.c
> @@ -217,6 +217,9 @@ static int propagate_one(struct mount *m)
>  	/* skip if mountpoint isn't covered by it */
>  	if (!is_subdir(mp->m_dentry, m->mnt.mnt_root))
>  		return 0;
> +	/* skip if mountpoint already has a mount on it */
> +	if (__lookup_mnt(&m->mnt, mp->m_dentry))
> +		return 0;
>  	if (peers(m, last_dest)) {
>  		type = CL_MAKE_SHARED;
>  	} else {
>
> If you run with that patch you will see that there are go faster stripes.
>
> Eric
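
For anyone reproducing this by hand, a small helper along these lines can
make stacked (shadow) mounts easier to spot by counting how many mountinfo
entries share the same mount point.  This is purely illustrative and not
part of either patch; the script name and the choice of field 5 as the
mount point column are assumptions of this sketch.

#!/bin/sh
# count_stacked.sh -- hypothetical helper, not part of the patches above.
# Field 5 of /proc/self/mountinfo is the mount point; a count greater
# than 1 means several mounts are stacked on that mount point.
awk '{ seen[$5]++ }
     END { for (mp in seen) if (seen[mp] > 1) print seen[mp], mp }' \
    /proc/self/mountinfo

Running it inside the same "unshare --propagation private -m" session as
opus02.sh, for example by inserting it just before the "umount -l /mnt/A"
line, prints the mount points that have more than one mount stacked on
them.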