On Tue, Jun 13, 2017 at 06:55:30AM -0500, Eric W. Biederman wrote: > Ram Pai <linuxram@xxxxxxxxxx> writes: > > > Shared-subtree has had many cases where unmount operation would leave a lot of > > residual mounts, because it was way too difficult to remove them. Eric > > Biederman came up with the idea of tucked mount which relieved the situation > > tremendously. But it still lacks the reversibility property -- A mount > > followed by a umount should reverse the mount-tree to the exact same state as > > before the mount. > > There is a lot here so I am taking my time an digging through it all. Yes. thanks in advance for taking the time. It will take some concentrated effort to go through it. > > I have found two issues that I am going to point out before I comment > on the big picture. > > > diff --git a/fs/pnode.c b/fs/pnode.c > > index 5bc7896..7dfbe9c 100644 > > --- a/fs/pnode.c > > +++ b/fs/pnode.c > > @@ -322,19 +322,9 @@ int propagate_mnt(struct mount *dest_mnt, struct mountpoint *dest_mp, > > return ret; > > } > > > > -static struct mount *find_topper(struct mount *mnt) > > +static struct mount *find_covering_child(struct mount *m) > > { > > - /* If there is exactly one mount covering mnt completely return it. */ > > - struct mount *child; > > - > > - if (!list_is_singular(&mnt->mnt_mounts)) > > - return NULL; > > - > > - child = list_first_entry(&mnt->mnt_mounts, struct mount, mnt_child); > > - if (child->mnt_mountpoint != mnt->mnt.mnt_root) > > - return NULL; > > - > > - return child; > > + return __lookup_mnt(&m->mnt, m->mnt.mnt_root); > > } > > Unless there is some reason you can't use mnt_mounts using > __lookup_mnt is a pessimization here as __lookup_mnt can go slowly > when the has chains get long. I can depend on ->mnt_mounts. I will have to walk the children list, which can be slow. But I realize that the covering child mount will generally be towards the tail of the list, since once it covers, the only way to mount is through propagations. So I can use the reverse-list-walking heuristic and cut down on the time. > > > /* > > @@ -357,8 +347,10 @@ static inline int do_refcount_check(struct mount *mnt, int count) > > */ > > int propagate_mount_busy(struct mount *mnt, int refcnt) > > { > > - struct mount *m, *child, *topper; > > + struct mount *m, *child; > > struct mount *parent = mnt->mnt_parent; > > + bool tucks_can_ignore = !(IS_MNT_TUCK_END(parent) && > > + mnt->mnt_mountpoint == parent->mnt.mnt_root); > > > > if (mnt == parent) > > return do_refcount_check(mnt, refcnt); > > @@ -372,19 +364,19 @@ int propagate_mount_busy(struct mount *mnt, int refcnt) > > return 1; > > > > for (m = propagation_next(parent, parent); m; > > - m = propagation_next(m, parent)) { > > + m = propagation_next(m, parent)) { > > int count = 1; > > - child = __lookup_mnt(&m->mnt, mnt->mnt_mountpoint); > > - if (!child) > > - continue; > > > > - /* Is there exactly one mount on the child that covers > > - * it completely whose reference should be ignored? > > + /* > > + * read the comment in __propagate_umount() > > */ > > - topper = find_topper(child); > > - if (topper) > > - count += 1; > > - else if (!list_empty(&child->mnt_mounts)) > > + if (tucks_can_ignore && > > + IS_MNT_TUCK_END(m) && > > + (mnt->mnt_mountpoint == m->mnt.mnt_root)) > > + continue; > > + > > + child = __lookup_mnt(&m->mnt, mnt->mnt_mountpoint); > > + if (!child || !list_empty(&child->mnt_mounts)) > > continue; > > > > if (do_refcount_check(child, count)) > > Unless I am misreading this code you are skipping a refcount check that > we should be performing. I looked through it again, and I am sure we are not skipping a refcount check. We check the refcount of only those children which can be a target of the umount and have no children of their own. RP