On Thu, Apr 9, 2020 at 8:30 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, Apr 09, 2020 at 05:54:46PM +0100, Al Viro wrote:
> > On Thu, Apr 09, 2020 at 05:50:48PM +0100, Al Viro wrote:
> > > On Thu, Apr 09, 2020 at 04:16:19PM +0200, Miklos Szeredi wrote:
> > > > Solve this by adding a cursor entry for each open instance.  Taking
> > > > the global namespace_sem for write seems excessive, since we are
> > > > only dealing with a per-namespace list.  Instead add a per-namespace
> > > > spinlock and use that together with namespace_sem taken for read to
> > > > protect against concurrent modification of the mount list.  This may
> > > > reduce parallelism of is_local_mountpoint(), but it's hardly a big
> > > > contention point.  We could also use RCU freeing of cursors to make
> > > > traversal not need additional locks, if that turns out to be
> > > > necessary.
> > >
> > > Umm...  That can do more than reduction of parallelism - longer lists
> > > take longer to scan and moving cursors dirties cachelines in a bunch
> > > of struct mount instances.  And I'm not convinced that your locking
> > > in m_next() is correct.
> > >
> > > What's to stop umount_tree() from removing the next entry from the
> > > list just as your m_next() tries to move the cursor?  I don't see any
> > > common locks for those two...
> >
> > Ah, you still have namespace_sem taken (shared) by m_start().  Nevermind
> > that one, then...  Let me get through mnt_list users and see if I can
> > catch anything.
>
> OK...  Locking is safe, but it's not obvious.  And your changes do make
> it scarier.  There are several kinds of lists that can be threaded
> through ->mnt_list and your code depends upon never having those suckers
> appear in e.g. anon namespace ->list.  It is true (AFAICS), but...

See analysis below.

> Another fun question is ns->mounts rules - it used to be "the number of
> entries in ns->list", now it's "the number of non-cursor entries there".
> Incidentally, we might have a problem with that logic wrt count_mounts().

Nope, count_mounts() iterates through the mount tree, not through
mnt_ns->list.

> Sigh...  The damn thing has grown much too convoluted over the years ;-/
>
> I'm still not happy with that patch; at the very least it needs a lot
> more detailed analysis to go along with it.
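Fair enough.  To make the scheme concrete before the analysis: each open
instance of /proc/mounts embeds a cursor entry on ns->list, and the
iterator skips other instances' cursors while holding the new
per-namespace spinlock in addition to namespace_sem (held shared by
m_start()).  A minimal sketch of the idea; the MNT_CURSOR flag and the
lock_ns_list()/unlock_ns_list()/mnt_is_cursor() helper names are
illustrative, not necessarily what the final patch will use:

/*
 * Sketch only: assumes struct mnt_namespace has grown a spinlock_t
 * ns_lock that protects ns->list against concurrent cursor movement.
 * Cursors are struct mount entries marked with a MNT_CURSOR flag;
 * every walker of ns->list must skip them.
 */
static inline void lock_ns_list(struct mnt_namespace *ns)
{
	spin_lock(&ns->ns_lock);
}

static inline void unlock_ns_list(struct mnt_namespace *ns)
{
	spin_unlock(&ns->ns_lock);
}

static inline bool mnt_is_cursor(struct mount *mnt)
{
	return mnt->mnt.mnt_flags & MNT_CURSOR;
}

/*
 * Advance past @p to the next real (non-cursor) mount, or NULL at the
 * end of the list.  The caller holds namespace_sem at least shared, so
 * entries cannot be removed or freed under us; ns_lock keeps other
 * readers' cursors from moving while we follow ->next pointers.
 */
static struct mount *mnt_list_next(struct mnt_namespace *ns,
				   struct list_head *p)
{
	struct mount *ret = NULL;

	lock_ns_list(ns);
	for (p = p->next; p != &ns->list; p = p->next) {
		struct mount *mnt = list_entry(p, struct mount, mnt_list);

		if (!mnt_is_cursor(mnt)) {
			ret = mnt;
			break;
		}
	}
	unlock_ns_list(ns);

	return ret;
}

m_next() is then just a call to this helper, and m_stop() re-parks the
cursor (a list_move_tail() under ns_lock) at the last returned position,
so the next read of the same open file resumes from there.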
Functions touching mnt_list:

In pnode.c:

umount_one:
umount_list:
propagate_umount: both of the above are indirectly called from this.
The only caller is umount_tree(), which has lots of different call
paths, but in each one namespace_sem is taken for write:

  do_move_mount
    attach_recursive_mnt
      umount_tree

  do_loopback
    graft_tree
      attach_recursive_mnt
        umount_tree

  do_new_mount_fc
    do_add_mount
      graft_tree
        attach_recursive_mnt
          umount_tree

  finish_automount
    do_add_mount
      graft_tree
        attach_recursive_mnt
          umount_tree

  do_umount
    shrink_submounts
      umount_tree

In namespace.c:

__is_local_mountpoint: takes namespace_sem for read

commit_tree: has namespace_sem for write (only caller being
attach_recursive_mnt, see above for call paths)

m_start:
m_next:
m_show: all have namespace_sem for read

umount_tree: all callers have namespace_sem for write (see above for
call paths)

do_umount: has namespace_sem for write

copy_tree: all members are newly allocated

iterate_mounts: operates on private copy built by collect_mounts()

open_detached_copy: takes namespace_sem for write

copy_mnt_ns: takes namespace_sem for write

mount_subtree: adds onto a newly allocated mnt_namespace

sys_fsmount: ditto

init_mount_tree: ditto

mnt_already_visible: takes namespace_sem for read

The patch adds ns_lock locking to all places that only have
namespace_sem for read.  So everyone is still excluded from everyone
else: those taking namespace_sem for write exclude everyone else
obviously, and those taking namespace_sem for read exclude each other
because they also take ns_lock (see the P.S. below for a sketch of this
read-side pattern).

Thanks,

Miklos
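P.S.  For completeness, here is the read-side pattern the conclusion
above refers to, sketched on __is_local_mountpoint(); it assumes the
same illustrative lock_ns_list()/unlock_ns_list()/mnt_is_cursor()
helpers as the earlier sketch:

/*
 * Walk the per-namespace mount list with namespace_sem held shared;
 * taking ns_lock as well excludes concurrent cursor movement by other
 * /proc/mounts readers.  Cursor entries are not real mounts and are
 * skipped.
 */
bool __is_local_mountpoint(struct dentry *dentry)
{
	struct mnt_namespace *ns = current->nsproxy->mnt_ns;
	struct mount *mnt;
	bool is_covered = false;

	down_read(&namespace_sem);
	lock_ns_list(ns);
	list_for_each_entry(mnt, &ns->list, mnt_list) {
		if (mnt_is_cursor(mnt))
			continue;
		is_covered = (mnt->mnt_mountpoint == dentry);
		if (is_covered)
			break;
	}
	unlock_ns_list(ns);
	up_read(&namespace_sem);

	return is_covered;
}

Writers (umount_tree() and friends) never need to take ns_lock
themselves: they hold namespace_sem for write, which already excludes
every reader listed above.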