On Thu, Apr 9, 2020 at 8:30 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, Apr 09, 2020 at 05:54:46PM +0100, Al Viro wrote:
> > On Thu, Apr 09, 2020 at 05:50:48PM +0100, Al Viro wrote:
> > > On Thu, Apr 09, 2020 at 04:16:19PM +0200, Miklos Szeredi wrote:
> > > > Solve this by adding a cursor entry for each open instance.  Taking
> > > > the global namespace_sem for write seems excessive, since we are
> > > > only dealing with a per-namespace list.  Instead add a per-namespace
> > > > spinlock and use that together with namespace_sem taken for read to
> > > > protect against concurrent modification of the mount list.  This may
> > > > reduce parallelism of is_local_mountpoint(), but it's hardly a big
> > > > contention point.  We could also use RCU freeing of cursors to make
> > > > traversal not need additional locks, if that turns out to be
> > > > necessary.
> > >
> > > Umm...  That can do more than reduction of parallelism - longer lists
> > > take longer to scan and moving cursors dirties cachelines in a bunch
> > > of struct mount instances.  And I'm not convinced that your locking
> > > in m_next() is correct.
> > >
> > > What's to stop umount_tree() from removing the next entry from the
> > > list just as your m_next() tries to move the cursor?  I don't see any
> > > common locks for those two...
> >
> > Ah, you still have namespace_sem taken (shared) by m_start().  Nevermind
> > that one, then...  Let me get through mnt_list users and see if I can
> > catch anything.
>
> OK...  Locking is safe, but it's not obvious.  And your changes do make
> it scarier.  There are several kinds of lists that can be threaded
> through ->mnt_list and your code depends upon never having those suckers
> appear in e.g. anon namespace ->list.  It is true (AFAICS), but...

See analysis below.

> Another fun question is ns->mounts rules - it used to be "the number of
> entries in ns->list", now it's "the number of non-cursor entries there".
> Incidentally, we might have a problem with that logic wrt count_mounts().

Nope, count_mounts() iterates through the mount tree, not through
mnt_ns->list.

> Sigh...  The damn thing has grown much too convoluted over the years ;-/
>
> I'm still not happy with that patch; at the very least it needs a lot
> more detailed analysis to go along with it.
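Fair enough.  To make the scheme concrete before the analysis: each open
instance of /proc/mounts embeds a cursor entry on ns->list, and the
iterator skips other instances' cursors while holding the new
per-namespace spinlock in addition to namespace_sem (held shared by
m_start()).  A minimal sketch of the idea; the MNT_CURSOR flag and the
lock_ns_list()/unlock_ns_list()/mnt_is_cursor() helper names are
illustrative, not necessarily what the final patch will use:

/*
 * Sketch only: assumes struct mnt_namespace has grown a spinlock_t
 * ns_lock that protects ns->list against concurrent cursor movement.
 * Cursors are struct mount entries marked with a MNT_CURSOR flag;
 * every walker of ns->list must skip them.
 */
static inline void lock_ns_list(struct mnt_namespace *ns)
{
	spin_lock(&ns->ns_lock);
}

static inline void unlock_ns_list(struct mnt_namespace *ns)
{
	spin_unlock(&ns->ns_lock);
}

static inline bool mnt_is_cursor(struct mount *mnt)
{
	return mnt->mnt.mnt_flags & MNT_CURSOR;
}

/*
 * Advance past @p to the next real (non-cursor) mount, or NULL at the
 * end of the list.  The caller holds namespace_sem at least shared, so
 * entries cannot be removed or freed under us; ns_lock keeps other
 * readers' cursors from moving while we follow ->next pointers.
 */
static struct mount *mnt_list_next(struct mnt_namespace *ns,
				   struct list_head *p)
{
	struct mount *ret = NULL;

	lock_ns_list(ns);
	for (p = p->next; p != &ns->list; p = p->next) {
		struct mount *mnt = list_entry(p, struct mount, mnt_list);

		if (!mnt_is_cursor(mnt)) {
			ret = mnt;
			break;
		}
	}
	unlock_ns_list(ns);

	return ret;
}

m_next() is then just a call to this helper, and m_stop() re-parks the
cursor (a list_move_tail() under ns_lock) at the last returned position,
so the next read of the same open file resumes from there.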
Functions touching mnt_list:

In pnode.c:

umount_one:
umount_list:
propagate_umount: both of the above are indirectly called from this.
The only caller is umount_tree(), which has lots of different call
paths, but in each one namespace_sem is taken for write:

  do_move_mount
    attach_recursive_mnt
      umount_tree

  do_loopback
    graft_tree
      attach_recursive_mnt
        umount_tree

  do_new_mount_fc
    do_add_mount
      graft_tree
        attach_recursive_mnt
          umount_tree

  finish_automount
    do_add_mount
      graft_tree
        attach_recursive_mnt
          umount_tree

  do_umount
    shrink_submounts
      umount_tree

In namespace.c:

__is_local_mountpoint: takes namespace_sem for read

commit_tree: has namespace_sem for write (only caller being
attach_recursive_mnt, see above for call paths)

m_start:
m_next:
m_show: all have namespace_sem for read

umount_tree: all callers have namespace_sem for write (see above for
call paths)

do_umount: has namespace_sem for write

copy_tree: all members are newly allocated

iterate_mounts: operates on private copy built by collect_mounts()

open_detached_copy: takes namespace_sem for write

copy_mnt_ns: takes namespace_sem for write

mount_subtree: adds onto a newly allocated mnt_namespace

sys_fsmount: ditto

init_mount_tree: ditto

mnt_already_visible: takes namespace_sem for read

The patch adds ns_lock locking to all places that only have
namespace_sem for read.  So everyone is still excluded from everyone
else: those taking namespace_sem for write exclude everyone else
obviously, and those taking namespace_sem for read exclude each other
because they also take ns_lock (see the P.S. below for a sketch of this
read-side pattern).

Thanks,

Miklos
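P.S.  For completeness, here is the read-side pattern the conclusion
above refers to, sketched on __is_local_mountpoint(); it assumes the
same illustrative lock_ns_list()/unlock_ns_list()/mnt_is_cursor()
helpers as the earlier sketch:

/*
 * Walk the per-namespace mount list with namespace_sem held shared;
 * taking ns_lock as well excludes concurrent cursor movement by other
 * /proc/mounts readers.  Cursor entries are not real mounts and are
 * skipped.
 */
bool __is_local_mountpoint(struct dentry *dentry)
{
	struct mnt_namespace *ns = current->nsproxy->mnt_ns;
	struct mount *mnt;
	bool is_covered = false;

	down_read(&namespace_sem);
	lock_ns_list(ns);
	list_for_each_entry(mnt, &ns->list, mnt_list) {
		if (mnt_is_cursor(mnt))
			continue;
		is_covered = (mnt->mnt_mountpoint == dentry);
		if (is_covered)
			break;
	}
	unlock_ns_list(ns);
	up_read(&namespace_sem);

	return is_covered;
}

Writers (umount_tree() and friends) never need to take ns_lock
themselves: they hold namespace_sem for write, which already excludes
every reader listed above.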