On Wed, Mar 04, 2020 at 07:44:51PM +0000, Al Viro wrote:
> On Wed, Mar 04, 2020 at 11:28:16AM -0800, Paul E. McKenney wrote:
>
> > Huh.  The mount structure isn't suffering from a shortage of list_head
> > structures, is it?
> >
> > So the following can happen, then?
> >
> > o   The __attach_mnt() function adds a struct mount to its parent
> >     list, but in a non-RCU manner.  Unless there is some other
> >     safeguard, the list_add_tail() in this function needs to be
> >     list_add_tail_rcu().
> >
> > o   I am assuming that the various non-RCU traversals that I see,
> >     for example, next_mnt(), are protected by lock_mount_hash().
> >     Especially skip_mnt_tree(), which uses mnt_mounts.prev.  (I didn't
> >     find any exceptions, but I don't claim an exhaustive search.)
> >
> > o   The umount_tree() function's use of list_del_init() looks like
> >     it could trap an RCU reader in the newly singular list formed
> >     by the removal.  It appears that there are other functions that
> >     use list_del_init() on this list, though I cannot claim any sort
> >     of familiarity with this code.
> >
> >     So, do you need to add a check for child->mnt_child being in this
> >     self-referential state within fsinfo_generic_mount_children()?
> >
> >     Plus list_del_init() doesn't mark its stores, though some would
> >     argue that unmarked stores are OK in this situation.
> >
> > o   There might be other operations in need of RCU-ification.
> >
> >     Maybe the list_add_tail() in umount_tree(), but it is not
> >     immediately clear that this is adding a new element instead of
> >     re-inserting an element already exposed to readers.
>
> IMO all of that is a good argument *against* trying to pull any kind of RCU
> games here.  Access to these lists is assumed to be serialized on the
> mount_lock spinlock component held exclusive, and unless there is a very
> good reason to go for something trickier, let's not.
>
> Hash chains are supposed to be walked under rcu_read_lock(), requiring
> a recheck of the mount_lock seqcount *AND* use of legitimize_mnt()
> if you are to try and get a reference out of that.  ->mnt_parent chains
> can also be walked under the same conditions (subject to the same
> requirements).  The policy with everything else is
>     * get mount_lock spinlock exclusive or
>     * get namespace_sem or
>     * just fucking don't do it.

I have no idea what David's performance requirements are.  But yes, of
course, if locking suffices, use locking, be happy, and get on with your
life.  It looks like David would otherwise have been gathering information
while holding some sort of lock (as you say above), then releasing that
lock before actually using the gathered information.

Therefore, should it turn out that locking does not suffice for David's
use case, and if the issues I identified above disqualify RCU (which would
be unsurprising, especially given that the above is unlikely to be an
exhaustive list), then another option is to maintain a separate data
structure that keeps the needed information.  When the mounts change, the
next lookup would grab the needed locks and regenerate this separate data
structure.  Otherwise, lookups would just use the separate data structure.

But again, if just locking and traversing the main data structure
suffices, agreed, just do that, be happy, and get on with life.

							Thanx, Paul
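
For concreteness, here is a rough sketch of the reader-side policy Al
describes above: walk the hash chain under rcu_read_lock(), sample the
seqcount side of mount_lock, and use legitimize_mnt() before trusting
any reference obtained from the walk.  This is only an approximation of
what lookup_mnt() in fs/namespace.c does (details vary by kernel
version); the function name lookup_child_mount() is made up, and the
helpers used (__lookup_mnt(), legitimize_mnt(), mount_lock) are
fs-internal, so a sketch like this only makes sense inside fs/:

struct vfsmount *lookup_child_mount(const struct path *path)
{
	struct mount *child;
	struct vfsmount *m;
	unsigned seq;

	rcu_read_lock();
	do {
		/* Sample the seqcount side of mount_lock before the walk. */
		seq = read_seqbegin(&mount_lock);

		/* Lockless walk of the mount hash chain. */
		child = __lookup_mnt(path->mnt, path->dentry);
		m = child ? &child->mnt : NULL;

		/*
		 * legitimize_mnt() takes a reference and rechecks the
		 * seqcount; if the mount tree changed underneath us,
		 * it fails and we retry the whole walk.
		 */
	} while (!legitimize_mnt(m, seq));
	rcu_read_unlock();
	return m;
}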
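
And a minimal sketch of the separate-data-structure approach suggested
in the reply: readers use an RCU-published snapshot, and the first
lookup after the mount tree changes rebuilds the snapshot under the
real locks and republishes it.  Every identifier below (mount_snapshot,
mount_snap_rec, rebuild_mount_snapshot(), current_mount_generation(),
lookup_mount_snap()) is hypothetical, and the "generation" that detects
mount-tree changes is assumed to exist in some form (it could be an
event counter bumped under mount_lock); allocation-failure handling is
omitted for brevity:

struct mount_snap_rec {
	int mnt_id;
	int parent_id;
	/* ... whatever else the fsinfo query needs ... */
};

struct mount_snapshot {
	u64 generation;		/* mount-tree generation this was built from */
	unsigned int nr;
	struct rcu_head rcu;
	struct mount_snap_rec recs[];
};

static struct mount_snapshot __rcu *mount_snapshot;
static DEFINE_MUTEX(mount_snapshot_lock);

/* Built while holding namespace_sem / mount_lock, per the policy above. */
static struct mount_snapshot *rebuild_mount_snapshot(u64 gen);
static u64 current_mount_generation(void);

/* Copy one record out of the snapshot; false if mnt_id is not present. */
static bool lookup_mount_snap(int mnt_id, struct mount_snap_rec *out)
{
	struct mount_snapshot *snap, *old;
	unsigned int i;
	bool found = false;
	u64 gen;

retry:
	gen = current_mount_generation();

	rcu_read_lock();
	snap = rcu_dereference(mount_snapshot);
	if (!snap || snap->generation != gen) {
		rcu_read_unlock();

		/* Stale or missing: rebuild under the lock and publish. */
		mutex_lock(&mount_snapshot_lock);
		old = rcu_dereference_protected(mount_snapshot,
				lockdep_is_held(&mount_snapshot_lock));
		if (!old || old->generation != gen) {
			snap = rebuild_mount_snapshot(gen);
			rcu_assign_pointer(mount_snapshot, snap);
			if (old)
				kfree_rcu(old, rcu);
		}
		mutex_unlock(&mount_snapshot_lock);
		goto retry;
	}

	for (i = 0; i < snap->nr; i++) {
		if (snap->recs[i].mnt_id == mnt_id) {
			*out = snap->recs[i];	/* copy out under RCU */
			found = true;
			break;
		}
	}
	rcu_read_unlock();
	return found;
}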