On Wed, Mar 04, 2020 at 11:28:16AM -0800, Paul E. McKenney wrote: > Huh. The mount structure isn't suffering from a shortage of list_head > structures, is it? > > So the following can happen, then? > > o The __attach_mnt() function adds a struct mount to its parent > list, but in a non-RCU manner. Unless there is some other > safeguard, the list_add_tail() in this function needs to be > list_add_tail_rcu(). > > o I am assuming that the various non-RCU traversals that I see, > for example, next_mnt(), are protected by lock_mount_hash(). > Especially skip_mnt_tree(), which uses mnt_mounts.prev. (I didn't > find any exceptions, but I don't claim an exhaustive search.) > > o The umount_tree() function's use of list_del_init() looks like > it could trap an RCU reader in the newly singular list formed > by the removal. It appears that there are other functions that > use list_del_init() on this list, though I cannot claim any sort > of familiarity with this code. > > So, do you need to add a check for child->mnt_child being in this > self-referential state within fsinfo_generic_mount_children()? > > Plus list_del_init() doesn't mark its stores, though > some would argue that unmarked stores are OK in this situation. > > o There might be other operations in need of RCU-ification. > > Maybe the list_add_tail() in umount_tree(), but it is not > immediately clear that this is adding a new element instead of > re-inserting an element already exposed to readers. IMO all of that is a good argument *against* trying to pull any kind of RCU games here. Access to these lists is assumed to be serialized on mount_lock spinlock component held exclusive and unless there is a very good reason to go for something trickier, let's not. Hash chains are supposed to be walked under rcu_read_lock(), requiring to recheck the mount_lock seqcount *AND* with use of legitimize_mnt() if you are to try and get a reference out of that. ->mnt_parent chains also can be walked in the same conditions (subject to the same requirements). The policy with everything else is * get mount_lock spinlock exclusive or * get namespace_sem or * just fucking don't do it.