On Sat, Jul 15, 2023 at 08:50:23AM +0800, Alan Huang wrote:
> 
> > On Jul 15, 2023, at 07:23, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > 
> > On Tue, Jul 11, 2023 at 03:09:06PM +0000, Alan Huang wrote:
> >> The objects are allocated with SLAB_TYPESAFE_BY_RCU, and there is
> >> n->next = first within hlist_add_head_rcu() before rcu_assign_pointer(),
> >> which modifies obj->obj_node.next.  There may be readers holding the
> >> reference of obj in lockless_lookup, and when the updater modifies ->next,
> >> readers can see the change immediately because of SLAB_TYPESAFE_BY_RCU.
> >> 
> >> There are two memory orderings required in the insertion algorithm:
> >> we need to make sure obj->key is updated before obj->obj_node.next
> >> and obj->refcnt; atomic_set_release is not enough to provide the
> >> required memory barrier.
> >> 
> >> Signed-off-by: Alan Huang <mmpgouride@xxxxxxxxx>
> > 
> > This is an interesting one!!!
> > 
> > Now I am having a hard time believing that the smp_rmb() suffices.
> > 
> >> ---
> >> Changelog:
> >>   v1 -> v2: Use _ONCE to protect obj->key.
> >> 
> >>  Documentation/RCU/rculist_nulls.rst | 21 +++++++++++++--------
> >>  1 file changed, 13 insertions(+), 8 deletions(-)
> >> 
> >> diff --git a/Documentation/RCU/rculist_nulls.rst b/Documentation/RCU/rculist_nulls.rst
> >> index 21e40fcc08de..2a9f5a63d334 100644
> >> --- a/Documentation/RCU/rculist_nulls.rst
> >> +++ b/Documentation/RCU/rculist_nulls.rst
> >> @@ -47,7 +47,7 @@ objects, which is having below type.
> >>       * reuse these object before the RCU grace period, we
> >>       * must check key after getting the reference on object
> >>       */
> >> -    if (obj->key != key) { // not the object we expected
> >> +    if (READ_ONCE(obj->key) != key) { // not the object we expected
> >>        put_ref(obj);
> >>        rcu_read_unlock();
> >>        goto begin;
> >> @@ -64,10 +64,10 @@ but a version with an additional memory barrier (smp_rmb())
> >>    {
> >>      struct hlist_node *node, *next;
> >>      for (pos = rcu_dereference((head)->first);
> >> -         pos && ({ next = pos->next; smp_rmb(); prefetch(next); 1; }) &&
> >> +         pos && ({ next = READ_ONCE(pos->next); smp_rmb(); prefetch(next); 1; }) &&
> > 
> > Suppose that lockless_lookup() is delayed just before fetching pos->next,
> > and that there were 17 more nodes to search in the list.
> > 
> > Then consider the following sequence of events:
> > 
> > o	The updater deletes this same node and kmem_cache_free()s it.
> > 
> > o	Another updater kmem_cache_alloc()s that same memory and
> > 	inserts it into an empty hash chain with a different key.
> > 
> > o	Then lockless_lookup() fetches pos->next and sees a NULL pointer,
> > 	thus failing to search the remaining 17 nodes in the list,
> > 	one of which had the desired key value.
> > 
> > o	The lookup algorithm resumes and sees the NULL return from
> > 	lockless_lookup(), and ends up with a NULL obj.
> > 
> > And this happens even with the strongest possible ordering
> > everywhere.
> > 
> > OK, yes, it is late on Friday.  So what am I missing here?
> 
> You missed nothing!
> 
> The lockless_lookup should not be a function, but a macro like hlist_for_each_entry_rcu.

How would you fix this using a macro?

> > Independent of that, does hlist_add_head_rcu() need to replace its
> > "n->next = first" with "WRITE_ONCE(n->next, first)"?
> 
> I think users who want to use hlist with SLAB_TYPESAFE_BY_RCU should use rculist_nulls?

I believe that you are correct.  Would you like to propose a patch, or
would you rather I put something together?
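For reference, here is a sketch of the nulls-based lookup from later in that
same document, with this patch's READ_ONCE() folded in (assuming, as the
document does, that 'slot' is the bucket this lookup hashed key into); the
final get_nulls_value() check is what catches an object that was freed and
reinserted into a different chain:

  begin:
    rcu_read_lock();
    hlist_nulls_for_each_entry_rcu(obj, node, head, obj_node) {
      if (READ_ONCE(obj->key) == key) {
        if (!try_get_ref(obj)) { // might fail for free objects
          rcu_read_unlock();
          goto begin;
        }
        if (READ_ONCE(obj->key) != key) { // not the object we expected
          put_ref(obj);
          rcu_read_unlock();
          goto begin;
        }
        goto out;
      }
    }
    /*
     * The nulls value terminating each chain encodes the chain's slot
     * number.  If it is not the slot this lookup started from, the
     * reader was dragged onto another chain by a concurrent
     * free-and-reuse, so restart rather than report a false miss.
     */
    if (get_nulls_value(node) != slot)
      goto begin;
    obj = NULL;
  out:
    rcu_read_unlock();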
My current thought is to keep the examples, but to show why the one
with smp_rmb() is broken.

> I didn’t find a case using hlist with SLAB_TYPESAFE_BY_RCU, but I did find a case using list
> with SLAB_TYPESAFE_BY_RCU in drivers/gpu/drm/i915, the driver also doesn’t use _ONCE
> on the fields of the objects allocated with SLAB_TYPESAFE_BY_RCU.

Feel free to send them a patch, though I cannot speak for their
reception of it.

							Thanx, Paul

> > 							Thanx, Paul
> > 
> >>           ({ obj = hlist_entry(pos, typeof(*obj), obj_node); 1; });
> >>           pos = rcu_dereference(next))
> >>       if (obj->key == key)
> >>         return obj;
> >>     return NULL;
> >>   }
> >> @@ -111,8 +111,13 @@ detect the fact that it missed following items in original chain.
> >>    */
> >>   obj = kmem_cache_alloc(...);
> >>   lock_chain(); // typically a spin_lock()
> >> - obj->key = key;
> >> - atomic_set_release(&obj->refcnt, 1); // key before refcnt
> >> + WRITE_ONCE(obj->key, key);
> >> + /*
> >> +  * We need to make sure obj->key is updated before obj->obj_node.next
> >> +  * and obj->refcnt.
> >> +  */
> >> + smp_wmb();
> >> + atomic_set(&obj->refcnt, 1);
> >>   hlist_add_head_rcu(&obj->obj_node, list);
> >>   unlock_chain(); // typically a spin_unlock()
> >> 
> >> @@ -165,12 +170,12 @@ Note that using hlist_nulls means the type of 'obj_node' field of
> >>   begin:
> >>   rcu_read_lock();
> >>   hlist_nulls_for_each_entry_rcu(obj, node, head, obj_node) {
> >> -   if (obj->key == key) {
> >> +   if (READ_ONCE(obj->key) == key) {
> >>       if (!try_get_ref(obj)) { // might fail for free objects
> >>         rcu_read_unlock();
> >>         goto begin;
> >>       }
> >> -     if (obj->key != key) { // not the object we expected
> >> +     if (READ_ONCE(obj->key) != key) { // not the object we expected
> >>         put_ref(obj);
> >>         rcu_read_unlock();
> >>         goto begin;
> >> @@ -206,7 +211,7 @@ hlist_add_head_rcu().
> >>    */
> >>   obj = kmem_cache_alloc(cachep);
> >>   lock_chain(); // typically a spin_lock()
> >> - obj->key = key;
> >> + WRITE_ONCE(obj->key, key);
> >>   atomic_set_release(&obj->refcnt, 1); // key before refcnt
> >>   /*
> >>    * insert obj in RCU way (readers might be traversing chain)
> >> -- 
> >> 2.34.1
> > 
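To make the breakage above concrete, the interleaving can be sketched as
follows (illustrative only; obj2 and other_head are made-up names for the
reusing allocation and the previously empty chain it lands on):

  /* Reader, inside lockless_lookup(), traversing the original chain: */
  pos = rcu_dereference((head)->first);   /* pos now points at obj */

  /*
   * The reader is delayed here.  Meanwhile one updater removes obj from
   * the chain and kmem_cache_free()s it, and a second updater's
   * kmem_cache_alloc() hands back the same memory as obj2, which gets a
   * different key and is hlist_add_head_rcu()ed into the empty chain
   * other_head.  Since obj2 is alone on that chain,
   * obj2->obj_node.next == NULL -- and pos still points at that memory.
   */

  next = READ_ONCE(pos->next);            /* loads NULL */

  /*
   * The key comparison in this iteration sees obj2's new key and fails
   * to match; then pos = rcu_dereference(next) yields NULL and the loop
   * ends, never visiting the rest of the original chain, one node of
   * which holds the key being searched for.  No smp_rmb()/smp_wmb()
   * pairing changes this outcome; it is the false chain termination
   * that the hlist_nulls scheme is designed to detect.
   */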