Re: [PATCH v2] docs/RCU: Bring back smp_wmb()

> On Jul 16, 2023, at 01:19, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> 
> On Sat, Jul 15, 2023 at 08:50:23AM +0800, Alan Huang wrote:
>> 
>>> On Jul 15, 2023, at 07:23, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>>> 
>>> On Tue, Jul 11, 2023 at 03:09:06PM +0000, Alan Huang wrote:
>>>> The objects are allocated with SLAB_TYPESAFE_BY_RCU, and there is
>>>> n->next = first within hlist_add_head_rcu() before rcu_assign_pointer(),
>>>> which modifies obj->obj_node.next. There may be readers holding a
>>>> reference to obj in lockless_lookup, and when the updater modifies ->next,
>>>> readers can see the change immediately because of SLAB_TYPESAFE_BY_RCU.
>>>> 
>>>> There are two memory orderings required in the insertion algorithm:
>>>> we need to make sure obj->key is updated before obj->obj_node.next
>>>> and obj->refcnt, and atomic_set_release is not enough to provide the
>>>> required memory barrier.
>>>> 
>>>> Signed-off-by: Alan Huang <mmpgouride@xxxxxxxxx>
>>> 
>>> This is an interesting one!!!
>>> 
>>> Now I am having a hard time believing that the smp_rmb() suffices.
>>> 
>>>> ---
>>>> Changelog:
>>>> v1 -> v2: Use _ONCE to protect obj->key.
>>>> 
>>>> Documentation/RCU/rculist_nulls.rst | 21 +++++++++++++--------
>>>> 1 file changed, 13 insertions(+), 8 deletions(-)
>>>> 
>>>> diff --git a/Documentation/RCU/rculist_nulls.rst b/Documentation/RCU/rculist_nulls.rst
>>>> index 21e40fcc08de..2a9f5a63d334 100644
>>>> --- a/Documentation/RCU/rculist_nulls.rst
>>>> +++ b/Documentation/RCU/rculist_nulls.rst
>>>> @@ -47,7 +47,7 @@ objects, which is having below type.
>>>>    * reuse these object before the RCU grace period, we
>>>>    * must check key after getting the reference on object
>>>>    */
>>>> -    if (obj->key != key) { // not the object we expected
>>>> +    if (READ_ONCE(obj->key) != key) { // not the object we expected
>>>>      put_ref(obj);
>>>>      rcu_read_unlock();
>>>>      goto begin;
>>>> @@ -64,10 +64,10 @@ but a version with an additional memory barrier (smp_rmb())
>>>>  {
>>>>    struct hlist_node *node, *next;
>>>>    for (pos = rcu_dereference((head)->first);
>>>> -         pos && ({ next = pos->next; smp_rmb(); prefetch(next); 1; }) &&
>>>> +         pos && ({ next = READ_ONCE(pos->next); smp_rmb(); prefetch(next); 1; }) &&
>>> 
>>> Suppose that lockless_lookup() is delayed just before fetching pos->next,
>>> and that there were 17 more nodes to search in the list.
>>> 
>>> Then consider the following sequence of events:
>>> 
>>> o The updater deletes this same node and kmem_cache_free()s it.
>>> 
>>> o Another updater kmem_cache_alloc()s that same memory and
>>> inserts it into an empty hash chain with a different key.
>>> 
>>> o Then lockless_lookup() fetches pos->next and sees a NULL pointer,
>>> thus failing to search the remaining 17 nodes in the list,
>>> one of which had the desired key value.
>>> 
>>> o The lookup algorithm resumes and sees the NULL return from
>>> lockless_lookup(), and ends up with a NULL obj.
>>> 
>>> And this happens even with the strongest possible ordering
>>> everywhere.
>>> 
>>> OK, yes, it is late on Friday.  So what am I missing here?
>> 
>> You missed nothing!
>> 
>> The lockless_lookup should not be a function, but a macro like hlist_for_each_entry_rcu.
> 
> How would you fix this using a macro?

With additional detection code: an object that has moved to another chain will be in a different slot, which the lookup can detect and retry on.
(I have sent patch v3.)

> 
>>> Independent of that, does hlist_add_head_rcu() need to replace its
>>> "n->next = first" with "WRITE_ONCE(n->next, first)"?
>> 
>> I think users who want to use hlist with SLAB_TYPESAFE_BY_RCU should use rculist_nulls?
> 
> I believe that you are correct.  Would you like to propose a patch, or
> would you rather I put something together?  My current thought is to

Feel free to add. 

One thing I think would be useful is to tell readers where the 'next' is.
The document mentions 'next' many times, but it's hard for me, as a reader, to realize that
the 'next' is within hlist_add_head_rcu(). (I have no idea where to put the hint.)


> keep the examples, but to show why the one with smp_rmb() is broken.

I think the example needs to be fixed. :)

> 
>> I didn’t find a case using hlist with SLAB_TYPESAFE_BY_RCU, but I did find a case using list
>> with SLAB_TYPESAFE_BY_RCU in drivers/gpu/drm/i915, the driver also doesn’t use _ONCE
>> on the fields of the objects allocated with SLAB_TYPESAFE_BY_RCU.
> 
> Feel free to send them a patch, though I cannot speak for their
> reception of it.
> 
> Thanx, Paul
> 
>>> Thanx, Paul
>>> 
>>>>         ({ obj = hlist_entry(pos, typeof(*obj), obj_node); 1; });
>>>>         pos = rcu_dereference(next))
>>>> -      if (obj->key == key)
>>>> +      if (READ_ONCE(obj->key) == key)
>>>>        return obj;
>>>>    return NULL;
>>>>  }
>>>> @@ -111,8 +111,13 @@ detect the fact that it missed following items in original chain.
>>>>   */
>>>>  obj = kmem_cache_alloc(...);
>>>>  lock_chain(); // typically a spin_lock()
>>>> -  obj->key = key;
>>>> -  atomic_set_release(&obj->refcnt, 1); // key before refcnt
>>>> +  WRITE_ONCE(obj->key, key);
>>>> +  /*
>>>> +   * We need to make sure obj->key is updated before obj->obj_node.next
>>>> +   * and obj->refcnt.
>>>> +   */
>>>> +  smp_wmb();
>>>> +  atomic_set(&obj->refcnt, 1);
>>>>  hlist_add_head_rcu(&obj->obj_node, list);
>>>>  unlock_chain(); // typically a spin_unlock()
>>>> 
>>>> @@ -165,12 +170,12 @@ Note that using hlist_nulls means the type of 'obj_node' field of
>>>>  begin:
>>>>  rcu_read_lock();
>>>>  hlist_nulls_for_each_entry_rcu(obj, node, head, obj_node) {
>>>> -    if (obj->key == key) {
>>>> +    if (READ_ONCE(obj->key) == key) {
>>>>      if (!try_get_ref(obj)) { // might fail for free objects
>>>>        rcu_read_unlock();
>>>>        goto begin;
>>>>      }
>>>> -      if (obj->key != key) { // not the object we expected
>>>> +      if (READ_ONCE(obj->key) != key) { // not the object we expected
>>>>        put_ref(obj);
>>>>        rcu_read_unlock();
>>>>        goto begin;
>>>> @@ -206,7 +211,7 @@ hlist_add_head_rcu().
>>>>   */
>>>>  obj = kmem_cache_alloc(cachep);
>>>>  lock_chain(); // typically a spin_lock()
>>>> -  obj->key = key;
>>>> +  WRITE_ONCE(obj->key, key);
>>>>  atomic_set_release(&obj->refcnt, 1); // key before refcnt
>>>>  /*
>>>>   * insert obj in RCU way (readers might be traversing chain)
>>>> -- 
>>>> 2.34.1





