Re: [PATCH v2] docs/RCU: Bring back smp_wmb()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> 2023年7月25日 00:09,Paul E. McKenney <paulmck@xxxxxxxxxx> 写道:
> 
> On Sat, Jul 22, 2023 at 09:02:46AM +0800, Alan Huang wrote:
>> 
>>> 2023年7月22日 上午6:49,Paul E. McKenney <paulmck@xxxxxxxxxx> 写道:
>>> 
>>> On Wed, Jul 19, 2023 at 12:08:33AM +0800, Alan Huang wrote:
>>>> 
>>>>> 2023年7月18日 03:06,Paul E. McKenney <paulmck@xxxxxxxxxx> 写道:
>>>>> 
>>>>> On Tue, Jul 18, 2023 at 01:53:10AM +0800, Alan Huang wrote:
>>>>>>> 2023年7月18日 00:02,Paul E. McKenney <paulmck@xxxxxxxxxx> 写道:
>>>>>>> On Sun, Jul 16, 2023 at 07:21:28PM +0800, Alan Huang wrote:
>>>>>>>>> 2023年7月16日 01:19,Paul E. McKenney <paulmck@xxxxxxxxxx> 写道:
>>>>>>>>> On Sat, Jul 15, 2023 at 08:50:23AM +0800, Alan Huang wrote:
>>>>>>>>>>> 2023年7月15日 07:23,Paul E. McKenney <paulmck@xxxxxxxxxx> 写道:
>>>>>>>>>>> On Tue, Jul 11, 2023 at 03:09:06PM +0000, Alan Huang wrote:
>>>>>>>>>>>> The objects are allocated with SLAB_TYPESAFE_BY_RCU, and there is
>>>>>>>>>>>> n->next = first within hlist_add_head_rcu() before rcu_assign_pointer(),
>>>>>>>>>>>> which modifies obj->obj_node.next. There may be readers holding the
>>>>>>>>>>>> reference of obj in lockless_lookup, and when updater modifies ->next,
>>>>>>>>>>>> readers can see the change immediately because ofSLAB_TYPESAFE_BY_RCU.
>>>>>>>>>>>> 
>>>>>>>>>>>> There are two memory ordering required in the insertion algorithm,
>>>>>>>>>>>> we need to make sure obj->key is updated before obj->obj_node.next
>>>>>>>>>>>> and obj->refcnt, atomic_set_release is not enough to provide the
>>>>>>>>>>>> required memory barrier.
>>>>>>>>>>>> 
>>>>>>>>>>>> Signed-off-by: Alan Huang <mmpgouride@xxxxxxxxx>
>>>>>>>>>>> 
>>>>>>>>>>> This is an interesting one!!!
>>>>>>>>>>> 
>>>>>>>>>>> Now I am having a hard time believing that the smp_rmb() suffices.
>>>>>>>>>>> 
>>>>>>>>>>>> ---
>>>>>>>>>>>> Changelog:
>>>>>>>>>>>> v1 -> v2: Use _ONCE to protect obj->key.
>>>>>>>>>>>> 
>>>>>>>>>>>> Documentation/RCU/rculist_nulls.rst | 21 +++++++++++++--------
>>>>>>>>>>>> 1 file changed, 13 insertions(+), 8 deletions(-)
>>>>>>>>>>>> 
>>>>>>>>>>>> diff --git a/Documentation/RCU/rculist_nulls.rst b/Documentation/RCU/rculist_nulls.rst
>>>>>>>>>>>> index 21e40fcc08de..2a9f5a63d334 100644
>>>>>>>>>>>> --- a/Documentation/RCU/rculist_nulls.rst
>>>>>>>>>>>> +++ b/Documentation/RCU/rculist_nulls.rst
>>>>>>>>>>>> @@ -47,7 +47,7 @@ objects, which is having below type.
>>>>>>>>>>>> * reuse these object before the RCU grace period, we
>>>>>>>>>>>> * must check key after getting the reference on object
>>>>>>>>>>>> */
>>>>>>>>>>>> -    if (obj->key != key) { // not the object we expected
>>>>>>>>>>>> +    if (READ_ONCE(obj->key) != key) { // not the object we expected
>>>>>>>>>>>>  put_ref(obj);
>>>>>>>>>>>>  rcu_read_unlock();
>>>>>>>>>>>>  goto begin;
>>>>>>>>>>>> @@ -64,10 +64,10 @@ but a version with an additional memory barrier (smp_rmb())
>>>>>>>>>>>> {
>>>>>>>>>>>> struct hlist_node *node, *next;
>>>>>>>>>>>> for (pos = rcu_dereference((head)->first);
>>>>>>>>>>>> -         pos && ({ next = pos->next; smp_rmb(); prefetch(next); 1; }) &&
>>>>>>>>>>>> +         pos && ({ next = READ_ONCE(pos->next); smp_rmb(); prefetch(next); 1; }) &&
>>>>>>>>>>> 
>>>>>>>>>>> Suppose that lockless_lookup() is delayed just before fetching pos->next,
>>>>>>>>>>> and that there were 17 more node to search in the list.
>>>>>>>>>>> 
>>>>>>>>>>> Then consider the following sequence of events:
>>>>>>>>>>> 
>>>>>>>>>>> o The updater deletes this same node and kmem_cache_free()s it.
>>>>>>>>>>> 
>>>>>>>>>>> o Another updater kmem_cache_alloc()s that same memory and
>>>>>>>>>>> inserts it into an empty hash chain with a different key.
>>>>>>>>>>> 
>>>>>>>>>>> o Then lockless_lookup() fetches pos->next and sees a NULL pointer,
>>>>>>>>>>> thus failing to search the remaining 17 nodes in the list,
>>>>>>>>>>> one of which had the desired key value.
>>>>>>>>>>> 
>>>>>>>>>>> o The lookup algorithm resumes and sees the NULL return from
>>>>>>>>>>> lockless_lookup(), and ends up with a NULL obj.
>>>>>>>>>>> 
>>>>>>>>>>> And this happens even with the strongest possible ordering
>>>>>>>>>>> everywhere.
>>>>>>>>>>> 
>>>>>>>>>>> OK, yes, it is late on Friday.  So what am I missing here?
>>>>>>>>>> 
>>>>>>>>>> You missed nothing!
>>>>>>>>>> 
>>>>>>>>>> The lockless_lockup should not be a function, but a macro like hlist_for_each_entry_rcu.
>>>>>>>>> 
>>>>>>>>> How would you fix this using a macro?
>>>>>>>> 
>>>>>>>> With additional detection code. A moved object (in another chain) will have a different slot.
>>>>>>>> (I have sent patch v3. )
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>>> Independent of that, does hlist_add_head_rcu() need to replace its
>>>>>>>>>>> "n->next = first" with "WRITE_ONCE(n->next, first)"?
>>>>>>>>>> 
>>>>>>>>>> I think users who want to use hlist with SLAB_TYPESAFE_BY_RCU should use rculist_nulls?
>>>>>>>>> 
>>>>>>>>> I believe that you are correct.  Would you like to propose a patch, or
>>>>>>>>> would you rather I put something together?  My current thought is to
>>>>>>>> 
>>>>>>>> Feel free to add. 
>>>>>>>> 
>>>>>>>> One thing I think would be useful is to tell readers where the ‘next' is. 
>>>>>>>> The document mentions ’next’ many times, but it’s hard for me, as a reader, to realize that
>>>>>>>> the ‘next' is within hlist_add_head_rcu(). (I have no idea where to put the hint.)
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> keep the examples, but to show why the one with smp_rmb() is broken.
>>>>>>>> 
>>>>>>>> I think the example needs to be fixed. :)
>>>>>>> 
>>>>>>> Even better!  I will take a look, but in the meantime, would you be
>>>>>>> interested in updating the wording to explain how the back-pointer works?
>>>>>> 
>>>>>> Which document needs to be updated? 
>>>>>> And is there anything that I can refer to? It’s the first time I have ever heard about it.
>>>>> 
>>>>> Documentation/RCU/rculist_nulls.rst, the one that you are updating.
>>>>> 
>>>>> There admittedly isn't a whole lot of commentary.
>>>>> 
>>>>>>> (Looks similar to the is_a_nulls() pointer, but in each element instead of
>>>>>>> just at the end.  One advantage is the ability to detect a move mid-list,
>>>>>>> though that is not a big deal in well-tuned hash tables, which tend to
>>>>>>> have short hash chains.  The need to move elements to the front of the
>>>>>>> destination list remains, though in both cases only if it has been less
>>>>>>> than a grace period since the last move.)
>>>>>> 
>>>>>> Looks like that I need to learn it first. :)
>>>>> 
>>>>> Well, you wrote the code, so...  ;-)
>>>> 
>>>> If I understand correctly, it works only for 64-bit machines?
>>>> 
>>>> And the number of slots of the hash table will be limited?
>>> 
>>> You are asking about the is_a_nulls() value?  If so, it works on both
>>> 32-bit and 64-bit machines.  They each have enough bits for the nulls
>>> value to cover all possible two-byte objects in the full address space.
>>> 
>>> If that wasn't what you were asking, please help me with your question.
>> 
>> I’m asking the ‘back-pointer’, which
>> 
>> "Looks similar to the is_a_nulls() pointer, but in each element instead of
>>  just at the end”
>> 
>> If I understand correctly, we need to store the nulls in the upper unused bits.
>> 
>> And we only have several bits unused. One advantage I can think of is that it 
>> will improve the performance once we have something like the Intel Upper Address Ignore (UAI),
>> which also works only for 64-bit machine.
> 
> Ah, I was using the word "similar" very loosely.  Not "implemented in
> a manner similar to is_a_nulls(), but rather "serves roughly the same
> function as is_a_nulls()".

I got it.

> 
> You should be able to just have the pointer to the bucket in each
> element, and just compare for each element.  No need for extra bits,

It might have the same functionality if we put the nulls value in each element.

> though UAI, like its pre-existing ARM counterpart, might also help catch
> use-after-free bugs when resizing the hash table.
> 
> Thanx, Paul
> 
>>>>>>>>>> I didn’t find a case using hlist with SLAB_TYPESAFE_BY_RCU, but I did find a case using list
>>>>>>>>>> with SLAB_TYPESAFE_BY_RCU in drivers/gpu/drm/i915, the driver also doesn’t use _ONCE
>>>>>>>>>> on the fields of the objects allocated with SLAB_TYPESAFE_BY_RCU.
>>>>>>>>> 
>>>>>>>>> Feel free to send them a patch, though I cannot speak for their
>>>>>>>>> reception of it.
>>>>>>>>> 
>>>>>>>>> Thanx, Paul
>>>>>>>>> 
>>>>>>>>>>> Thanx, Paul
>>>>>>>>>>> 
>>>>>>>>>>>>     ({ obj = hlist_entry(pos, typeof(*obj), obj_node); 1; });
>>>>>>>>>>>>     pos = rcu_dereference(next))
>>>>>>>>>>>> -      if (obj->key == key)
>>>>>>>>>>>> +      if (READ_ONCE(obj->key) == key)
>>>>>>>>>>>>    return obj;
>>>>>>>>>>>> return NULL;
>>>>>>>>>>>> }
>>>>>>>>>>>> @@ -111,8 +111,13 @@ detect the fact that it missed following items in original chain.
>>>>>>>>>>>> */
>>>>>>>>>>>> obj = kmem_cache_alloc(...);
>>>>>>>>>>>> lock_chain(); // typically a spin_lock()
>>>>>>>>>>>> -  obj->key = key;
>>>>>>>>>>>> -  atomic_set_release(&obj->refcnt, 1); // key before refcnt
>>>>>>>>>>>> +  WRITE_ONCE(obj->key, key);
>>>>>>>>>>>> +  /*
>>>>>>>>>>>> +   * We need to make sure obj->key is updated before obj->obj_node.next
>>>>>>>>>>>> +   * and obj->refcnt.
>>>>>>>>>>>> +   */
>>>>>>>>>>>> +  smp_wmb();
>>>>>>>>>>>> +  atomic_set(&obj->refcnt, 1);
>>>>>>>>>>>> hlist_add_head_rcu(&obj->obj_node, list);
>>>>>>>>>>>> unlock_chain(); // typically a spin_unlock()
>>>>>>>>>>>> 
>>>>>>>>>>>> @@ -165,12 +170,12 @@ Note that using hlist_nulls means the type of 'obj_node' field of
>>>>>>>>>>>> begin:
>>>>>>>>>>>> rcu_read_lock();
>>>>>>>>>>>> hlist_nulls_for_each_entry_rcu(obj, node, head, obj_node) {
>>>>>>>>>>>> -    if (obj->key == key) {
>>>>>>>>>>>> +    if (READ_ONCE(obj->key) == key) {
>>>>>>>>>>>>  if (!try_get_ref(obj)) { // might fail for free objects
>>>>>>>>>>>> rcu_read_unlock();
>>>>>>>>>>>>    goto begin;
>>>>>>>>>>>>  }
>>>>>>>>>>>> -      if (obj->key != key) { // not the object we expected
>>>>>>>>>>>> +      if (READ_ONCE(obj->key) != key) { // not the object we expected
>>>>>>>>>>>>    put_ref(obj);
>>>>>>>>>>>> rcu_read_unlock();
>>>>>>>>>>>>    goto begin;
>>>>>>>>>>>> @@ -206,7 +211,7 @@ hlist_add_head_rcu().
>>>>>>>>>>>> */
>>>>>>>>>>>> obj = kmem_cache_alloc(cachep);
>>>>>>>>>>>> lock_chain(); // typically a spin_lock()
>>>>>>>>>>>> -  obj->key = key;
>>>>>>>>>>>> +  WRITE_ONCE(obj->key, key);
>>>>>>>>>>>> atomic_set_release(&obj->refcnt, 1); // key before refcnt
>>>>>>>>>>>> /*
>>>>>>>>>>>> * insert obj in RCU way (readers might be traversing chain)
>>>>>>>>>>>> -- 
>>>>>>>>>>>> 2.34.1






[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux