> 2023年7月16日 01:19,Paul E. McKenney <paulmck@xxxxxxxxxx> 写道: > > On Sat, Jul 15, 2023 at 08:50:23AM +0800, Alan Huang wrote: >> >>> 2023年7月15日 07:23,Paul E. McKenney <paulmck@xxxxxxxxxx> 写道: >>> >>> On Tue, Jul 11, 2023 at 03:09:06PM +0000, Alan Huang wrote: >>>> The objects are allocated with SLAB_TYPESAFE_BY_RCU, and there is >>>> n->next = first within hlist_add_head_rcu() before rcu_assign_pointer(), >>>> which modifies obj->obj_node.next. There may be readers holding the >>>> reference of obj in lockless_lookup, and when updater modifies ->next, >>>> readers can see the change immediately because ofSLAB_TYPESAFE_BY_RCU. >>>> >>>> There are two memory ordering required in the insertion algorithm, >>>> we need to make sure obj->key is updated before obj->obj_node.next >>>> and obj->refcnt, atomic_set_release is not enough to provide the >>>> required memory barrier. >>>> >>>> Signed-off-by: Alan Huang <mmpgouride@xxxxxxxxx> >>> >>> This is an interesting one!!! >>> >>> Now I am having a hard time believing that the smp_rmb() suffices. >>> >>>> --- >>>> Changelog: >>>> v1 -> v2: Use _ONCE to protect obj->key. >>>> >>>> Documentation/RCU/rculist_nulls.rst | 21 +++++++++++++-------- >>>> 1 file changed, 13 insertions(+), 8 deletions(-) >>>> >>>> diff --git a/Documentation/RCU/rculist_nulls.rst b/Documentation/RCU/rculist_nulls.rst >>>> index 21e40fcc08de..2a9f5a63d334 100644 >>>> --- a/Documentation/RCU/rculist_nulls.rst >>>> +++ b/Documentation/RCU/rculist_nulls.rst >>>> @@ -47,7 +47,7 @@ objects, which is having below type. >>>> * reuse these object before the RCU grace period, we >>>> * must check key after getting the reference on object >>>> */ >>>> - if (obj->key != key) { // not the object we expected >>>> + if (READ_ONCE(obj->key) != key) { // not the object we expected >>>> put_ref(obj); >>>> rcu_read_unlock(); >>>> goto begin; >>>> @@ -64,10 +64,10 @@ but a version with an additional memory barrier (smp_rmb()) >>>> { >>>> struct hlist_node *node, *next; >>>> for (pos = rcu_dereference((head)->first); >>>> - pos && ({ next = pos->next; smp_rmb(); prefetch(next); 1; }) && >>>> + pos && ({ next = READ_ONCE(pos->next); smp_rmb(); prefetch(next); 1; }) && >>> >>> Suppose that lockless_lookup() is delayed just before fetching pos->next, >>> and that there were 17 more node to search in the list. >>> >>> Then consider the following sequence of events: >>> >>> o The updater deletes this same node and kmem_cache_free()s it. >>> >>> o Another updater kmem_cache_alloc()s that same memory and >>> inserts it into an empty hash chain with a different key. >>> >>> o Then lockless_lookup() fetches pos->next and sees a NULL pointer, >>> thus failing to search the remaining 17 nodes in the list, >>> one of which had the desired key value. >>> >>> o The lookup algorithm resumes and sees the NULL return from >>> lockless_lookup(), and ends up with a NULL obj. >>> >>> And this happens even with the strongest possible ordering >>> everywhere. >>> >>> OK, yes, it is late on Friday. So what am I missing here? >> >> You missed nothing! >> >> The lockless_lockup should not be a function, but a macro like hlist_for_each_entry_rcu. > > How would you fix this using a macro? With additional detection code. A moved object (in another chain) will have a different slot. (I have sent patch v3. ) > >>> Independent of that, does hlist_add_head_rcu() need to replace its >>> "n->next = first" with "WRITE_ONCE(n->next, first)"? >> >> I think users who want to use hlist with SLAB_TYPESAFE_BY_RCU should use rculist_nulls? > > I believe that you are correct. Would you like to propose a patch, or > would you rather I put something together? My current thought is to Feel free to add. One thing I think would be useful is to tell readers where the ‘next' is. The document mentions ’next’ many times, but it’s hard for me, as a reader, to realize that the ‘next' is within hlist_add_head_rcu(). (I have no idea where to put the hint.) > keep the examples, but to show why the one with smp_rmb() is broken. I think the example needs to be fixed. :) > >> I didn’t find a case using hlist with SLAB_TYPESAFE_BY_RCU, but I did find a case using list >> with SLAB_TYPESAFE_BY_RCU in drivers/gpu/drm/i915, the driver also doesn’t use _ONCE >> on the fields of the objects allocated with SLAB_TYPESAFE_BY_RCU. > > Feel free to send them a patch, though I cannot speak for their > reception of it. > > Thanx, Paul > >>> Thanx, Paul >>> >>>> ({ obj = hlist_entry(pos, typeof(*obj), obj_node); 1; }); >>>> pos = rcu_dereference(next)) >>>> - if (obj->key == key) >>>> + if (READ_ONCE(obj->key) == key) >>>> return obj; >>>> return NULL; >>>> } >>>> @@ -111,8 +111,13 @@ detect the fact that it missed following items in original chain. >>>> */ >>>> obj = kmem_cache_alloc(...); >>>> lock_chain(); // typically a spin_lock() >>>> - obj->key = key; >>>> - atomic_set_release(&obj->refcnt, 1); // key before refcnt >>>> + WRITE_ONCE(obj->key, key); >>>> + /* >>>> + * We need to make sure obj->key is updated before obj->obj_node.next >>>> + * and obj->refcnt. >>>> + */ >>>> + smp_wmb(); >>>> + atomic_set(&obj->refcnt, 1); >>>> hlist_add_head_rcu(&obj->obj_node, list); >>>> unlock_chain(); // typically a spin_unlock() >>>> >>>> @@ -165,12 +170,12 @@ Note that using hlist_nulls means the type of 'obj_node' field of >>>> begin: >>>> rcu_read_lock(); >>>> hlist_nulls_for_each_entry_rcu(obj, node, head, obj_node) { >>>> - if (obj->key == key) { >>>> + if (READ_ONCE(obj->key) == key) { >>>> if (!try_get_ref(obj)) { // might fail for free objects >>>> rcu_read_unlock(); >>>> goto begin; >>>> } >>>> - if (obj->key != key) { // not the object we expected >>>> + if (READ_ONCE(obj->key) != key) { // not the object we expected >>>> put_ref(obj); >>>> rcu_read_unlock(); >>>> goto begin; >>>> @@ -206,7 +211,7 @@ hlist_add_head_rcu(). >>>> */ >>>> obj = kmem_cache_alloc(cachep); >>>> lock_chain(); // typically a spin_lock() >>>> - obj->key = key; >>>> + WRITE_ONCE(obj->key, key); >>>> atomic_set_release(&obj->refcnt, 1); // key before refcnt >>>> /* >>>> * insert obj in RCU way (readers might be traversing chain) >>>> -- >>>> 2.34.1