On Tue, Jul 18, 2023 at 01:53:10AM +0800, Alan Huang wrote: > > 2023年7月18日 00:02,Paul E. McKenney <paulmck@xxxxxxxxxx> 写道: > > On Sun, Jul 16, 2023 at 07:21:28PM +0800, Alan Huang wrote: > >>> 2023年7月16日 01:19,Paul E. McKenney <paulmck@xxxxxxxxxx> 写道: > >>> On Sat, Jul 15, 2023 at 08:50:23AM +0800, Alan Huang wrote: > >>>>> 2023年7月15日 07:23,Paul E. McKenney <paulmck@xxxxxxxxxx> 写道: > >>>>> On Tue, Jul 11, 2023 at 03:09:06PM +0000, Alan Huang wrote: > >>>>>> The objects are allocated with SLAB_TYPESAFE_BY_RCU, and there is > >>>>>> n->next = first within hlist_add_head_rcu() before rcu_assign_pointer(), > >>>>>> which modifies obj->obj_node.next. There may be readers holding the > >>>>>> reference of obj in lockless_lookup, and when updater modifies ->next, > >>>>>> readers can see the change immediately because ofSLAB_TYPESAFE_BY_RCU. > >>>>>> > >>>>>> There are two memory ordering required in the insertion algorithm, > >>>>>> we need to make sure obj->key is updated before obj->obj_node.next > >>>>>> and obj->refcnt, atomic_set_release is not enough to provide the > >>>>>> required memory barrier. > >>>>>> > >>>>>> Signed-off-by: Alan Huang <mmpgouride@xxxxxxxxx> > >>>>> > >>>>> This is an interesting one!!! > >>>>> > >>>>> Now I am having a hard time believing that the smp_rmb() suffices. > >>>>> > >>>>>> --- > >>>>>> Changelog: > >>>>>> v1 -> v2: Use _ONCE to protect obj->key. > >>>>>> > >>>>>> Documentation/RCU/rculist_nulls.rst | 21 +++++++++++++-------- > >>>>>> 1 file changed, 13 insertions(+), 8 deletions(-) > >>>>>> > >>>>>> diff --git a/Documentation/RCU/rculist_nulls.rst b/Documentation/RCU/rculist_nulls.rst > >>>>>> index 21e40fcc08de..2a9f5a63d334 100644 > >>>>>> --- a/Documentation/RCU/rculist_nulls.rst > >>>>>> +++ b/Documentation/RCU/rculist_nulls.rst > >>>>>> @@ -47,7 +47,7 @@ objects, which is having below type. > >>>>>> * reuse these object before the RCU grace period, we > >>>>>> * must check key after getting the reference on object > >>>>>> */ > >>>>>> - if (obj->key != key) { // not the object we expected > >>>>>> + if (READ_ONCE(obj->key) != key) { // not the object we expected > >>>>>> put_ref(obj); > >>>>>> rcu_read_unlock(); > >>>>>> goto begin; > >>>>>> @@ -64,10 +64,10 @@ but a version with an additional memory barrier (smp_rmb()) > >>>>>> { > >>>>>> struct hlist_node *node, *next; > >>>>>> for (pos = rcu_dereference((head)->first); > >>>>>> - pos && ({ next = pos->next; smp_rmb(); prefetch(next); 1; }) && > >>>>>> + pos && ({ next = READ_ONCE(pos->next); smp_rmb(); prefetch(next); 1; }) && > >>>>> > >>>>> Suppose that lockless_lookup() is delayed just before fetching pos->next, > >>>>> and that there were 17 more node to search in the list. > >>>>> > >>>>> Then consider the following sequence of events: > >>>>> > >>>>> o The updater deletes this same node and kmem_cache_free()s it. > >>>>> > >>>>> o Another updater kmem_cache_alloc()s that same memory and > >>>>> inserts it into an empty hash chain with a different key. > >>>>> > >>>>> o Then lockless_lookup() fetches pos->next and sees a NULL pointer, > >>>>> thus failing to search the remaining 17 nodes in the list, > >>>>> one of which had the desired key value. > >>>>> > >>>>> o The lookup algorithm resumes and sees the NULL return from > >>>>> lockless_lookup(), and ends up with a NULL obj. > >>>>> > >>>>> And this happens even with the strongest possible ordering > >>>>> everywhere. > >>>>> > >>>>> OK, yes, it is late on Friday. So what am I missing here? > >>>> > >>>> You missed nothing! > >>>> > >>>> The lockless_lockup should not be a function, but a macro like hlist_for_each_entry_rcu. > >>> > >>> How would you fix this using a macro? > >> > >> With additional detection code. A moved object (in another chain) will have a different slot. > >> (I have sent patch v3. ) > >> > >>> > >>>>> Independent of that, does hlist_add_head_rcu() need to replace its > >>>>> "n->next = first" with "WRITE_ONCE(n->next, first)"? > >>>> > >>>> I think users who want to use hlist with SLAB_TYPESAFE_BY_RCU should use rculist_nulls? > >>> > >>> I believe that you are correct. Would you like to propose a patch, or > >>> would you rather I put something together? My current thought is to > >> > >> Feel free to add. > >> > >> One thing I think would be useful is to tell readers where the ‘next' is. > >> The document mentions ’next’ many times, but it’s hard for me, as a reader, to realize that > >> the ‘next' is within hlist_add_head_rcu(). (I have no idea where to put the hint.) > >> > >> > >>> keep the examples, but to show why the one with smp_rmb() is broken. > >> > >> I think the example needs to be fixed. :) > > > > Even better! I will take a look, but in the meantime, would you be > > interested in updating the wording to explain how the back-pointer works? > > Which document needs to be updated? > And is there anything that I can refer to? It’s the first time I have ever heard about it. Documentation/RCU/rculist_nulls.rst, the one that you are updating. There admittedly isn't a whole lot of commentary. > > (Looks similar to the is_a_nulls() pointer, but in each element instead of > > just at the end. One advantage is the ability to detect a move mid-list, > > though that is not a big deal in well-tuned hash tables, which tend to > > have short hash chains. The need to move elements to the front of the > > destination list remains, though in both cases only if it has been less > > than a grace period since the last move.) > > Looks like that I need to learn it first. :) Well, you wrote the code, so... ;-) Thanx, Paul > > Thanx, Paul > > > >>>> I didn’t find a case using hlist with SLAB_TYPESAFE_BY_RCU, but I did find a case using list > >>>> with SLAB_TYPESAFE_BY_RCU in drivers/gpu/drm/i915, the driver also doesn’t use _ONCE > >>>> on the fields of the objects allocated with SLAB_TYPESAFE_BY_RCU. > >>> > >>> Feel free to send them a patch, though I cannot speak for their > >>> reception of it. > >>> > >>> Thanx, Paul > >>> > >>>>> Thanx, Paul > >>>>> > >>>>>> ({ obj = hlist_entry(pos, typeof(*obj), obj_node); 1; }); > >>>>>> pos = rcu_dereference(next)) > >>>>>> - if (obj->key == key) > >>>>>> + if (READ_ONCE(obj->key) == key) > >>>>>> return obj; > >>>>>> return NULL; > >>>>>> } > >>>>>> @@ -111,8 +111,13 @@ detect the fact that it missed following items in original chain. > >>>>>> */ > >>>>>> obj = kmem_cache_alloc(...); > >>>>>> lock_chain(); // typically a spin_lock() > >>>>>> - obj->key = key; > >>>>>> - atomic_set_release(&obj->refcnt, 1); // key before refcnt > >>>>>> + WRITE_ONCE(obj->key, key); > >>>>>> + /* > >>>>>> + * We need to make sure obj->key is updated before obj->obj_node.next > >>>>>> + * and obj->refcnt. > >>>>>> + */ > >>>>>> + smp_wmb(); > >>>>>> + atomic_set(&obj->refcnt, 1); > >>>>>> hlist_add_head_rcu(&obj->obj_node, list); > >>>>>> unlock_chain(); // typically a spin_unlock() > >>>>>> > >>>>>> @@ -165,12 +170,12 @@ Note that using hlist_nulls means the type of 'obj_node' field of > >>>>>> begin: > >>>>>> rcu_read_lock(); > >>>>>> hlist_nulls_for_each_entry_rcu(obj, node, head, obj_node) { > >>>>>> - if (obj->key == key) { > >>>>>> + if (READ_ONCE(obj->key) == key) { > >>>>>> if (!try_get_ref(obj)) { // might fail for free objects > >>>>>> rcu_read_unlock(); > >>>>>> goto begin; > >>>>>> } > >>>>>> - if (obj->key != key) { // not the object we expected > >>>>>> + if (READ_ONCE(obj->key) != key) { // not the object we expected > >>>>>> put_ref(obj); > >>>>>> rcu_read_unlock(); > >>>>>> goto begin; > >>>>>> @@ -206,7 +211,7 @@ hlist_add_head_rcu(). > >>>>>> */ > >>>>>> obj = kmem_cache_alloc(cachep); > >>>>>> lock_chain(); // typically a spin_lock() > >>>>>> - obj->key = key; > >>>>>> + WRITE_ONCE(obj->key, key); > >>>>>> atomic_set_release(&obj->refcnt, 1); // key before refcnt > >>>>>> /* > >>>>>> * insert obj in RCU way (readers might be traversing chain) > >>>>>> -- > >>>>>> 2.34.1 > >