> On Jul 15, 2023, at 07:23, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> On Tue, Jul 11, 2023 at 03:09:06PM +0000, Alan Huang wrote:
>> The objects are allocated with SLAB_TYPESAFE_BY_RCU, and there is
>> n->next = first within hlist_add_head_rcu() before rcu_assign_pointer(),
>> which modifies obj->obj_node.next. There may be readers holding a
>> reference to obj in lockless_lookup, and when the updater modifies
>> ->next, readers can see the change immediately because of
>> SLAB_TYPESAFE_BY_RCU.
>>
>> Two memory orderings are required in the insertion algorithm:
>> we need to make sure obj->key is updated before obj->obj_node.next
>> and obj->refcnt, and atomic_set_release() is not enough to provide
>> the required memory barrier.
>>
>> Signed-off-by: Alan Huang <mmpgouride@xxxxxxxxx>
>
> This is an interesting one!!!
>
> Now I am having a hard time believing that the smp_rmb() suffices.
>
>> ---
>> Changelog:
>>   v1 -> v2: Use _ONCE to protect obj->key.
>>
>>  Documentation/RCU/rculist_nulls.rst | 21 +++++++++++++--------
>>  1 file changed, 13 insertions(+), 8 deletions(-)
>>
>> diff --git a/Documentation/RCU/rculist_nulls.rst b/Documentation/RCU/rculist_nulls.rst
>> index 21e40fcc08de..2a9f5a63d334 100644
>> --- a/Documentation/RCU/rculist_nulls.rst
>> +++ b/Documentation/RCU/rculist_nulls.rst
>> @@ -47,7 +47,7 @@ objects, which is having below type.
>>       * reuse these object before the RCU grace period, we
>>       * must check key after getting the reference on object
>>       */
>> -    if (obj->key != key) { // not the object we expected
>> +    if (READ_ONCE(obj->key) != key) { // not the object we expected
>>        put_ref(obj);
>>        rcu_read_unlock();
>>        goto begin;
>> @@ -64,10 +64,10 @@ but a version with an additional memory barrier (smp_rmb())
>>    {
>>      struct hlist_node *node, *next;
>>      for (pos = rcu_dereference((head)->first);
>> -         pos && ({ next = pos->next; smp_rmb(); prefetch(next); 1; }) &&
>> +         pos && ({ next = READ_ONCE(pos->next); smp_rmb(); prefetch(next); 1; }) &&
>
> Suppose that lockless_lookup() is delayed just before fetching pos->next,
> and that there were 17 more nodes to search in the list.
>
> Then consider the following sequence of events:
>
> o	The updater deletes this same node and kmem_cache_free()s it.
>
> o	Another updater kmem_cache_alloc()s that same memory and
> 	inserts it into an empty hash chain with a different key.
>
> o	Then lockless_lookup() fetches pos->next and sees a NULL pointer,
> 	thus failing to search the remaining 17 nodes in the list,
> 	one of which had the desired key value.
>
> o	The lookup algorithm resumes and sees the NULL return from
> 	lockless_lookup(), and ends up with a NULL obj.
>
> And this happens even with the strongest possible ordering
> everywhere.
>
> OK, yes, it is late on Friday.  So what am I missing here?

You missed nothing! lockless_lookup() should not be a function, but a
macro like hlist_for_each_entry_rcu(), so that the key check and the
retry stay at the call site (a sketch follows below).

> Independent of that, does hlist_add_head_rcu() need to replace its
> "n->next = first" with "WRITE_ONCE(n->next, first)"?

I think users who want to use hlist with SLAB_TYPESAFE_BY_RCU should use
rculist_nulls? I didn't find a case using hlist with SLAB_TYPESAFE_BY_RCU,
but I did find a case using list with SLAB_TYPESAFE_BY_RCU in
drivers/gpu/drm/i915; that driver also doesn't use _ONCE on the fields of
the objects allocated with SLAB_TYPESAFE_BY_RCU.
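For the macro form mentioned above, roughly something like this (just a
sketch to show the shape, not a tested patch; lockless_lookup_each() is
a made-up name, and the loop conditions simply mirror the document's
function version):

	#define lockless_lookup_each(obj, pos, next, head, member)	\
		for (pos = rcu_dereference((head)->first);		\
		     pos &&						\
		     ({ next = READ_ONCE(pos->next); smp_rmb();		\
			prefetch(next); 1; }) &&			\
		     ({ obj = hlist_entry(pos, typeof(*(obj)),		\
					  member); 1; });		\
		     pos = rcu_dereference(next))

The caller then does the key comparison, the try_get_ref(), and the
goto-begin retry inside the loop body, as in the hlist_nulls example,
instead of trusting the single obj returned by a helper function.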
>
> 							Thanx, Paul
>
>>           ({ obj = hlist_entry(pos, typeof(*obj), obj_node); 1; });
>>          pos = rcu_dereference(next))
>> -      if (obj->key == key)
>> +      if (READ_ONCE(obj->key) == key)
>>          return obj;
>>      return NULL;
>>    }
>> @@ -111,8 +111,13 @@ detect the fact that it missed following items in original chain.
>>     */
>>    obj = kmem_cache_alloc(...);
>>    lock_chain(); // typically a spin_lock()
>> -  obj->key = key;
>> -  atomic_set_release(&obj->refcnt, 1); // key before refcnt
>> +  WRITE_ONCE(obj->key, key);
>> +  /*
>> +   * We need to make sure obj->key is updated before obj->obj_node.next
>> +   * and obj->refcnt.
>> +   */
>> +  smp_wmb();
>> +  atomic_set(&obj->refcnt, 1);
>>    hlist_add_head_rcu(&obj->obj_node, list);
>>    unlock_chain(); // typically a spin_unlock()
>>
>> @@ -165,12 +170,12 @@ Note that using hlist_nulls means the type of 'obj_node' field of
>>    begin:
>>    rcu_read_lock();
>>    hlist_nulls_for_each_entry_rcu(obj, node, head, obj_node) {
>> -    if (obj->key == key) {
>> +    if (READ_ONCE(obj->key) == key) {
>>        if (!try_get_ref(obj)) { // might fail for free objects
>>          rcu_read_unlock();
>>          goto begin;
>>        }
>> -      if (obj->key != key) { // not the object we expected
>> +      if (READ_ONCE(obj->key) != key) { // not the object we expected
>>          put_ref(obj);
>>          rcu_read_unlock();
>>          goto begin;
>> @@ -206,7 +211,7 @@ hlist_add_head_rcu().
>>     */
>>    obj = kmem_cache_alloc(cachep);
>>    lock_chain(); // typically a spin_lock()
>> -  obj->key = key;
>> +  WRITE_ONCE(obj->key, key);
>>    atomic_set_release(&obj->refcnt, 1); // key before refcnt
>>    /*
>>     * insert obj in RCU way (readers might be traversing chain)
>> --
>> 2.34.1
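Coming back to the "n->next = first" question above, for reference this
is hlist_add_head_rcu() as it looks in include/linux/rculist.h (quoted
from memory, so please double-check against the tree):

	static inline void hlist_add_head_rcu(struct hlist_node *n,
					      struct hlist_head *h)
	{
		struct hlist_node *first = h->first;

		n->next = first;	/* the plain store discussed above */
		n->pprev = &h->first;
		rcu_assign_pointer(hlist_first_rcu(h), n);
		if (first)
			first->pprev = &n->next;
	}

With SLAB_TYPESAFE_BY_RCU, a reader can already hold a reference to n
when the plain "n->next = first" store executes, so the
rcu_assign_pointer() release on the list head does not order that store
for such a reader; that is the window the commit message describes.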