Re: [PATCH v2] docs/RCU: Bring smp_wmb() back

Matthew Wilcox <willy@xxxxxxxxxxxxx> · Sat, 17 Jun 2023 18:23:03 +0100

On Sat, Jun 17, 2023 at 02:53:46PM +0000, Alan Huang wrote:
> There are two memory ordering required in the insertion algorithm,
> we need to make sure obj->key is updated before obj->obj_node.next
> and obj->refcnt, atomic_set_release is not enough to provide the
> required memory barrier.
> 
> Signed-off-by: Alan Huang <mmpgouride@xxxxxxxxx>
> ---
>  Documentation/RCU/rculist_nulls.rst | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/RCU/rculist_nulls.rst b/Documentation/RCU/rculist_nulls.rst
> index e06ed40bb6..77244adbdf 100644
> --- a/Documentation/RCU/rculist_nulls.rst
> +++ b/Documentation/RCU/rculist_nulls.rst
> @@ -98,7 +98,7 @@ Quoting Corey Minyard::
>  ----------------------
>  
>  We need to make sure a reader cannot read the new 'obj->obj_node.next' value
> -and previous value of 'obj->key'. Otherwise, an item could be deleted
> +and previous value of 'obj->key' at the same time. Otherwise, an item could be deleted

"at the same time" doesn't make a lot of sense to me.  CPUs don't do
anything "at the same time".  I think the way this is worded now is
fine; I tried coming up with a few variants of this, but none are as
clear and succinct as what is there now.

>  from a chain, and inserted into another chain. If new chain was empty
>  before the move, 'next' pointer is NULL, and lockless reader can not
>  detect the fact that it missed following items in original chain.
> @@ -112,7 +112,12 @@ detect the fact that it missed following items in original chain.
>    obj = kmem_cache_alloc(...);
>    lock_chain(); // typically a spin_lock()
>    obj->key = key;
> -  atomic_set_release(&obj->refcnt, 1); // key before refcnt
> +  /*
> +  * we need to make sure obj->key is updated before obj->obj_node.next
> +  * and obj->refcnt
> +  */
> +  smp_wmb();
> +  atomic_set(&obj->refcnt, 1);

Perhaps this could be a little clearer for those of us who aren't as
deep into the memory model as others ... are you saying that the
atomic_set_release() would only order references to obj->refcount and
would not order accesses to obj->key?  Because reading this:

     The use of ACQUIRE and RELEASE operations generally precludes the need
     for other sorts of memory barrier.  In addition, a RELEASE+ACQUIRE pair is
     -not- guaranteed to act as a full memory barrier.  However, after an
     ACQUIRE on a given variable, all memory accesses preceding any prior
     RELEASE on that same variable are guaranteed to be visible.  In other
     words, within a given variable's critical section, all accesses of all
     previous critical sections for that variable are guaranteed to have
     completed.

makes me think that this example is fine; anybody doing a load-acquire
on obj->refcount will see the update to obj->key that happened before
the store-release to obj->refcount.

I am not an expert and would welcome the opportunity to learn more here.