Re: Do we need to correct barriering in circular-buffers.rst?

Peter Zijlstra <peterz@xxxxxxxxxxxxx> · Mon, 23 Sep 2019 16:49:31 +0200

On Thu, Sep 19, 2019 at 02:59:06PM +0100, David Howells wrote:

> But I don't agree with this.  You're missing half the barriers.  There should
> be *four* barriers.  The document mandates only 3 barriers, and uses
> READ_ONCE() where the fourth should be, i.e.:
> 
>    thread #1            thread #2
> 
>                         smp_load_acquire(head)
>                         ... read data from queue ..
>                         smp_store_release(tail)
> 
>    READ_ONCE(tail)
>    ... add data to queue ..
>    smp_store_release(head)
> 

Notably your READ_ONCE() pseudo code is lacking a conditional;
kernel/events/ring_buffer.c writes it like so:

 *   kernel                             user
 *
 *   if (LOAD ->data_tail) {            LOAD ->data_head
 *                      (A)             smp_rmb()       (C)
 *      STORE $data                     LOAD $data
 *      smp_wmb()       (B)             smp_mb()        (D)
 *      STORE ->data_head               STORE ->data_tail
 *   }
 *
 * Where A pairs with D, and B pairs with C.
 *
 * In our case (A) is a control dependency that separates the load of
 * the ->data_tail and the stores of $data. In case ->data_tail
 * indicates there is no room in the buffer to store $data we do not.
 *
 * D needs to be a full barrier since it separates the data READ
 * from the tail WRITE.
 *
 * For B a WMB is sufficient since it separates two WRITEs, and for C
 * an RMB is sufficient since it separates two READs.

Where 'kernel' is the producer and 'user' is the consumer. This was
written before load-acquire and store-release came about (I _think_),
and I've so far resisted updating B to store-release because smp_wmb()
is actually cheaper than store-release on a number of architectures
(notably ARM).

C ought to be a load-aquire, and D really should be a store-release, but
I don't think the perf userspace has that (or uses C11).