Re: [RFC] Bridging the gap between the Linux Kernel Memory Consistency Model (LKMM) and C11/C++11 atomics

Peter Zijlstra <peterz@xxxxxxxxxxxxx> · Tue, 4 Jul 2023 11:46:27 +0200

On Mon, Jul 03, 2023 at 03:20:31PM -0400, Olivier Dion wrote:

>   int x = 0;
>   int y = 0;
>   int r0, r1;
> 
>   int dummy;
> 
>   void t0(void)
>   {
>           __atomic_store_n(&x, 1, __ATOMIC_RELAXED);
> 
>           __atomic_exchange_n(&dummy, 1, __ATOMIC_SEQ_CST);
>           __atomic_thread_fence(__ATOMIC_SEQ_CST);
> 
>           r0 = __atomic_load_n(&y, __ATOMIC_RELAXED);
>   }
> 
>   void t1(void)
>   {
>           __atomic_store_n(&y, 1, __ATOMIC_RELAXED);
>           __atomic_thread_fence(__ATOMIC_SEQ_CST);
>           r1 = __atomic_load_n(&x, __ATOMIC_RELAXED);
>   }
> 
>   // BUG_ON(r0 == 0 && r1 == 0)
> 
> On x86-64 (gcc 13.1 -O2) we get:
> 
>   t0():
>           movl    $1, x(%rip)
>           movl    $1, %eax
>           xchgl   dummy(%rip), %eax
>           lock orq $0, (%rsp)       ;; Redundant with previous exchange.
>           movl    y(%rip), %eax
>           movl    %eax, r0(%rip)
>           ret
>   t1():
>           movl    $1, y(%rip)
>           lock orq $0, (%rsp)
>           movl    x(%rip), %eax
>           movl    %eax, r1(%rip)
>           ret

So I would expect the compilers to do better here. They should know that
those __atomic_thread_fence() thingies are superfluous right after a
LOCK-prefixed RMW and simply not emit them. This could even be done as a
later peephole pass that spots consecutive atomic ops where the second
one is a no-op.

> On x86-64 (clang 16 -O2) we get:
> 
>   t0():
>           movl    $1, x(%rip)
>           movl    $1, %eax
>           xchgl   %eax, dummy(%rip)
>           mfence                    ;; Redundant with previous exchange.

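And that's just terrible :/ Nobody should be using MFENCE for this. And
using MFENCE after a LOCK-prefixed instruction (the LOCK is implicit for
XCHG with a memory operand) is just fail: what MFENCE adds over a
LOCK-prefixed instruction is ordering for things like MMIO and
non-temporal stores, and I don't think C++ atomics cover MMIO and other
such 'lovely' things.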

>           movl    y(%rip), %eax
>           movl    %eax, r0(%rip)
>           retq
>   t1():
>           movl    $1, y(%rip)
>           mfence
>           movl    x(%rip), %eax
>           movl    %eax, r1(%rip)
>           retq