On Mon, Jul 03, 2023 at 03:20:31PM -0400, Olivier Dion wrote: > int x = 0; > int y = 0; > int r0, r1; > > int dummy; > > void t0(void) > { > __atomic_store_n(&x, 1, __ATOMIC_RELAXED); > > __atomic_exchange_n(&dummy, 1, __ATOMIC_SEQ_CST); > __atomic_thread_fence(__ATOMIC_SEQ_CST); > > r0 = __atomic_load_n(&y, __ATOMIC_RELAXED); > } > > void t1(void) > { > __atomic_store_n(&y, 1, __ATOMIC_RELAXED); > __atomic_thread_fence(__ATOMIC_SEQ_CST); > r1 = __atomic_load_n(&x, __ATOMIC_RELAXED); > } > > // BUG_ON(r0 == 0 && r1 == 0) > > On x86-64 (gcc 13.1 -O2) we get: > > t0(): > movl $1, x(%rip) > movl $1, %eax > xchgl dummy(%rip), %eax > lock orq $0, (%rsp) ;; Redundant with previous exchange. > movl y(%rip), %eax > movl %eax, r0(%rip) > ret > t1(): > movl $1, y(%rip) > lock orq $0, (%rsp) > movl x(%rip), %eax > movl %eax, r1(%rip) > ret So I would expect the compilers to do better here. It should know those __atomic_thread_fence() thingies are superfluous and simply not emit them. This could even be done as a peephole pass later, where it sees consecutive atomic ops and the second being a no-op. > On x86-64 (clang 16 -O2) we get: > > t0(): > movl $1, x(%rip) > movl $1, %eax > xchgl %eax, dummy(%rip) > mfence ;; Redundant with previous exchange. And that's just terrible :/ Nobody should be using MFENCE for this. And using MFENCE after a LOCK prefixes instruction (implicit in this case) is just fail, because I don't think C++ atomics cover MMIO and other such 'lovely' things. > movl y(%rip), %eax > movl %eax, r0(%rip) > retq > t1(): > movl $1, y(%rip) > mfence > movl x(%rip), %eax > movl %eax, r1(%rip) > retq