----- On Sep 29, 2021, at 10:54 AM, Linus Torvalds torvalds@xxxxxxxxxxxxxxxxxxxx wrote: > On Wed, Sep 29, 2021 at 7:47 AM Linus Torvalds > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: >> >> And if there *is* need ("look, we have that same store in both the if- >> and the else-statement" or whatever), then say so, and state that >> thing. > > Side note: I'd also like the commit that introduces this to talk very > explicitly about the particular case that it is used doe and that it > fixes. No "this can happen". A "this happened, here's the _actual_ > wrong code generation, and look how this new ctrl_dep() macro fixes > it". > > When it's this subtle, I don't want theoretical arguments. I want > actual outstanding and real bugs. > > Because I get the feeling that there were very few actual realistic > examples of this, only made-up theoretical cases that wouldn't ever > really be found in real code. There is one particular scenario which concerns me in refcount_dec_and_test(). It relies on smp_acquire__after_ctrl_dep() to promote the control dependency to an acquire barrier on success. Because it is exposed within a static inline function, it hides the fact that control dependencies are being used under the hood. I have not identified a specific instance of oddly generated code, but this is an accident waiting to happen. If we take this simplified refcount code into godbolt.org and compile it for RISC-V rv64gc clang 12.0.1: #define RISCV_FENCE(p, s) \ __asm__ __volatile__ ("fence " #p "," #s : : : "memory") #define __smp_rmb() RISCV_FENCE(r,r) volatile int var1; volatile int refcount; static inline bool refcount_dec_and_test(void) { refcount--; if (refcount == 0) { __smp_rmb(); /* acquire after ctrl_dep */ return true; } return false; } void fct(void) { int x; if (refcount_dec_and_test()) { var1 = 0; return; } __smp_rmb(); var1 = 1; } We end up with: fct(): # @fct() lui a0, %hi(refcount) lw a1, %lo(refcount)(a0) addiw a1, a1, -1 sw a1, %lo(refcount)(a0) lw a0, %lo(refcount)(a0) fence r, r snez a0, a0 lui a1, %hi(var1) sw a0, %lo(var1)(a1) ret Which lacks the conditional branch, and where the "fence r,r" instruction does not properly order following stores after the refcount load. Adding ctrl_dep() around the refcount == 0 check fixes this: fct(): # @fct() lui a0, %hi(refcount) lw a1, %lo(refcount)(a0) addiw a1, a1, -1 sw a1, %lo(refcount)(a0) lw a0, %lo(refcount)(a0) beqz a0, .LBB0_2 fence r, r addi a0, zero, 1 j .LBB0_3 .LBB0_2: fence r, r mv a0, zero .LBB0_3: lui a1, %hi(var1) sw a0, %lo(var1)(a1) ret I admit that this is still a "made up" example, although it is similar to the actual implementation of refcount_dec_and_check(). But if we need to audit every user of this API for wrongly generated code, we may be looking for a needle in a haystack. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com