On Tue, Dec 07, 2021 at 03:23:02PM -0800, Linus Torvalds wrote: > On Tue, Dec 7, 2021 at 12:28 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote: > > > > Argh.. __atomic_add_fetch() != __atomic_fetch_add(); much confusion for > > GCC having both. With the right primitive it becomes: > > > > movl $1, %eax > > lock xaddl %eax, (%rdi) > > testl %eax, %eax > > je .L5 > > js .L6 > > > > Which makes a whole lot more sense. > > Note that the above misses the case where the old value was MAX_INT > and the result now became negative. > > That isn't a _problem_, of course. I think it's fine. But if you cared > about it, you'd have to do something like Hm.... > But if you don't care about the MAX_INT overflow and make the overflow > boundary be the next increment, then just make it be one error case: > > > movl $1, %eax > > lock xaddl %eax, (%rdi) > > testl %eax, %eax > > jle .L5 > > and then (if you absolutely have to distinguish them) you can test eax > again in the slow path. Suppose: inc(): overflow when old value is negative or zero dec(): overflow when new value is negative or zero That gives: inc(INT_MAX) is allowed dec(INT_MIN) is allowed IOW, the effective range becomes: [1..INT_MIN], which is a bit counter-intuitive, but then so is most of this stuff. Therefore can write this like: #define atomic_inc_ofl(v, label) do { int old = atomic_fetch_inc(v); if (unlikely(old <= 0)) goto label; } while (0) #define atomic_dec_ofl(v, label) do { int new = atomic_dec_return(v); if (unlikely(new <= 0)) goto label; } while (0) #define atomic_dec_and_test_ofl(v, label) ({ bool ret = false; int new = atomic_dec_return(&r->refs); if (unlikely(new < 0)) goto label; if (unlikely(new == 0) ret = true; ret; }) For a consistent set of primitives, right? Which already gives better code-gen than we have today. But that then also means we can write dec_ofl as: lock decl %[var] jle %l1 and dec_and_test_ofl() like: lock decl %[var] jl %l2 je %l[__zero] Lemme finisht the patches and send that out after dinner.