On Tue, Dec 7, 2021 at 12:28 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> Argh.. __atomic_add_fetch() != __atomic_fetch_add(); much confusion for
> GCC having both. With the right primitive it becomes:
>
>         movl    $1, %eax
>         lock xaddl      %eax, (%rdi)
>         testl   %eax, %eax
>         je      .L5
>         js      .L6
>
> Which makes a whole lot more sense.

Note that the above misses the case where the old value was MAX_INT
and the result now became negative.

That isn't a _problem_, of course. I think it's fine. But if you cared
about it, you'd have to do something like

>         movl    $1, %eax
>         lock xaddl      %eax, (%rdi)
>         jl      .L6
>         testl   %eax, %eax
>         je      .L5

instead (I might have gotten that "jl" wrong, needs more testing).

But if you don't care about the MAX_INT overflow and make the overflow
boundary be the next increment, then just make it be one error case:

>         movl    $1, %eax
>         lock xaddl      %eax, (%rdi)
>         testl   %eax, %eax
>         jle     .L5

and then (if you absolutely have to distinguish them) you can test eax
again in the slow path.

                Linus
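
For reference, the two checking strategies discussed above can be sketched
in C roughly as follows. This is only an illustration, not the code from
the patch under discussion: the names ref_status, ref_inc_precise() and
ref_inc_simple() are made up for this example, INT_MAX stands in for the
"MAX_INT" mentioned above, and it assumes the GCC/Clang __atomic builtins,
with signed wraparound on the atomic add behaving the way "lock xadd" does.
Both helpers classify the OLD value returned by __atomic_fetch_add(), which
is what ends up in %eax after the xadd.

```c
#include <limits.h>

/* Hypothetical result codes, for illustration only. */
enum ref_status { REF_OK, REF_WAS_ZERO, REF_BAD };

/*
 * Precise variant: also catches the increment that takes the counter
 * from INT_MAX to a negative value.
 */
static enum ref_status ref_inc_precise(int *ref)
{
	/* fetch_add returns the value *before* the add. */
	int old = __atomic_fetch_add(ref, 1, __ATOMIC_RELAXED);

	if (old == 0)
		return REF_WAS_ZERO;
	if (old < 0 || old == INT_MAX)
		return REF_BAD;
	return REF_OK;
}

/*
 * Single-error-case variant: one "old <= 0" test on the fast path (the
 * "jle" above). The INT_MAX increment itself goes unnoticed; the next
 * increment sees a negative old value and reports it.
 */
static enum ref_status ref_inc_simple(int *ref)
{
	int old = __atomic_fetch_add(ref, 1, __ATOMIC_RELAXED);

	if (old <= 0)	/* slow path: test old again to tell the cases apart */
		return old == 0 ? REF_WAS_ZERO : REF_BAD;
	return REF_OK;
}
```

Note that __atomic_add_fetch() would return the NEW value instead, so the
same comparisons applied to its result would test something different,
which is the confusion pointed out at the top of the quoted mail.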