On Tue, Dec 07, 2021 at 02:28:22PM +0100, Peter Zijlstra wrote:

> For refcount_inc(), as extracted from alloc_perf_context(), I get:
>
>     4b68:	b8 01 00 00 00       	mov    $0x1,%eax
>     4b6d:	f0 0f c1 43 28       	lock xadd %eax,0x28(%rbx)
>     4b72:	85 c0                	test   %eax,%eax
>     4b74:	74 1b                	je     4b91 <alloc_perf_context+0xf1>
>     4b76:	8d 50 01             	lea    0x1(%rax),%edx
>     4b79:	09 c2                	or     %eax,%edx
>     4b7b:	78 20                	js     4b9d <alloc_perf_context+0xfd>
>
> the best I can seem to find is: https://godbolt.org/z/ne5o6eEEW

Argh.. __atomic_add_fetch() != __atomic_fetch_add(); much confusion for
GCC having both. With the right primitive it becomes:

	movl	$1, %eax
	lock xaddl	%eax, (%rdi)
	testl	%eax, %eax
	je	.L5
	js	.L6

Which makes a whole lot more sense.
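
For anyone else tripping over the naming, here is a minimal sketch of the difference. The helper names (inc_fetch_old, inc_fetch_new, sketch_inc_checked) and the return codes are made up for illustration, and the relaxed ordering is an assumption; only the two __atomic builtins are real GCC/Clang API. This is not the code behind the godbolt link, just the kind of check where the old value is what you want to test:

```c
#include <assert.h>

/*
 * __atomic_fetch_add() returns the value *before* the add, which is
 * what lock xadd leaves in the source register on x86.
 */
static inline int inc_fetch_old(int *r)
{
	return __atomic_fetch_add(r, 1, __ATOMIC_RELAXED);
}

/*
 * __atomic_add_fetch() returns the value *after* the add, so the
 * result is old + 1 and any test on "old" needs extra work.
 */
static inline int inc_fetch_new(int *r)
{
	return __atomic_add_fetch(r, 1, __ATOMIC_RELAXED);
}

/*
 * Hypothetical checked increment: both conditions look only at the
 * old value, so a single test of that value can feed both branches.
 * Return codes are illustrative: -1 for old == 0, -2 for old < 0.
 */
static inline int sketch_inc_checked(int *r)
{
	int old = inc_fetch_old(r);

	if (old == 0)
		return -1;	/* incrementing from zero */
	if (old < 0)
		return -2;	/* already negative */
	return 0;
}

/* Tiny self-check of the old-vs-new return value semantics. */
static inline void sketch_selftest(void)
{
	int r = 1;

	assert(inc_fetch_old(&r) == 1);	/* returns old value, r is now 2 */
	assert(inc_fetch_new(&r) == 3);	/* returns new value, r is now 3 */
}
```

Swapping inc_fetch_old() for inc_fetch_new() in sketch_inc_checked() would change which value the comparisons see, which is the sort of mixup that sends you staring at the wrong asm.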