On Sun, Jul 30, 2023 at 10:33:08PM -0400, guoren@xxxxxxxxxx wrote: > From: Guo Ren <guoren@xxxxxxxxxxxxxxxxx> > > The arch_spin_value_unlocked would cause an unnecessary memory > access to the contended value. Although it won't cause a significant > performance gap in most architectures, the arch_spin_value_unlocked > argument contains enough information. Thus, remove unnecessary > atomic_read in arch_spin_value_unlocked(). > > The caller of arch_spin_value_unlocked() could benefit from this > change. Currently, the only caller is lockref. > Have you verified you are getting an extra memory access from this in lockref? What architecture is it? I have no opinion about the patch itself, I will note though that the argument to the routine is *not* the actual memory-shared lockref, instead it's something from the local copy obtained with READ_ONCE from the real thing. So I would be surprised if the stock routine was generating accesses to that sucker. Nonetheless, if the patched routine adds nasty asm, that would be nice to sort out. FWIW on x86-64 qspinlock is used (i.e. not the stuff you are patching) and I verified there are only 2 memory accesses -- the initial READ_ONCE and later cmpxchg. I don't know which archs *don't* use qspinlock. It also turns out generated asm is quite atrocious and cleaning it up may yield a small win under more traffic. Maybe I'll see about it later this week. For example, disassembling lockref_put_return: <+0>: mov (%rdi),%rax <-- initial load, expected <+3>: mov $0x64,%r8d <+9>: mov %rax,%rdx <+12>: test %eax,%eax <-- retries loop back here <-- this is also the unlocked check <+14>: jne 0xffffffff8157aba3 <lockref_put_return+67> <+16>: mov %rdx,%rsi <+19>: mov %edx,%edx <+21>: sar $0x20,%rsi <+25>: lea -0x1(%rsi),%ecx <-- new.count--; <+28>: shl $0x20,%rcx <+32>: or %rcx,%rdx <+35>: test %esi,%esi <+37>: jle 0xffffffff8157aba3 <lockref_put_return+67> <+39>: lock cmpxchg %rdx,(%rdi) <-- the attempt to change <+44>: jne 0xffffffff8157ab9a <lockref_put_return+58> <+46>: shr $0x20,%rdx <+50>: mov %rdx,%rax <+53>: jmp 0xffffffff81af8540 <__x86_return_thunk> <+58>: mov %rax,%rdx <+61>: sub $0x1,%r8d <-- retry count check <+65>: jne 0xffffffff8157ab6c <lockref_put_return+12> <-- go back <+67>: mov $0xffffffff,%eax <+72>: jmp 0xffffffff81af8540 <__x86_return_thunk>