Re: [PATCH V2] asm-generic: ticket-lock: Optimize arch_spin_value_unlocked

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Jul 30, 2023 at 10:33:08PM -0400, guoren@xxxxxxxxxx wrote:
> From: Guo Ren <guoren@xxxxxxxxxxxxxxxxx>
> 
> The arch_spin_value_unlocked would cause an unnecessary memory
> access to the contended value. Although it won't cause a significant
> performance gap in most architectures, the arch_spin_value_unlocked
> argument contains enough information. Thus, remove unnecessary
> atomic_read in arch_spin_value_unlocked().
> 
> The caller of arch_spin_value_unlocked() could benefit from this
> change. Currently, the only caller is lockref.
> 

Have you verified you are getting an extra memory access from this in
lockref? What architecture is it?

I have no opinion about the patch itself, I will note though that the
argument to the routine is *not* the actual memory-shared lockref,
instead it's something from the local copy obtained with READ_ONCE
from the real thing. So I would be surprised if the stock routine was
generating accesses to that sucker.

Nonetheless, if the patched routine adds nasty asm, that would be nice
to sort out.

FWIW on x86-64 qspinlock is used (i.e. not the stuff you are patching)
and I verified there are only 2 memory accesses -- the initial READ_ONCE
and later cmpxchg. I don't know which archs *don't* use qspinlock.

It also turns out generated asm is quite atrocious and cleaning it up
may yield a small win under more traffic. Maybe I'll see about it later
this week.

For example, disassembling lockref_put_return:
<+0>:     mov    (%rdi),%rax            <-- initial load, expected
<+3>:     mov    $0x64,%r8d
<+9>:     mov    %rax,%rdx
<+12>:    test   %eax,%eax              <-- retries loop back here
					<-- this is also the unlocked
					    check
<+14>:    jne    0xffffffff8157aba3 <lockref_put_return+67>
<+16>:    mov    %rdx,%rsi
<+19>:    mov    %edx,%edx
<+21>:    sar    $0x20,%rsi
<+25>:    lea    -0x1(%rsi),%ecx        <-- new.count--;
<+28>:    shl    $0x20,%rcx
<+32>:    or     %rcx,%rdx
<+35>:    test   %esi,%esi
<+37>:    jle    0xffffffff8157aba3 <lockref_put_return+67>
<+39>:    lock cmpxchg %rdx,(%rdi)      <-- the attempt to change
<+44>:    jne    0xffffffff8157ab9a <lockref_put_return+58>
<+46>:    shr    $0x20,%rdx
<+50>:    mov    %rdx,%rax
<+53>:    jmp    0xffffffff81af8540 <__x86_return_thunk>
<+58>:    mov    %rax,%rdx
<+61>:    sub    $0x1,%r8d              <-- retry count check
<+65>:    jne    0xffffffff8157ab6c <lockref_put_return+12> <-- go back
<+67>:    mov    $0xffffffff,%eax
<+72>:    jmp    0xffffffff81af8540 <__x86_return_thunk>




[Index of Archives]     [Linux Kernel]     [Kernel Newbies]     [x86 Platform Driver]     [Netdev]     [Linux Wireless]     [Netfilter]     [Bugtraq]     [Linux Filesystems]     [Yosemite Discussion]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]

  Powered by Linux