Re: [PATCH] vfs: Optimize dedupe comparison

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 15.07.21 г. 18:09, Matthew Wilcox wrote:
> On Thu, Jul 15, 2021 at 05:44:15PM +0300, Nikolay Borisov wrote:
>> That was my first impression, here's the profile:
>>
>>        │    Disassembly of section .text:
>>        │
>>        │    ffffffff815c6f60 <memcmp>:
>>        │    memcmp():
>>        │      test   %rdx,%rdx
>>        │    ↓ je     22
>>        │      xor    %ecx,%ecx
>>        │    ↓ jmp    12
>>  49.32 │ 9:   add    $0x1,%rcx
>>   0.03 │      cmp    %rcx,%rdx
>>  11.82 │    ↓ je     21
>>   0.01 │12:   movzbl (%rdi,%rcx,1),%eax
>>  38.19 │      movzbl (%rsi,%rcx,1),%r8d
>>   0.59 │      sub    %r8d,%eax
>>   0.04 │    ↑ je     9
> 
> That looks like a byte loop to me ...
> 
>> It's indeed on x86-64 and according to the sources it's using
>> __builtin_memcmp according to arch/x86/boot/string.h
> 
> I think the 'boot' part of that path might indicate that it's not what's
> actually being used by the kernel.
> 
> $ git grep __HAVE_ARCH_MEMCMP
> arch/arc/include/asm/string.h:#define __HAVE_ARCH_MEMCMP
> arch/arm64/include/asm/string.h:#define __HAVE_ARCH_MEMCMP
> arch/csky/abiv2/inc/abi/string.h:#define __HAVE_ARCH_MEMCMP
> arch/powerpc/include/asm/string.h:#define __HAVE_ARCH_MEMCMP
> arch/s390/include/asm/string.h:#define __HAVE_ARCH_MEMCMP       /* arch function */
> arch/s390/lib/string.c:#ifdef __HAVE_ARCH_MEMCMP
> arch/s390/purgatory/string.c:#define __HAVE_ARCH_MEMCMP /* arch function */
> arch/sparc/include/asm/string.h:#define __HAVE_ARCH_MEMCMP
> include/linux/string.h:#ifndef __HAVE_ARCH_MEMCMP
> lib/string.c:#ifndef __HAVE_ARCH_MEMCMP
> 
> So I think x86-64 is using the stupid one.
> 
>>> Can this even happen?  Surely we can only dedup on a block boundary and
>>> blocks are required to be a power of two and at least 512 bytes in size?
>>
>> I was wondering the same thing, but AFAICS it seems to be possible i.e
>> if userspace spaces bad offsets, while all kinds of internal fs
>> synchronization ops are going to be performed on aligned offsets, that
>> doesn't mean the original ones, passed from userspace are themselves
>> aligned explicitly.
> 
> Ah, I thought it'd be failed before we got to this point.
> 
> But honestly, I think x86-64 needs to be fixed to either use
> __builtin_memcmp() or to have a nicely written custom memcmp().  I
> tried to find the gcc implementation of __builtin_memcmp() on
> x86-64, but I can't.

__builtin_memcmp is a no go due to this function being an ifunc [0]
 and is being resolved to a call to memcmp which causes link => build
failures. So what remains is to either patch particular sites or as Dave
suggested have a generic optimized implementation.

glibc's implementation [1] seems straightforward enough to be
convertible to kernel style. However it would need definitive proof it
actually improves performance in a variety of scenarios.

[0] https://sourceware.org/glibc/wiki/GNU_IFUNC
[1] https://sourceware.org/git/?p=glibc.git;a=blob;f=string/memcmp.c

> 



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux