Re: [PATCH] vfs: Optimize dedupe comparison

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jul 15, 2021 at 05:44:15PM +0300, Nikolay Borisov wrote:
> That was my first impression, here's the profile:
> 
>        │    Disassembly of section .text:
>        │
>        │    ffffffff815c6f60 <memcmp>:
>        │    memcmp():
>        │      test   %rdx,%rdx
>        │    ↓ je     22
>        │      xor    %ecx,%ecx
>        │    ↓ jmp    12
>  49.32 │ 9:   add    $0x1,%rcx
>   0.03 │      cmp    %rcx,%rdx
>  11.82 │    ↓ je     21
>   0.01 │12:   movzbl (%rdi,%rcx,1),%eax
>  38.19 │      movzbl (%rsi,%rcx,1),%r8d
>   0.59 │      sub    %r8d,%eax
>   0.04 │    ↑ je     9

That looks like a byte loop to me ...

> It's indeed on x86-64 and according to the sources it's using
> __builtin_memcmp according to arch/x86/boot/string.h

I think the 'boot' part of that path might indicate that it's not what's
actually being used by the kernel.

$ git grep __HAVE_ARCH_MEMCMP
arch/arc/include/asm/string.h:#define __HAVE_ARCH_MEMCMP
arch/arm64/include/asm/string.h:#define __HAVE_ARCH_MEMCMP
arch/csky/abiv2/inc/abi/string.h:#define __HAVE_ARCH_MEMCMP
arch/powerpc/include/asm/string.h:#define __HAVE_ARCH_MEMCMP
arch/s390/include/asm/string.h:#define __HAVE_ARCH_MEMCMP       /* arch function */
arch/s390/lib/string.c:#ifdef __HAVE_ARCH_MEMCMP
arch/s390/purgatory/string.c:#define __HAVE_ARCH_MEMCMP /* arch function */
arch/sparc/include/asm/string.h:#define __HAVE_ARCH_MEMCMP
include/linux/string.h:#ifndef __HAVE_ARCH_MEMCMP
lib/string.c:#ifndef __HAVE_ARCH_MEMCMP

So I think x86-64 is using the stupid one.

> > Can this even happen?  Surely we can only dedup on a block boundary and
> > blocks are required to be a power of two and at least 512 bytes in size?
> 
> I was wondering the same thing, but AFAICS it seems to be possible i.e
> if userspace spaces bad offsets, while all kinds of internal fs
> synchronization ops are going to be performed on aligned offsets, that
> doesn't mean the original ones, passed from userspace are themselves
> aligned explicitly.

Ah, I thought it'd be failed before we got to this point.

But honestly, I think x86-64 needs to be fixed to either use
__builtin_memcmp() or to have a nicely written custom memcmp().  I
tried to find the gcc implementation of __builtin_memcmp() on
x86-64, but I can't.



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux