Re: Why Git is so fast

* Steven Noonan <steven@xxxxxxxxxxxxxx> writes:
| On Thu, Apr 30, 2009 at 2:36 PM, Kjetil Barvik <barvik@xxxxxxxxxxxx> wrote:
|> * "Shawn O. Pearce" <spearce@xxxxxxxxxxx> writes:
|> |>      4) The "static inline void hashcpy(....)" in cache.h could then
|> |>         maybe be written like this:
|> |
|> | It's already done as "memcpy(a, b, 20)" which most compilers will
|> | inline and probably reduce to 5 word moves anyway.  That's why
|> | hashcpy() itself is inline.
|>
|>  But would the compiler be able to trust that the hashcpy() is always
|>  called with correct word alignment on variables a and b?

 <snip>

| Well, I just tested this with GCC myself. I used this segment of code:
|
|         #include <memory.h>
|         void hashcpy(unsigned char *sha_dst, const unsigned char *sha_src)
|         {
|                 memcpy(sha_dst, sha_src, 20);
|         }

  OK, here is a small test, which may show at least one difference
  between using "unsigned char sha1[20]" and "unsigned long sha1[5]".
  Given the following file, memcpy_test.c:

#include <string.h>
extern void hashcpy_uchar(unsigned char *sha_dst, const unsigned char *sha_src);
void hashcpy_uchar(unsigned char *sha_dst, const unsigned char *sha_src)
{
        memcpy(sha_dst, sha_src, 20);
}
extern void hashcpy_ulong(unsigned long *sha_dst, const unsigned long *sha_src);
void hashcpy_ulong(unsigned long *sha_dst, const unsigned long *sha_src)
{
        memcpy(sha_dst, sha_src, 5);    /* length is in bytes: this copies 5 bytes */
}

  And, compiled with the following:

    gcc -O2 -mtune=core2 -march=core2 -S -fomit-frame-pointer memcpy_test.c

  It produced the following memcpy_test.s file:

        .file   "memcpy_test.c"
        .text
        .p2align 4,,15
.globl hashcpy_ulong
        .type   hashcpy_ulong, @function
hashcpy_ulong:
        movl    8(%esp), %edx
        movl    4(%esp), %ecx
        movl    (%edx), %eax
        movl    %eax, (%ecx)
        movzbl  4(%edx), %eax
        movb    %al, 4(%ecx)
        ret
        .size   hashcpy_ulong, .-hashcpy_ulong
        .p2align 4,,15
.globl hashcpy_uchar
        .type   hashcpy_uchar, @function
hashcpy_uchar:
        movl    8(%esp), %edx
        movl    4(%esp), %ecx
        movl    (%edx), %eax
        movl    %eax, (%ecx)
        movl    4(%edx), %eax
        movl    %eax, 4(%ecx)
        movl    8(%edx), %eax
        movl    %eax, 8(%ecx)
        movl    12(%edx), %eax
        movl    %eax, 12(%ecx)
        movl    16(%edx), %eax
        movl    %eax, 16(%ecx)
        ret
        .size   hashcpy_uchar, .-hashcpy_uchar
        .ident  "GCC: (Gentoo 4.3.3-r2 p1.1, pie-10.1.5) 4.3.3"
        .section        .note.GNU-stack,"",@progbits

  So, the "unsigned long" type hashcpy() used 7 instructions, compared
  to 13 for the "unsigned char" type hashcpy().
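
  One caveat when reading these counts: memcpy() takes its length in
  bytes, so a call with a length of 5 moves 5 bytes (the single movl
  plus the movzbl/movb pair in the listing), not 5 words.  A minimal
  sketch of a like-for-like word-typed copy follows.  The names
  hashcpy_words(), hashcpy_5bytes() and matching_prefix(), and the use
  of uint32_t (so that 5 words are 20 bytes on both 32-bit and 64-bit
  targets), are assumptions made for illustration and are not taken
  from cache.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HASH_BYTES 20   /* size of a SHA-1 digest in bytes */

/* Hypothetical word-typed variant: the length is still given in bytes. */
static void hashcpy_words(uint32_t *sha_dst, const uint32_t *sha_src)
{
        assert(sha_dst && sha_src);
        memcpy(sha_dst, sha_src, 5 * sizeof(*sha_dst));  /* 5 * 4 = 20 bytes */
}

/* Same call shape as hashcpy_ulong() in memcpy_test.c: copies only 5 bytes. */
static void hashcpy_5bytes(uint32_t *sha_dst, const uint32_t *sha_src)
{
        assert(sha_dst && sha_src);
        memcpy(sha_dst, sha_src, 5);
}

/* Helper for checking: count how many leading bytes of two buffers agree. */
static size_t matching_prefix(const void *a, const void *b, size_t n)
{
        const unsigned char *pa = a, *pb = b;
        size_t i = 0;

        while (i < n && pa[i] == pb[i])
                i++;
        return i;
}
```

  With the length written as 5 * sizeof(*sha_dst), both variants copy
  the same 20 bytes, so any difference in the generated code would
  come from the pointer type and its alignment guarantees rather than
  from the amount of data moved.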

  Would I be right in guessing that the hashcpy_ulong() function also
  uses fewer CPU cycles, and so would be faster than hashcpy_uchar()?
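
  Instruction counts only go so far here, since cycle counts also
  depend on the CPU, the caches and alignment, so the question is
  probably best settled by measuring.  A rough sketch using clock()
  follows; time_copies() and its checksum are hypothetical helpers
  made up for illustration, and a compiler may still optimise parts
  of such a loop, so numbers from it should be treated as indicative
  only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* 20-byte copy, as in hashcpy_uchar() from memcpy_test.c. */
static void hashcpy_uchar(unsigned char *sha_dst, const unsigned char *sha_src)
{
        memcpy(sha_dst, sha_src, 20);
}

/*
 * Hypothetical helper: run `iterations` copies and return the elapsed
 * CPU time in seconds.  A checksum of the copied data is stored through
 * `sink` so that the copies cannot be discarded as dead code.
 */
static double time_copies(unsigned long iterations, unsigned long *sink)
{
        unsigned char src[20], dst[20];
        unsigned long i, sum = 0;
        clock_t start, end;

        assert(sink != NULL);
        memset(src, 0xab, sizeof(src));

        start = clock();
        for (i = 0; i < iterations; i++) {
                src[i % 20] = (unsigned char)i;  /* vary the input each round */
                hashcpy_uchar(dst, src);
                sum += dst[i % 20];              /* read back the copied byte */
        }
        end = clock();

        *sink = sum;
        return (double)(end - start) / CLOCKS_PER_SEC;
}
```

  Running the same loop once per hashcpy() variant, with a large
  iteration count and the same compiler flags, and comparing the
  returned times would give a first impression of whether the shorter
  instruction sequence is actually faster.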

  -- kjetil