Re: Why Git is so fast

Mike Hommey <mh@xxxxxxxxxxxx> · Fri, 1 May 2009 11:34:27 +0200

On Fri, May 01, 2009 at 11:19:04AM +0200, Kjetil Barvik wrote:
> * Steven Noonan <steven@xxxxxxxxxxxxxx> writes:
> | On Thu, Apr 30, 2009 at 2:36 PM, Kjetil Barvik <barvik@xxxxxxxxxxxx> wrote:
> |> * "Shawn O. Pearce" <spearce@xxxxxxxxxxx> writes:
> |> |>      4) The "static inline void hashcpy(....)" in cache.h could then
> |> |>         maybe be written like this:
> |> |
> |> | It's already done as "memcpy(a, b, 20)" which most compilers will
> |> | inline and probably reduce to 5 word moves anyway.  That's why
> |> | hashcpy() itself is inline.
> |>
> |>  But would the compiler be able to trust that the hashcpy() is always
> |>  called with correct word alignment on variables a and b?
> 
>  <snip>
> 
> | Well, I just tested this with GCC myself. I used this segment of code:
> |
> |         #include <memory.h>
> |         void hashcpy(unsigned char *sha_dst, const unsigned char *sha_src)
> |         {
> |                 memcpy(sha_dst, sha_src, 20);
> |         }
> 
>   OK, here is a small test, which maybe shows at least one difference
>   between using "unsigned char sha1[20]" and "unsigned long sha1[5]".
>   Given the following file, memcpy_test.c:
> 
> #include <string.h>
> extern void hashcpy_uchar(unsigned char *sha_dst, const unsigned char *sha_src);
> void hashcpy_uchar(unsigned char *sha_dst, const unsigned char *sha_src)
> {
>         memcpy(sha_dst, sha_src, 20);
> }
> extern void hashcpy_ulong(unsigned long *sha_dst, const unsigned long *sha_src);
> void hashcpy_ulong(unsigned long *sha_dst, const unsigned long *sha_src)
> {
>         memcpy(sha_dst, sha_src, 5);
> }
> 
>   And, compiled with the following:
> 
>     gcc -O2 -mtune=core2 -march=core2 -S -fomit-frame-pointer memcpy_test.c
> 
>   It produced the following memcpy_test.s file:
> 
>         .file   "memcpy_test.c"
>         .text
>         .p2align 4,,15
> .globl hashcpy_ulong
>         .type   hashcpy_ulong, @function
> hashcpy_ulong:
>         movl    8(%esp), %edx
>         movl    4(%esp), %ecx
>         movl    (%edx), %eax
>         movl    %eax, (%ecx)
>         movzbl  4(%edx), %eax
>         movb    %al, 4(%ecx)
>         ret
>         .size   hashcpy_ulong, .-hashcpy_ulong
>         .p2align 4,,15
> .globl hashcpy_uchar
>         .type   hashcpy_uchar, @function
> hashcpy_uchar:
>         movl    8(%esp), %edx
>         movl    4(%esp), %ecx
>         movl    (%edx), %eax
>         movl    %eax, (%ecx)
>         movl    4(%edx), %eax
>         movl    %eax, 4(%ecx)
>         movl    8(%edx), %eax
>         movl    %eax, 8(%ecx)
>         movl    12(%edx), %eax
>         movl    %eax, 12(%ecx)
>         movl    16(%edx), %eax
>         movl    %eax, 16(%ecx)
>         ret
>         .size   hashcpy_uchar, .-hashcpy_uchar
>         .ident  "GCC: (Gentoo 4.3.3-r2 p1.1, pie-10.1.5) 4.3.3"
>         .section        .note.GNU-stack,"",@progbits
> 
>   So, the "unsigned long" type hashcpy() used 7 instructions, compared
>   to 13 for the "unsigned char" type hashcpy().

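But your "unsigned long" version only copies 5 bytes: the third argument
to memcpy() is a byte count, not an element count. That is why its
assembly is one 4-byte move plus one 1-byte move, so the two functions
are not doing the same work and the instruction counts can't be compared.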
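
For a like-for-like test, both variants would have to copy 20 bytes.
A minimal sketch of what that might look like follows. The names
HASH_RAWSZ and hashcpy_u32 are made up for illustration and are not
from cache.h, and uint32_t is used instead of "unsigned long" on the
assumption that the word size should stay 4 bytes on 64-bit targets
too, where "unsigned long" is often 8 bytes:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical name for the size of a raw SHA-1 in bytes. */
#define HASH_RAWSZ 20

/* Five 32-bit words must cover exactly one raw hash. */
_Static_assert(5 * sizeof(uint32_t) == HASH_RAWSZ,
               "five 32-bit words must be 20 bytes");

/* Byte-array variant: the length is already a byte count. */
static inline void hashcpy_uchar(unsigned char *sha_dst,
                                 const unsigned char *sha_src)
{
        memcpy(sha_dst, sha_src, HASH_RAWSZ);
}

/*
 * Word-array variant (illustrative only): memcpy() takes a byte
 * count, so the element count is scaled by the element size.
 */
static inline void hashcpy_u32(uint32_t *sha_dst, const uint32_t *sha_src)
{
        memcpy(sha_dst, sha_src, 5 * sizeof(*sha_src));
}
```

Whether the word-typed version still comes out shorter once both copy
the same 20 bytes would need to be measured again with the same gcc
flags.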

Mike