* Steven Noonan <steven@xxxxxxxxxxxxxx> writes:
| On Thu, Apr 30, 2009 at 2:36 PM, Kjetil Barvik <barvik@xxxxxxxxxxxx> wrote:
|> * "Shawn O. Pearce" <spearce@xxxxxxxxxxx> writes:
|> |> 4) The "static inline void hashcpy(....)" in cache.h could then
|> |>    maybe be written like this:
|> |
|> | Its already done as "memcpy(a, b, 20)" which most compilers will
|> | inline and probably reduce to 5 word moves anyway. That's why
|> | hashcpy() itself is inline.
|>
|> But would the compiler be able to trust that the hashcpy() is always
|> called with correct word alignment on variables a and b?

<snip>

| Well, I just tested this with GCC myself. I used this segment of code:
|
| #include <memory.h>
| void hashcpy(unsigned char *sha_dst, const unsigned char *sha_src)
| {
|     memcpy(sha_dst, sha_src, 20);
| }

  OK, here is a small test, which may show at least one difference
  between using "unsigned char sha1[20]" and "unsigned long sha1[5]".

  Given the following file, memcpy_test.c:

    #include <string.h>

    extern void hashcpy_uchar(unsigned char *sha_dst, const unsigned char *sha_src);
    void hashcpy_uchar(unsigned char *sha_dst, const unsigned char *sha_src)
    {
        memcpy(sha_dst, sha_src, 20);
    }

    extern void hashcpy_ulong(unsigned long *sha_dst, const unsigned long *sha_src);
    void hashcpy_ulong(unsigned long *sha_dst, const unsigned long *sha_src)
    {
        memcpy(sha_dst, sha_src, 5);
    }

  Compiled with:

    gcc -O2 -mtune=core2 -march=core2 -S -fomit-frame-pointer memcpy_test.c

  It produced the following memcpy_test.s file:

        .file   "memcpy_test.c"
        .text
        .p2align 4,,15
    .globl hashcpy_ulong
        .type   hashcpy_ulong, @function
    hashcpy_ulong:
        movl    8(%esp), %edx
        movl    4(%esp), %ecx
        movl    (%edx), %eax
        movl    %eax, (%ecx)
        movzbl  4(%edx), %eax
        movb    %al, 4(%ecx)
        ret
        .size   hashcpy_ulong, .-hashcpy_ulong
        .p2align 4,,15
    .globl hashcpy_uchar
        .type   hashcpy_uchar, @function
    hashcpy_uchar:
        movl    8(%esp), %edx
        movl    4(%esp), %ecx
        movl    (%edx), %eax
        movl    %eax, (%ecx)
        movl    4(%edx), %eax
        movl    %eax, 4(%ecx)
        movl    8(%edx), %eax
        movl    %eax, 8(%ecx)
        movl    12(%edx), %eax
        movl    %eax, 12(%ecx)
        movl    16(%edx), %eax
        movl    %eax, 16(%ecx)
        ret
        .size   hashcpy_uchar, .-hashcpy_uchar
        .ident  "GCC: (Gentoo 4.3.3-r2 p1.1, pie-10.1.5) 4.3.3"
        .section        .note.GNU-stack,"",@progbits

  So the "unsigned long" hashcpy() used 7 instructions, compared to
  13 for the "unsigned char" hashcpy(). Would I be right to guess
  that hashcpy_ulong() will also use fewer CPU cycles, and would
  therefore be faster than hashcpy_uchar()?

  -- kjetil
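
  One thing worth checking before comparing the two instruction counts:
  the third argument to memcpy() is a byte count, not an element count.
  The "5" in hashcpy_ulong() therefore asks for 5 bytes, which matches
  the single movl plus the movzbl/movb pair in the listing, while
  hashcpy_uchar() copies all 20 bytes. The sketch below illustrates the
  difference. It is only an illustration under stated assumptions: the
  names hashcpy_5bytes(), hashcpy_words() and matching_prefix() are
  hypothetical and not from cache.h, and uint32_t stands in for
  "unsigned long" because the latter is 8 bytes wide on LP64 targets.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HASH_BYTES 20
#define HASH_WORDS (HASH_BYTES / sizeof(uint32_t))   /* 5 words */

/* Same call as the posted hashcpy_ulong(): memcpy() counts bytes,
 * so only the first 5 bytes of the hash are copied. */
void hashcpy_5bytes(uint32_t *dst, const uint32_t *src)
{
	memcpy(dst, src, 5);
}

/* Word-typed copy of the whole hash: 5 words * 4 bytes = 20 bytes.
 * (Hypothetical helper, assuming a uint32_t[5] hash representation.) */
void hashcpy_words(uint32_t *dst, const uint32_t *src)
{
	assert(HASH_WORDS * sizeof(*dst) == HASH_BYTES);
	memcpy(dst, src, HASH_WORDS * sizeof(*dst));
}

/* Number of leading bytes that are identical in two hashes. */
size_t matching_prefix(const uint32_t *a, const uint32_t *b)
{
	const unsigned char *pa = (const unsigned char *)a;
	const unsigned char *pb = (const unsigned char *)b;
	size_t n = 0;

	while (n < HASH_BYTES && pa[n] == pb[n])
		n++;
	return n;
}
```

  Copying a hash whose bytes are all non-zero into a zeroed buffer and
  then calling matching_prefix() shows how much each variant moved. A
  fair instruction-count comparison would need both functions to copy
  the same 20 bytes.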