Ingo Molnar wrote:

> Most overhead is in hashcmp(), which uses memcmp(), which falls back to
> assembly string operations.
>
> But we know that hashcmp() compares hashes, which if they do not match,
> the first byte will differ in 99% of the cases.
>
> So i tried the patch below: instead of relying on GCC putting in the
> string ops, i used an open-coded loop for this relatively short
> comparison, which does not go beyond the first byte in 99% of the cases.
[...]
> --- a/cache.h
> +++ b/cache.h
> @@ -675,14 +675,33 @@ extern char *sha1_pack_name(const unsigned char *sha1);
[...]
> +static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>  {
> -	return memcmp(sha1, sha2, 20);
> +	int i;
> +
> +	for (i = 0; i < 20; i++, sha1++, sha2++) {

Hm.  This would be very sensitive to the compiler, since a too-smart
optimizer could take this loop and rewrite it back to memcmp!

So I wonder if it's possible to convey this to the compiler more
precisely:

	return memcmp_probably_differs_early(sha1, sha2, 20);

E.g., how would something like

	const unsigned int *start1 = (const unsigned int *) sha1;
	const unsigned int *start2 = (const unsigned int *) sha2;

	if (likely(*start1 != *start2)) {
		if (*start1 < *start2)
			return -1;
		return +1;
	}
	return memcmp(sha1 + 4, sha2 + 4, 16);

perform?  I suspect we don't have to worry about endianness as long as
hashcmp yields a consistent ordering, but I haven't checked.

Thanks, that was interesting.

Regards,
Jonathan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html