Am 19.07.2014 01:57, schrieb Jeff King: > On Fri, Jul 18, 2014 at 09:14:05PM +0200, René Scharfe wrote: > >> If inlining is really better is another matter; I don't understand how >> 1a812f3a (hashcmp(): inline memcmp() by hand to optimize) could have made >> git gc 18% faster, as it claimed. I would expect memcmp(), which can >> compare more than a byte at a time, to be significantly faster -- or at >> least just as fast as whatever the compiler does with the inlined version. > > I looked into this a while ago[1]. I think with glibc 2.13 and up, the > memcmp is a win. We should consider switching back if that is what is > common now. > > -Peff > > [1] http://article.gmane.org/gmane.comp.version-control.git/218396 On Linux this seems to improve rev-list performance by ca. 10% for me, but on on FreeBSD it slows it down by ca. 25%. That's less surprising after looking at their memcmp() implementation: http://svnweb.freebsd.org/base/release/10.0.0/lib/libc/string/memcmp.c?revision=260789&view=co It's basically our one-byte-at-a-time inlined version, sans inlining. I'd say if a platform doesn't bother optimizing memcmp() then they deserve the resulting performance. And it's probably not too bad a penalty because such comparisons probably won't make up a significant part of most applications. But perhaps we can do better. First, here are the numbers I got for "/usr/bin/time git rev-list --all --objects v2.0.0 >/dev/null" (best of five runs): Debian testing amd64 with gcc 4.9.0 and libc 2.19 on Hyper-V on Intel i5-4670: master (398dd4bd) 2.14user 0.02system 0:02.16elapsed 99%CPU (0avgtext+0avgdata 71004maxresident)k 0inputs+0outputs (0major+21985minor)pagefaults 0swaps using memcmp 1.92user 0.02system 0:01.95elapsed 99%CPU (0avgtext+0avgdata 71032maxresident)k 0inputs+0outputs (0major+21987minor)pagefaults 0swaps four bytes at a time 1.93user 0.03system 0:01.97elapsed 99%CPU (0avgtext+0avgdata 71240maxresident)k 0inputs+0outputs (0major+22024minor)pagefaults 0swaps Debian testing amd64 with clang 3.3 and libc 2.19 on Hyper-V on Intel i5-4670: master (398dd4bd) 2.12user 0.04system 0:02.17elapsed 99%CPU (0avgtext+0avgdata 70976maxresident)k 0inputs+0outputs (0major+21995minor)pagefaults 0swaps memcmp 1.94user 0.01system 0:01.95elapsed 100%CPU (0avgtext+0avgdata 71012maxresident)k 0inputs+0outputs (0major+21952minor)pagefaults 0swaps four bytes at a time 1.86user 0.01system 0:01.88elapsed 99%CPU (0avgtext+0avgdata 70788maxresident)k 0inputs+0outputs (0major+21909minor)pagefaults 0swaps FreeBSD 10 amd64 with clang 3.3 on Hyper-V on Intel i5-4670c: master (398dd4bd) 1.83 real 1.80 user 0.03 sys using memcmp 2.32 real 2.32 user 0.00 sys four bytes at a time 1.56 real 1.53 user 0.03 sys Performance measurements in virtual machines are not very accurate, so take them with a bag of salt, but the numbers were relatively stable accross runs. How come rev-list runs so much faster on FreeBSD? Would it make sense to apply the four-bytes-at-a-time patch? How would it on other platforms, especially ARM? Can we do even better? And more importantly: Does it matter in the first place? The memcmp version just replaces the body of hashcmp() with "return memcmp(sha1, sha2, 20);". Here's the four-bytes-at-a-time patch: --- cache.h | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/cache.h b/cache.h index 92fc9f1..c6cd28e 100644 --- a/cache.h +++ b/cache.h @@ -732,13 +732,18 @@ extern const unsigned char null_sha1[20]; static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2) { + const uint32_t *p1 = (const uint32_t *)sha1; + const uint32_t *p2 = (const uint32_t *)sha2; int i; - for (i = 0; i < 20; i++, sha1++, sha2++) { - if (*sha1 != *sha2) - return *sha1 - *sha2; + for (i = 0; i < 20 / sizeof(uint32_t); i++, p1++, p2++) { + uint32_t v1 = htonl(*p1); + uint32_t v2 = htonl(*p2); + if (v1 < v2) + return -1; + if (v1 > v2) + return 1; } - return 0; } -- 2.0.2 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html