On Wed, Aug 22, 2018 at 6:49 PM Derrick Stolee <stolee@xxxxxxxxx> wrote: > > On 8/22/2018 12:26 PM, Jeff King wrote: > > On Wed, Aug 22, 2018 at 06:14:24PM +0200, Duy Nguyen wrote: > > > >> On Wed, Aug 22, 2018 at 6:08 PM Duy Nguyen <pclouds@xxxxxxxxx> wrote: > >>> On Wed, Aug 22, 2018 at 6:03 PM Jeff King <peff@xxxxxxxx> wrote: > >>>> On Wed, Aug 22, 2018 at 07:14:42AM -0400, Derrick Stolee wrote: > >>>> > >>>>> The other thing I was going to recommend (and I'll try to test this out > >>>>> myself later) is to see if 'the_hash_algo->rawsz' is being treated as a > >>>>> volatile variable, since it is being referenced through a pointer. Perhaps > >>>>> storing the value locally and then casing on it would help? > >>>> I tried various sprinkling of "const" around the declarations to make it > >>>> clear that the values wouldn't change once we saw them. But I couldn't > >>>> detect any difference. At most I think that would let us hoist the "if" > >>>> out of the loop, but gcc still seems unwilling to expand the memcmp when > >>>> there are other branches. > >>>> > >>>> I think if that's the thing we want to have happen, we really do need to > >>>> just write it out on that branch rather than saying "memcmp". > >>> This reminds me of an old discussion about memcpy() vs doing explicit > >>> compare loop with lots of performance measurements.. > >> Ah found it. Not sure if it is still relevant in light of multiple hash support > >> > >> https://public-inbox.org/git/20110427225114.GA16765@xxxxxxx/ > > Yes, that was what I meant. We actually did switch to that hand-rolled > > loop, but later we went back to memcmp in 0b006014c8 (hashcmp: use > > memcmp instead of open-coded loop, 2017-08-09). > > Looking at that commit, I'm surprised the old logic was just a for loop, instead of a word-based approach, such as the following: Might work on x86 but it breaks on cpu architectures with stricter alignment. I don't think we have a guarantee that object_id is always 8 byte aligned. > > diff --git a/cache.h b/cache.h > index b1fd3d58ab..5e5819ad49 100644 > --- a/cache.h > +++ b/cache.h > @@ -1021,9 +1021,41 @@ extern int find_unique_abbrev_r(char *hex, const > struct object_id *oid, int len) > extern const unsigned char null_sha1[GIT_MAX_RAWSZ]; > extern const struct object_id null_oid; > > +static inline int word_cmp_32(uint32_t a, uint32_t b) > +{ > + return memcmp(&a, &b, sizeof(uint32_t)); > +} > + > +static inline int word_cmp_64(uint64_t a, uint64_t b) > +{ > + return memcmp(&a, &b, sizeof(uint64_t)); > +} > + > +struct object_id_20 { > + uint64_t data0; > + uint64_t data1; > + uint32_t data2; > +}; > + > static inline int hashcmp(const unsigned char *sha1, const unsigned > char *sha2) > { > - return memcmp(sha1, sha2, the_hash_algo->rawsz); > + if (the_hash_algo->rawsz == 20) { > + struct object_id_20 *obj1 = (struct object_id_20 *)sha1; > + struct object_id_20 *obj2 = (struct object_id_20 *)sha2; > + > + if (obj1->data0 == obj2->data0) { > + if (obj1->data1 == obj2->data1) { > + if (obj1->data2 == obj2->data2) { > + return 0; > + } > + return word_cmp_32(obj1->data2, > obj2->data2); > + } > + return word_cmp_64(obj1->data1, obj2->data1); > + } > + return word_cmp_64(obj1->data0, obj2->data0); > + } > + > + assert(0); > } > > static inline int oidcmp(const struct object_id *oid1, const struct > object_id *oid2) > > -- Duy