On Wed, Sep 14, 2016 at 11:47 PM, Jeff King <peff@xxxxxxxx> wrote:
> On Wed, Sep 14, 2016 at 07:01:41PM -0700, Stefan Beller wrote:
>
>> According to Jeff, sending patches that don't get accepted is the new hype!
>
> It is what all the cool kids are doing. Unfortunately, it does not save
> you from nitpicky reviews... ;)
>
>>  	first = i = hash_obj(sha1, obj_hash_size);
>> +	clock_gettime(CLOCK_MONOTONIC, &time1);
>>  	while ((obj = obj_hash[i]) != NULL) {
>>  		if (!hashcmp(sha1, obj->oid.hash))
>>  			break;
>> @@ -98,6 +131,9 @@ struct object *lookup_object(const unsigned char *sha1)
>>  		if (i == obj_hash_size)
>>  			i = 0;
>>  	}
>> +	clock_gettime(CLOCK_MONOTONIC, &time2);
>> +	diff(&time1, &time2, &t_diff);
>> +	add_time_to(&caching, &t_diff);
>>  	if (obj && i != first) {
>
> I don't think this is actually measuring the time spent on collisions.
> It's measuring the time we spend in hashcmp(), but that includes the
> non-collision case where we find it in the first hashcmp.

Right. I measured all lookup times, i.e. this function accounts for
1/3 of the run time.

> Measuring _just_ the collisions is more like the patch below. In my
> measurements it's more like 30ms, compared to 10s for all of the
> hashcmps.

So it is off by a few orders of magnitude, i.e. we don't have to worry
about the time spent resolving collisions.

> So we really aren't dealing with collisions, but rather just verifying
> that our hash landed at the right spot. And _any_ data structure is
> going to have to do that. If you want to make it faster, I'd try
> optimizing hashcmp (and you can see why the critbit tree was slower; if
> we spend so much time just on hashcmp() to make sure we're at the right
> key, then making that slower with a bunch of branching is not going to
> help).
>
> I notice we still open-code hashcmp. I get a slight speedup by switching
> it to memcmp(). About 2.5%, which is similar to what I showed in
>
>   http://public-inbox.org/git/20130318073229.GA5551@xxxxxxxxxxxxxxxxxxxxx/
>
> a few years ago (though it's more pronounced as a portion of the whole
> now, because we've optimized some of the other bits).
>
> The main driver there was memcmp() improvements that went into glibc
> 2.13 several years ago. It might be time to start assuming that memcmp()
> beats our open-coded loop.

http://stackoverflow.com/questions/21106801/why-is-memcmp-so-much-faster-than-a-for-loop-check
seems to agree with you, so I think I'll agree with switching over.

> It may also be possible to really micro-optimize it on some platforms,
> because we know the size in advance (I'd kind of expect the compiler to
> do that, but if we're ending up in glibc memcmp then it sounds like it
> is not the case).

That stackoverflow link suggests that glibc already has micro-optimisations
for a variety of platforms.

> -Peff