On Wed, Sep 14, 2016 at 11:47 PM, Jeff King <peff@xxxxxxxx> wrote:
> On Wed, Sep 14, 2016 at 07:01:41PM -0700, Stefan Beller wrote:
>
>> According to Jeff, sending patches that don't get accepted is the new hype!
>
> It is what all the cool kids are doing. Unfortunately, it does not save
> you from nitpicky reviews... ;)
>
>>  	first = i = hash_obj(sha1, obj_hash_size);
>> +	clock_gettime(CLOCK_MONOTONIC, &time1);
>>  	while ((obj = obj_hash[i]) != NULL) {
>>  		if (!hashcmp(sha1, obj->oid.hash))
>>  			break;
>> @@ -98,6 +131,9 @@ struct object *lookup_object(const unsigned char *sha1)
>>  		if (i == obj_hash_size)
>>  			i = 0;
>>  	}
>> +	clock_gettime(CLOCK_MONOTONIC, &time2);
>> +	diff(&time1, &time2, &t_diff);
>> +	add_time_to(&caching, &t_diff);
>>  	if (obj && i != first) {
>
> I don't think this is actually measuring the time spent on collisions.
> It's measuring the time we spend in hashcmp(), but that includes the
> non-collision case where we find it in the first hashcmp.

Right. I measured all lookup times, i.e. this function accounts for
1/3 of the run time.

> Measuring _just_ the collisions is more like the patch below. In my
> measurements it's more like 30ms, compared to 10s for all of the
> hashcmps.

So it is off by a few orders of magnitude, i.e. we don't have to worry
about the time spent resolving collisions.

> So we really aren't dealing with collisions, but rather just verifying
> that our hash landed at the right spot. And _any_ data structure is
> going to have to do that. If you want to make it faster, I'd try
> optimizing hashcmp (and you can see why the critbit tree was slower; if
> we spend so much time just on hashcmp() to make sure we're at the right
> key, then making that slower with a bunch of branching is not going to
> help).
>
> I notice we still open-code hashcmp. I get a slight speedup by switching
> it to memcmp(). About 2.5%, which is similar to what I showed in
>
>   http://public-inbox.org/git/20130318073229.GA5551@xxxxxxxxxxxxxxxxxxxxx/
>
> a few years ago (though it's more pronounced as a portion of the whole
> now, because we've optimized some of the other bits).
>
> The main driver there was memcmp() improvements that went into glibc
> 2.13 several years ago. It might be time to start assuming that memcmp()
> beats our open-coded loop.

http://stackoverflow.com/questions/21106801/why-is-memcmp-so-much-faster-than-a-for-loop-check
seems to agree with you, so I think I'll agree with switching over.

> It may also be possible to really micro-optimize it on some platforms,
> because we know the size in advance (I'd kind of expect the compiler to
> do that, but if we're ending up in glibc memcmp then it sounds like it
> is not the case).

That stackoverflow link suggests that glibc already has micro-optimisations
for a variety of platforms.

> -Peff