Re: [PATCH] object: measure time needed for resolving hash collisions

Jeff King <peff@xxxxxxxx> · Wed, 14 Sep 2016 23:47:02 -0700

On Wed, Sep 14, 2016 at 07:01:41PM -0700, Stefan Beller wrote:

>  According to Jeff, sending patches that don't get accepted is the new hype!

It is what all the cool kids are doing. Unfortunately, it does not save
you from nitpicky reviews...;)

>  	first = i = hash_obj(sha1, obj_hash_size);
> +	clock_gettime(CLOCK_MONOTONIC, &time1);
>  	while ((obj = obj_hash[i]) != NULL) {
>  		if (!hashcmp(sha1, obj->oid.hash))
>  			break;
> @@ -98,6 +131,9 @@ struct object *lookup_object(const unsigned char *sha1)
>  		if (i == obj_hash_size)
>  			i = 0;
>  	}
> +	clock_gettime(CLOCK_MONOTONIC, &time2);
> +	diff(&time1, &time2, &t_diff);
> +	add_time_to(&caching, &t_diff);
>  	if (obj && i != first) {

I don't think this is actually measuring the time spent on collisions.
It's measuring the time we spend in hashcmp(), but that includes the
non-collision case where we find it in the first hashcmp.

Measuring _just_ the collisions is more like the patch below. In my
measurements it's more like 30ms, compared to 10s for all of the
hashcmps.

So we really aren't dealing with collisions, but rather just verifying
that our hash landed at the right spot. And _any_ data structure is
going to have to do that. If you want to make it faster, I'd try
optimizing hashcmp (and you can see why the critbit tree was slower; if
we spend so much time just on hashcmp() to make sure we're at the right
key, then making that slower with a bunch of branching is not going to
help).

I notice we still open-code hashcmp. I get a slight speedup by switching
it to memcmp(). About 2.5%, which is similar to what I showed in

  http://public-inbox.org/git/20130318073229.GA5551@xxxxxxxxxxxxxxxxxxxxx/

a few years ago (though it's more pronounced as a portion of the whole
now, because we've optimized some of the other bits).

The main driver there was memcmp() improvements that went into glibc
2.13 several years ago. It might be time to start assuming that memcmp()
beats our open-coded loop.

It may also be possible to really micro-optimize it on some platforms,
because we know the size in advance (I'd kind of expect the compiler to
do that, but if we're ending up in glibc memcmp then it sounds like it
is not the case).

-Peff