Re: [PATCH 03/16] pack-objects: use a faster hash table

Jeff King <peff@xxxxxxxx> · Tue, 25 Jun 2013 22:14:17 -0400

On Tue, Jun 25, 2013 at 04:03:22PM +0200, Thomas Rast wrote:

> > The big win here, however, is in the massively reduced amount of hash
> > collisions (as you can see from the huge reduction of time spent in
> > `hashcmp` after the change). These greatly improved lookup times
> > will result critical once we implement the writing algorithm for bitmap
> > indxes in a later patch of this series.
> 
> Is that reduction in collisions purely because it uses quadratic
> probing, or is there some other magic trick involved?  Is the same also
> applicable to the other users of the "big" object hash table?  (I assume
> Peff has already tried applying it there, but I'm still curious...)

I haven't done any actual timings yet.

The general code is quite similar to our object.c hash table, with the
exception that it does quadratic probing.  I did try quadratic probing
on our object.c hash once and didn't see much improvement (similarly,
Junio tried cuckoo hashing, but the numbers were not that exciting).

It's possible that the hash table in pack-objects did not behave as well
as the one in object.c. It looks like we grow it when the table is 3/4
full, which is a little high (we grow at 1/2 in object.c).  Quadratic
probing should help when the hash table is close to full, so it would
probably help. However, I also note that khash keeps its hash tables
only half full, so that may be the real source of the performance
improvement.

So I suspect two things (but as I said, haven't verified):

  1. You could speed up pack-objects just by keeping the table half full
     rather than 3/4 full.

  2. You would see little to no speedup by moving object.c to khash, as
     it is adding only quadratic probing. With quadratic probing, you
     could potentially tweak the kh_put_* to resize less aggressively
     (say, 2/3) and save some memory without loss of performance.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html