Re: What's cooking in git.git (Nov 2011, #03; Sun, 13)

Jeff King <peff@xxxxxxxx> · Mon, 14 Nov 2011 06:10:40 -0500

On Sun, Nov 13, 2011 at 08:01:50PM -0800, Junio C Hamano wrote:

> * jc/lookup-object-hash (2011-08-11) 6 commits
>  - object hash: replace linear probing with 4-way cuckoo hashing
>  - object hash: we know the table size is a power of two
>  - object hash: next_size() helper for readability
>  - pack-objects --count-only
>  - object.c: remove duplicated code for object hashing
>  - object.c: code movement for readability
> 
> I do not think there is anything fundamentally wrong with this series, but
> the risk of breakage far outweighs observed performance gain in one
> particular workload.

FWIW, I finally got a chance to read through this series. It was fun, as
I had not looked at cuckoo hashing before. However, the performance
results were a bit underwhelming, and the code is more complex, which
left me a bit negative. I also took a quick try at quadratic probing,
which is only a few extra lines of code.  I wasn't able to show any real
performance improvement, though.

I suspect it is because our hash table is not all that big, and we keep
it pretty sparse, so linear probing does well. Googling around, it seems
that linear probing performs well up to about 70% load factor, but
there's surprisingly little theory behind it.

I notice that the decorate.c hash keeps us below 2/3 full, but the
object.c hash keeps us at 1/2. From my reading, that's just wasting
space. Pushing the boundary up to 2/3 and trying your "--count-objects"
on git.git, I don't see a big performance difference (with my change,
the best-of-5 was a little better, but well within the noise). It does
drop the maxresident by a few percent.

So I don't think it's a big deal either way, but the code change is
pretty trivial.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html