On Sun, Nov 13, 2011 at 08:01:50PM -0800, Junio C Hamano wrote: > * jc/lookup-object-hash (2011-08-11) 6 commits > - object hash: replace linear probing with 4-way cuckoo hashing > - object hash: we know the table size is a power of two > - object hash: next_size() helper for readability > - pack-objects --count-only > - object.c: remove duplicated code for object hashing > - object.c: code movement for readability > > I do not think there is anything fundamentally wrong with this series, but > the risk of breakage far outweighs observed performance gain in one > particular workload. FWIW, I finally got a chance to read through this series. It was fun, as I had not looked at cuckoo hashing before. However, the performance results were a bit underwhelming, and the code is more complex, which left me a bit negative. I also took a quick try at quadratic probing, which is only a few extra lines of code. I wasn't able to show any real performance improvement, though. I suspect it is because our hash table is not all that big, and we keep it pretty sparse, so linear probing does well. Googling around, it seems that linear probing performs well up to about 70% load factor, but there's surprisingly little theory behind it. I notice that the decorate.c hash keeps us below 2/3 full, but the object.c hash keeps us at 1/2. From my reading, that's just wasting space. Pushing the boundary up to 2/3 and trying your "--count-objects" on git.git, I don't see a big performance difference (with my change, the best-of-5 was a little better, but well within the noise). It does drop the maxresident by a few percent. So I don't think it's a big deal either way, but the code change is pretty trivial. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html