Re: cleaner/better zlib sources?

Jeff Garzik <jeff@xxxxxxxxxx> · Fri, 16 Mar 2007 12:35:39 -0400

Linus Torvalds wrote:
> The normal size for the performance-critical git objects are in the couple 
> of *hundred* bytes. Not kilobytes, and not megabytes.
> 
> The most performance-critical objects for uncompression are commits and 
> trees. At least for the kernel, the average size of a tree object is 678
> bytes. And that's ignoring the fact that most of them are then deltified, 
> so about 80% of them are likely just a ~60-byte delta.

Ahhh.  At least for me, that explains a lot.  Rather than spending all 
its time in inflate_fast(), git is dealing with lots of zlib 
startup/shutdown overhead.

Although it sounds like zlib could indeed be optimized to reduce its 
startup and shutdown overhead, I wonder if switching compression 
algorithms to a pure Huffman or even RLE compression (with associated 
lower startup/shutdown costs) would perform better in the face of all 
those small objects.
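
A rough way to look at the size side of that trade-off without leaving zlib is to try its own Z_HUFFMAN_ONLY and Z_RLE strategies on a small object. This is only a stand-in for a real algorithm switch: the output is still a deflate stream read by the same inflate code, so it says nothing about startup/shutdown cost. The sketch below uses Python's zlib module; the sample payload and the helper names are made up for illustration.

```python
import zlib

# Hypothetical tree-like payload: "<mode> <name>\0" followed by 20 raw bytes
# standing in for a SHA-1. Not taken from a real repository.
SAMPLE_TREE = (
    b"100644 Makefile\x00" + bytes(range(20)) +
    b"40000 drivers\x00" + bytes(range(20, 40))
)

# Strategies that zlib itself offers; they only change how deflate searches
# for matches, not the stream format.
STRATEGIES = {
    "default": zlib.Z_DEFAULT_STRATEGY,
    "huffman_only": zlib.Z_HUFFMAN_ONLY,  # no string matching at all
    "rle": zlib.Z_RLE,                    # matches limited to run lengths
}


def deflate_with_strategy(data: bytes, strategy: int) -> bytes:
    """Compress data in one shot using the given zlib strategy."""
    comp = zlib.compressobj(level=6, strategy=strategy)
    return comp.compress(data) + comp.flush()


def compare_strategies(data: bytes) -> dict:
    """Return the compressed size in bytes for each strategy."""
    return {
        name: len(deflate_with_strategy(data, strategy))
        for name, strategy in STRATEGIES.items()
    }


if __name__ == "__main__":
    for name, size in compare_strategies(SAMPLE_TREE).items():
        print(f"{name}: {len(SAMPLE_TREE)} -> {size} bytes")
```

Whatever the sizes turn out to be on real objects, every variant still decompresses with a plain zlib.decompress(), so an experiment like this would not need a pack format change.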

And another random thought, though it may be useless in this thread:  I 
bet using a pre-built (compiled into git) static zlib dictionary for git 
commit and tree objects might improve things a bit.
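
A minimal sketch of the preset-dictionary idea, using the zdict parameter of Python's zlib module (the counterpart of deflateSetDictionary()/inflateSetDictionary() in the C library). The dictionary contents, the sample commit, and the helper names are assumptions for illustration; a real dictionary would have to be built from actual commit and tree objects.

```python
import zlib

# Hypothetical preset dictionary: strings that tend to recur in commit
# objects. zlib finds matches near the end of the dictionary most cheaply,
# so the most common strings should go last.
COMMIT_DICT = (
    b"Signed-off-by: "
    b"@example.com> "
    b"\ncommitter "
    b"\nauthor "
    b"\nparent "
    b"tree "
)

# Made-up commit object body, not from a real repository.
SAMPLE_COMMIT = (
    b"tree 0123456789abcdef0123456789abcdef01234567\n"
    b"parent 89abcdef0123456789abcdef0123456789abcdef\n"
    b"author A U Thor <author@example.com> 1174062939 -0400\n"
    b"committer C O Mitter <committer@example.com> 1174062939 -0400\n"
    b"\n"
    b"Fix a small bug\n"
)


def deflate_with_dict(data: bytes, zdict: bytes) -> bytes:
    """Compress data with a preset dictionary compiled into the program."""
    comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(data) + comp.flush()


def inflate_with_dict(blob: bytes, zdict: bytes) -> bytes:
    """Decompress a stream that was written with the same dictionary."""
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


if __name__ == "__main__":
    plain = zlib.compress(SAMPLE_COMMIT, 9)
    primed = deflate_with_dict(SAMPLE_COMMIT, COMMIT_DICT)
    print(f"raw {len(SAMPLE_COMMIT)}, plain {len(plain)}, dict {len(primed)}")
```

Two caveats: the reader must have exactly the same dictionary as the writer, so objects stored this way would not be readable by a git built without it, and a zlib stream that uses a preset dictionary carries a 4-byte dictionary ID in its header, which eats into the gain on very small objects.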

	Jeff
