Linus Torvalds wrote:
> The normal size for the performance-critical git objects are in the couple
> of *hundred* bytes. Not kilobytes, and not megabytes.
>
> The most performance-critical objects for uncompression are commits and
> trees. At least for the kernel, the average size of a tree object is 678
> bytes. And that's ignoring the fact that most of them are then deltified,
> so about 80% of them are likely just a ~60-byte delta.

Ahhh. At least for me, that explains a lot. Rather than spending all
its time in inflate_fast(), git is dealing with lots of zlib
startup/shutdown overhead.
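
A rough way to see the per-stream cost is to inflate many tiny objects one at a time and compare that with inflating the same bytes as a single stream. The sketch below is only an illustration: the payload is a made-up stand-in for a ~60-byte delta, the names are hypothetical, and the numbers include Python call overhead, so they say nothing definitive about git's C code path.

```python
import time
import zlib

# Hypothetical stand-in for a ~60-byte delta; not real git object data.
PAYLOAD = b"tree delta: copy 0..40, insert 'Makefile', copy 52..678\n"
COUNT = 20000

def inflate_separately(blobs):
    """Inflate each blob with its own stream (one setup/teardown per object)."""
    return [zlib.decompress(b) for b in blobs]

def inflate_as_one(blob):
    """Inflate one large stream (a single setup/teardown in total)."""
    return zlib.decompress(blob)

small_blobs = [zlib.compress(PAYLOAD)] * COUNT
big_blob = zlib.compress(PAYLOAD * COUNT)

start = time.perf_counter()
separate = inflate_separately(small_blobs)
t_separate = time.perf_counter() - start

start = time.perf_counter()
combined = inflate_as_one(big_blob)
t_combined = time.perf_counter() - start

print(f"{COUNT} separate streams: {t_separate:.4f}s")
print(f"one combined stream:   {t_combined:.4f}s")
```

The combined stream is unrealistically repetitive, so treat the gap between the two timings as a hint about fixed per-stream cost rather than a measurement.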
Although it sounds like zlib could indeed be optimized to reduce its
startup and shutdown overhead, I wonder if switching compression
algorithms to a pure Huffman or even RLE compression (with associated
lower startup/shutdown costs) would perform better in the face of all
those small objects.
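
To make the "lower startup/shutdown costs" point concrete, here is a minimal byte-oriented RLE codec. It is a sketch of the general technique, not a proposed git format: the (count, byte) pair encoding and the function names are assumptions. The decoder needs no tables or stream state before it can start, which is the property of interest; on text-like data such as commits, plain RLE would usually expand the input rather than shrink it.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode data as (run_length, byte) pairs, with runs capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and run < 255 and data[i + run] == data[i]:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)

def rle_decode(blob: bytes) -> bytes:
    """Expand (run_length, byte) pairs; no setup is needed before decoding."""
    out = bytearray()
    for i in range(0, len(blob), 2):
        out += bytes([blob[i + 1]]) * blob[i]
    return bytes(out)
```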
And another random thought, though it may be useless in this thread: I
bet using a pre-built (compiled into git) static zlib dictionary for git
commit and tree objects might improve things a bit.
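
zlib already supports preset dictionaries (deflateSetDictionary/inflateSetDictionary in C, the zdict argument in Python's zlib module), so the idea can be tried without changing the algorithm. The sketch below assumes a hand-picked dictionary of strings that plausibly recur in commit objects; the dictionary contents, helper names, and sample commit are all hypothetical, and a real dictionary would need to be built from actual repository data.

```python
import zlib

# Hypothetical dictionary: substrings assumed to be common in commit objects.
# zlib favors material near the end of the dictionary, so likelier strings go last.
COMMIT_DICT = b"committer author parent tree  <@> +0000\nSigned-off-by: "

def deflate_with_dict(data: bytes, zdict: bytes = COMMIT_DICT) -> bytes:
    """Compress data with a preset dictionary primed into the window."""
    c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(data) + c.flush()

def inflate_with_dict(blob: bytes, zdict: bytes = COMMIT_DICT) -> bytes:
    """Decompress data; the same dictionary must be supplied."""
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()

# Made-up commit object body for illustration only.
sample = (b"tree 0123456789abcdef0123456789abcdef01234567\n"
          b"author A U Thor <author@example.com> 1170000000 +0000\n"
          b"committer A U Thor <author@example.com> 1170000000 +0000\n\n"
          b"Fix typo\n\nSigned-off-by: A U Thor <author@example.com>\n")

plain = zlib.compress(sample, 9)
primed = deflate_with_dict(sample)
print("raw:", len(sample), "zlib:", len(plain), "zlib+dict:", len(primed))
```

Note that a preset dictionary targets compressed size on small inputs; it does not by itself remove the per-stream setup and teardown cost, and every reader must ship the identical dictionary.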
Jeff