Re: cleaner/better zlib sources?

On Fri, Mar 16, 2007 at 12:35:39PM -0400, Jeff Garzik wrote:
> Linus Torvalds wrote:
> >The normal size for the performance-critical git objects are in the couple 
> >of *hundred* bytes. Not kilobytes, and not megabytes.
> >
> >The most performance-critical objects for uncompression are commits and 
> >trees. At least for the kernel, the average size of a tree object is 678
> >bytes. And that's ignoring the fact that most of them are then deltified, 
> >so about 80% of them are likely just a ~60-byte delta.
> 
> 
> Ahhh.  At least for me, that explains a lot.  Rather than spending all 
> its time in inflate_fast(), git is dealing with lots of zlib 
> startup/shutdown overhead.
> 
> Although it sounds like zlib could indeed be optimized to reduce its 
> startup and shutdown overhead, I wonder if switching compression 
> algorithms to a pure Huffman or even RLE compression (with associated 
> lower startup/shutdown costs) would perform better in the face of all 
> those small objects.

Mercurial simply stores objects smaller than 44 bytes uncompressed, a
threshold based on benchmarks I did in April 2005. I'd probably raise
that number if I redid my measurements today. There's just not a whole
lot zlib can do at these small sizes. Given that a SHA-1 hash is
already an incompressible 20 bytes, you're well into the domain of
diminishing returns.
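
A minimal sketch of that threshold idea in Python, for illustration
only. The one-byte 'u'/'z' tags and the store()/load() names are
made up for this example and are not Mercurial's actual on-disk
format; the 44 is just the figure quoted above.

```python
import hashlib
import zlib

THRESHOLD = 44  # bytes; the figure quoted above, not a freshly tuned value


def store(data: bytes) -> bytes:
    """Return a tagged blob: b'u' + raw bytes, or b'z' + zlib stream.

    The tags are hypothetical, chosen only for this sketch.
    """
    if len(data) < THRESHOLD:
        return b"u" + data  # too small for zlib to help; skip it entirely
    packed = zlib.compress(data)
    if len(packed) >= len(data):
        return b"u" + data  # compression did not save anything
    return b"z" + packed


def load(blob: bytes) -> bytes:
    """Invert store()."""
    tag, payload = blob[:1], blob[1:]
    return zlib.decompress(payload) if tag == b"z" else payload


# A 20-byte SHA-1 digest: the zlib header and checksum alone make the
# "compressed" form larger than the input.
DIGEST = hashlib.sha1(b"example").digest()
```

Comparing len(zlib.compress(DIGEST)) with len(DIGEST) shows the
fixed per-stream overhead that dominates at these sizes.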

> And another random thought, though it may be useless in this thread:  I 
> bet using a pre-built (compiled into git) static zlib dictionary for git 
> commit and tree objects might improve things a bit.

Ideally, you'd compress all deltas in a chain with the same context.
You've got to decompress the delta base to do the delta
calculation, so this should allow you to recover the context up to
that point. Zlib isn't really set up for this sort of thing though.
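
The closest thing zlib does offer is a preset dictionary, which can
stand in for a shared context if the already-decompressed base is
handed to it. A rough sketch using Python's zlib module (the zdict
parameter is real; the helper names and the sample objects are
invented for this example, and this is an approximation of the idea,
not something git or Mercurial does):

```python
import zlib


def compress_with_context(data: bytes, context: bytes) -> bytes:
    """Deflate `data` with `context` primed as zlib's preset dictionary."""
    c = zlib.compressobj(level=9, zdict=context)
    return c.compress(data) + c.flush()


def decompress_with_context(blob: bytes, context: bytes) -> bytes:
    """Inflate `blob`; the same `context` bytes must be supplied again."""
    d = zlib.decompressobj(zdict=context)
    return d.decompress(blob) + d.flush()


# Made-up sample objects: CHILD differs from BASE in one word, standing in
# for an object whose predecessor has already been decompressed.
BASE = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 1174062939 -0400\n"
    b"committer C O Mitter <committer@example.com> 1174062939 -0400\n"
    b"\n"
    b"Initial commit\n"
)
CHILD = BASE.replace(b"Initial", b"Second")

PLAIN = zlib.compress(CHILD, 9)
PRIMED = compress_with_context(CHILD, BASE)
```

Here PRIMED comes out much smaller than PLAIN because nearly all of
CHILD is a back-reference into BASE. The same trick would cover the
static-dictionary suggestion above: pass a fixed byte string compiled
into the program instead of the base.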

-- 
Mathematics is the supreme nostalgia of our time.