On Fri, 16 Mar 2007, Jeff Garzik wrote:
>
> Although it sounds like zlib could indeed be optimized to reduce its startup
> and shutdown overhead, I wonder if switching compression algorithms to a pure
> Huffman or even RLE compression (with associated lower startup/shutdown costs)
> would perform better in the face of all those small objects.

Well, the thing is, I personally much prefer to have just a single
compression algorithm and object layout.

Most of the performance-critical objects from a decompression standpoint
during commit traversal are small (especially if you do pathname limiting),
but when you do something like a "git add .", most objects are actually
random blob objects, and you need a compression algorithm that works in the
general case too.

Of course, pack-v4 may (and likely will) end up using different strategies
for different objects (deltas in particular), but the "one single object
compression type" was a big deal for the initial implementation. It may not
be fundamental to git operation (so we can fairly easily change it and make
it more complex without any higher-level code even noticing), but it was
definitely fundamental to getting something stable and working up and
running quickly.

> And another random thought, though it may be useless in this thread: I bet
> using a pre-built (compiled into git) static zlib dictionary for git commit
> and tree objects might improve things a bit.

That's kind of pack-v4 area. It will happen, but I'd actually like to see
if we can just avoid the stupid performance problems with zlib,
independently of trying to make more tuned formats.

		Linus
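[Editor's note: the static-dictionary idea above is easy to prototype with zlib's preset-dictionary support. A minimal sketch in Python's stdlib `zlib` follows; the dictionary contents and the sample commit object are made up for illustration, and git itself would do the equivalent in C via zlib's deflateSetDictionary()/inflateSetDictionary().]

```python
import zlib

# Hypothetical preset dictionary holding boilerplate common to commit
# objects; a real one would be tuned against a corpus of commits/trees.
ZDICT = (b"tree \nparent \n"
         b"author A U Thor <author@example.com> 1173000000 +0000\n"
         b"committer A U Thor <author@example.com> 1173000000 +0000\n\n")

# A made-up commit object body -- the kind of small object discussed above.
COMMIT = (b"tree 9daeafb9864cf43055ae93beb0afd6c7d144bfa4\n"
          b"parent 3b18e512dba79e4c8300dd08aeb37f8e728b8dad\n"
          b"author A U Thor <author@example.com> 1173000000 +0000\n"
          b"committer A U Thor <author@example.com> 1173000000 +0000\n"
          b"\n"
          b"fix zlib overhead\n")

def deflate(data: bytes, zdict: bytes = b"") -> bytes:
    """Deflate one object, optionally priming zlib with a preset dictionary."""
    c = zlib.compressobj(level=9, zdict=zdict) if zdict else zlib.compressobj(level=9)
    return c.compress(data) + c.flush()

def inflate(data: bytes, zdict: bytes = b"") -> bytes:
    """Inflate; the same dictionary must be supplied on decompression."""
    d = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return d.decompress(data) + d.flush()

plain = deflate(COMMIT)           # ordinary per-object deflate
primed = deflate(COMMIT, ZDICT)   # deflate with the preset dictionary
```

The dictionary-primed stream should come out smaller because the long author/committer boilerplate becomes a back-reference into the dictionary instead of literal bytes, which matters most for exactly the small commit and tree objects under discussion.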