On Thu, Jan 10, 2008 at 07:02:39AM +0000, Sam Vilain wrote:
> Sam Vilain wrote:
> > I do really like LZOP as far as compression algorithms go.  It seems a
> > lot faster for not a huge loss in ratio.
>
> Coincidentally, I read this today on an algorithm (LZMA - same as 7zip)
> which is very slow to compress, high ratio but quick decompression:
>
>   http://use.perl.org/~acme/journal/35330
>
> Which sounds excellent for squeezing those "archive packs" into even
> more ridiculously tiny spaces.

Well, lzma is excellent for *big* chunks of data, but not that
impressive for small files:

    $ ll git.c git.c.gz git.c.lzma git.c.lzop
    -rw-r--r-- 1 madcoder madcoder 12915 2008-01-09 13:47 git.c
    -rw-r--r-- 1 madcoder madcoder  4225 2008-01-10 10:00 git.c.gz
    -rw-r--r-- 1 madcoder madcoder  4094 2008-01-10 10:00 git.c.lzma
    -rw-r--r-- 1 madcoder madcoder  5068 2008-01-10 09:59 git.c.lzop

And lzma performs really badly if you have little memory available. The
"big" secret of lzma is that it basically works with a huge window to
check for repetitive data, and even decompression needs a fair amount
of memory, making it a really bad choice for git IMNSHO.

Though I don't agree with you (and some others) that gzip is fast
enough. It's clearly a bottleneck in many log-related commands, where
you would expect them to be IO bound rather than CPU bound. LZO seems
like a fairer choice, especially since most of what it gains comes from
compressing the biggest blobs, aka the delta chain heads. It's really
unclear to me whether we gain anything by compressing the deltas,
trees, and other smallish pieces of information.
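
For anyone who wants to repeat this kind of size and decompression-speed
comparison on their own data, here is a rough Python sketch. It is not
what I ran for the figures in this mail: it assumes the standard-library
gzip and lzma modules, leaves LZO out because Python ships no binding
for it, and the names best_of and compare are made up for the example.

```python
import gzip
import lzma
import time


def best_of(func, arg, runs=10):
    """Return the smallest wall-clock time (seconds) over `runs` calls of func(arg)."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        func(arg)
        best = min(best, time.perf_counter() - start)
    return best


def lzma_alone(data):
    """Compress with the legacy .lzma container (what the `lzma` tool writes)."""
    return lzma.compress(data, format=lzma.FORMAT_ALONE)


def compare(data, runs=10):
    """Map codec name -> (packed size in bytes, best decompression time)."""
    codecs = {
        "gzip": (gzip.compress, gzip.decompress),
        "lzma": (lzma_alone, lzma.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        packed = compress(data)
        # Sanity check: the round trip must give back the original bytes.
        assert decompress(packed) == data
        results[name] = (len(packed), best_of(decompress, packed, runs))
    return results


if __name__ == "__main__":
    # Synthetic, highly repetitive input; replace with real file contents.
    sample = b"static int handle_alias(int *argcp, const char ***argv)\n" * 2000
    print("original: %d bytes" % len(sample))
    for name, (size, seconds) in compare(sample).items():
        print("%-5s %8d bytes  %.6fs" % (name, size, seconds))
```

Timings from a script like this include Python call overhead, so only
the relative order between codecs is meaningful, and synthetic input
compresses far better than real source files do.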

And when it comes to times, for a file big enough to give meaningful
numbers, here are the decompression times (best of 10 runs, smaller is
better; the second number is the size of the packed data; the original
data was 7.8 MB):

  * lzma: 0.374s (2.2 MB)
  * gzip: 0.127s (2.9 MB)
  * lzop: 0.053s (3.2 MB)

For a 300k original file:

  * lzma: 0.022s (124 KB)
  * gzip: 0.008s (144 KB)
  * lzop: 0.004s (156 KB) /* most of the samples were actually 0.005 */

What is obvious to me is that lzop seems to take 10% more space than
gzip, while being around 1.5 to 2 times faster. Of course this is very
sketchy, and a real test with git would be better.

-- 
·O·  Pierre Habouzit
··O  madcoder@xxxxxxxxxx
OOO  http://www.madism.org