Re: [PATCH] Add --no-reuse-delta option to git-gc

Shawn O. Pearce wrote:
>> On that note, has any thought been given to looking at other compression 
>> algorithms? Gzip is a great high-speed compressor, but there are others 
>> out there (some a bit slower, some much slower at both compression and 
>> decompression) that produce substantially smaller output.
>>     
> It's been discussed once before on the list, in very recent history,
> but not by a whole lot.  As Junio pointed out, I don't think there
> ever really was any discussion of whether gzip is the best way to deflate the
> objects.  I think gzip was just chosen simply because it was readily
> available in libz, stable, and has a pretty decent speed/size ratio.
>   

I think it's the right tool. I just don't see any point in changing to
anything slower for the sake of 20% space saving. Especially bzip2.

Consider this.

Compression works primarily through two things: Huffman coding and
string matching. The larger the window for your string matching, the
slower the compression, and the more memory you touch when
decompressing, which thrashes your CPU cache.
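To make the window point concrete, here is a minimal sketch using Python's zlib module. The input is made up for illustration: a block of random bytes that repeats 8 KiB later. The helper name deflate_size is hypothetical. With a 1 KiB window the matcher cannot reach back to the first copy; with zlib's maximum 32 KiB window it can.

```python
import os
import zlib

def deflate_size(data: bytes, wbits: int) -> int:
    """Return the raw-deflate size of data using a window of 2**wbits bytes."""
    # A negative wbits value asks zlib for a raw deflate stream (no header).
    comp = zlib.compressobj(level=9, method=zlib.DEFLATED, wbits=-wbits)
    return len(comp.compress(data) + comp.flush())

# Made-up input: an incompressible 4 KiB block that repeats 8 KiB later.
block = os.urandom(4096)
filler = os.urandom(4096)
data = block + filler + block

small = deflate_size(data, 10)   # 1 KiB window: the repeat is out of reach
large = deflate_size(data, 15)   # 32 KiB window: the repeat is in range
print("1 KiB window: ", small, "bytes")
print("32 KiB window:", large, "bytes")
```

The larger window should save roughly the size of the repeated block, since the second copy turns into a handful of back-references.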

Now I'm not an expert on compression algorithms, but I think a large part
of the reason gzip is blindingly faster than bzip2 is that gzip uses
a 32k window and bzip2 works on blocks of up to 900k. Only now are CPUs
getting caches large enough to deal with that size of buffer; the rest
of the time you're waiting for your RAM. Moore's law was supposed to
make bzip2 fast one of these days, but I'm still waiting.
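Anyone who wants numbers for their own machine could try something like the rough sketch below, which times one round trip through Python's zlib and bz2 modules. The sample text and the helper name measure are made up for illustration, and the timings will vary with hardware and input, so treat the output as indicative only.

```python
import bz2
import time
import zlib

def measure(name, compress, decompress, data):
    """Time one compress/decompress round trip.

    Returns (name, compressed size, compress seconds, decompress seconds).
    """
    t0 = time.perf_counter()
    packed = compress(data)
    t1 = time.perf_counter()
    unpacked = decompress(packed)
    t2 = time.perf_counter()
    if unpacked != data:
        raise ValueError(name + ": round trip did not reproduce the input")
    return name, len(packed), t1 - t0, t2 - t1

# Made-up sample input, about 1 MB of repetitive text.
sample = b"".join(
    b"line %d: the quick brown fox jumps over the lazy dog\n" % i
    for i in range(20000)
)

results = [
    measure("zlib -9", lambda d: zlib.compress(d, 9), zlib.decompress, sample),
    measure("bzip2 -9", lambda d: bz2.compress(d, 9), bz2.decompress, sample),
]
for name, size, c_secs, d_secs in results:
    print(f"{name:9s} {size:8d} bytes  "
          f"compress {c_secs:.3f}s  decompress {d_secs:.3f}s")
```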

But with git-repack the window is effectively the size of your
repository. So that blows bzip2 out of the water. Why else can git make
compressed packs smaller than a .bz2 of the raw files? This is the same
observation Shawn makes with the pack-wide dictionary, but he sounds
like he wants to apply it to the Huffman coding stage as well as the
current delta/string matching stage. Now that would be interesting...
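As a loose illustration of why matching across objects helps so much, the sketch below compresses a new version of a file using the old version as a zlib preset dictionary. This is not how git-repack computes deltas and not Shawn's proposal; it only shows the effect of letting the matcher see earlier content. The file contents and the helper names deflate and inflate are made up.

```python
import hashlib
import zlib

def deflate(data: bytes, zdict: bytes = b"") -> bytes:
    """Compress data, optionally priming the matcher with a preset dictionary."""
    if zdict:
        comp = zlib.compressobj(level=9, zdict=zdict)
    else:
        comp = zlib.compressobj(level=9)
    return comp.compress(data) + comp.flush()

def inflate(packed: bytes, zdict: bytes = b"") -> bytes:
    """Decompress data; the same dictionary must be supplied if one was used."""
    decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return decomp.decompress(packed) + decomp.flush()

# Made-up "file": 200 lines of hex text, then a copy with one line edited.
old_lines = [hashlib.sha1(str(i).encode()).hexdigest() for i in range(200)]
new_lines = list(old_lines)
new_lines[100] = "this line was edited"
old_version = "\n".join(old_lines).encode()
new_version = "\n".join(new_lines).encode()

alone = deflate(new_version)                   # sees only the new version
with_dict = deflate(new_version, old_version)  # can also match the old version

print("new version alone:         ", len(alone), "bytes")
print("with old version as zdict: ", len(with_dict), "bytes")
```

The dictionary only helps while it fits in the 32 KiB deflate window, which is exactly the limit that delta compression across a whole pack sidesteps.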

Anyway, it's a free world, so be my guest to implement it. I guess if
this were selectable it would only be a minor annoyance to wait a bit
longer when pulling from some repositories, and it would be interesting
to see whether it made a big difference to pack file sizes.

Sam