Shawn O. Pearce wrote:
>> On that note, has any thought been given to looking at other compression
>> algorithms? Gzip is a great high-speed compressor, but there are others
>> out there (some a bit slower, some much slower at both compression and
>> decompression) that produce substantially smaller output.
>>
> It's been discussed once before on the list, in very recent history,
> but not by a whole lot. As Junio pointed out, I don't think there
> ever really was any discussion of is gzip the best way to deflate the
> objects. I think gzip was just chosen simply because it was readily
> available in libz, stable, and has a pretty decent speed/size ratio.
>

I think it's the right tool. I just don't see any point in changing to
anything slower for the sake of a 20% space saving. Especially bzip2.

Consider this. Compression works primarily through two things: Huffman
coding and string matching. The larger the window for your string
matching, the slower the compression, and the more memory you end up
thrashing through your CPU cache when decompressing. Now, I'm not an
expert on compression algorithms, but I think a large part of the reason
gzip is blindingly faster than bzip2 is that gzip uses a 32k window
and bzip2 works on blocks of up to 900k. Only now are CPUs getting caches
large enough to deal with that size of buffer; the rest of the time you're
waiting for your RAM. Moore's law was supposed to make bzip2 fast one of
these days, but I'm still waiting.

But with git-repack the window is effectively the size of your
repository. So that blows bzip2 out of the water. Why else can git make
compressed packs smaller than a .bz2 of the raw files?

This is the same observation Shawn makes with the pack-wide dictionary,
but he sounds like he wants to apply it to the Huffman coding stage as
well as the current delta/string matching stage. Now that would be
interesting...

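To make the window argument a bit more concrete, here is a rough Python sketch. It is not anything taken from git; it only uses the standard zlib module, and the helper names `repeated_block` and `deflate_size` are made up for the illustration. It deflates data consisting of one incompressible block repeated several times, once with a 512-byte window (wbits=9) and once with the largest window DEFLATE allows, 32 KB (wbits=15).

```python
import random
import zlib


def repeated_block(block_size: int, copies: int, seed: int = 0) -> bytes:
    """Build test data: one pseudo-random block repeated `copies` times.

    The block itself is incompressible, so any saving has to come from the
    string matcher finding the earlier copy `block_size` bytes back.
    """
    rng = random.Random(seed)
    block = bytes(rng.getrandbits(8) for _ in range(block_size))
    return block * copies


def deflate_size(data: bytes, wbits: int) -> int:
    """Return the zlib-compressed size of `data` with a 2**wbits byte window."""
    comp = zlib.compressobj(level=9, wbits=wbits)
    out = comp.compress(data) + comp.flush()
    assert zlib.decompress(out) == data  # sanity check: the stream round-trips
    return len(out)


if __name__ == "__main__":
    near = repeated_block(4 * 1024, 8)   # copies sit 4 KB apart
    far = repeated_block(64 * 1024, 2)   # copies sit 64 KB apart
    for name, data in (("near", near), ("far", far)):
        for wbits in (9, 15):            # 512-byte and 32 KB windows
            print(f"{name}: {len(data)} bytes in, wbits={wbits}: "
                  f"{deflate_size(data, wbits)} bytes out")
```

The "near" data should shrink a lot only with the large window, since the small one cannot see 4 KB back. The "far" data should not shrink with either setting, because the second copy is further back than any DEFLATE window can reach. That second case is the kind of redundancy a repository-wide delta search can pick up and a per-object deflate cannot.
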
Anyway, it's a free world, so be my guest to implement it. I guess if
this was selectable it would only be a minor annoyance waiting a bit
longer when pulling from some repositories, and it would be interesting
to see if it did make a big difference to pack file sizes.

Sam