Steven Grimm <koreth@xxxxxxxxxxxxx> wrote:
> On that note, has any thought been given to looking at other compression
> algorithms? Gzip is a great high-speed compressor, but there are others
> out there (some a bit slower, some much slower at both compression and
> decompression) that produce substantially smaller output.

It's been discussed once before on the list, quite recently, but not
in much depth. As Junio pointed out, I don't think there ever really
was any discussion of whether gzip is the best way to deflate the
objects. I think gzip was chosen simply because it was readily
available in libz, is stable, and has a pretty decent speed/size ratio.

> I think it'd be kind of neat to have my .git directory shrink by another
> 20+%. That's conservative; on maximumcompression.com's test of a mix of
> different file types including images, gzip compresses 64% and the
> best-scoring one does 80%. On English text gzip does 71% and the top
> scorer does 89%. Most of the top-tier compressors are proprietary, but

Yes. But in many cases we might actually be able to do even better
by going with a pack-wide dictionary. Why? Think about source code
structure. E.g.

  $ git grep --cached 'struct object'| cut -d: -f1|wc -l
  402

So 402 files in git.git use the term 'struct object', and that's just
the current revision I had in my index. With our current packfile
organization we are likely to store this string at least 402 times.
We'll store it once in each file's delta chain, assuming each file's
blobs largely fall into a single delta chain for that file (a
reasonable assumption, but certainly not always true).

That's just one string, and it appears fairly frequently in any file
it's used in. Now try 'unsigned char' (it's in 944 files, with an even
higher frequency per file).

So anyway, for the past year I've been thinking about trying to
implement a blob-level dictionary prototype to see if it helps on a
project like linux-2.6.git, but I haven't gotten to it.

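To make the shared-dictionary idea a bit more concrete, here is a
minimal sketch using zlib's preset-dictionary support from Python. It
is not git's packfile code and not the prototype mentioned above; the
sample blobs, the contents of shared_dict, and the helper names
deflate/inflate are all made up for illustration. It only shows the
principle: strings that recur across many objects are stored once, in
a dictionary, and each object's deflate stream can refer back to them.

```python
import zlib

# Hypothetical stand-ins for blobs in a pack; real blobs would be
# whole file contents, not single lines.
blobs = [
    b"struct object *parse_object(const unsigned char *sha1);\n",
    b"struct object *lookup_object(const unsigned char *sha1);\n",
    b"void free_object(struct object *obj, unsigned char *buf);\n",
]

# Assumed pack-wide dictionary: strings expected to recur in many blobs.
# How to choose these well is the hard part and is not addressed here.
shared_dict = b"unsigned char *sha1struct object *"


def deflate(data, zdict=None):
    """Deflate data, optionally priming zlib with a preset dictionary."""
    if zdict is None:
        comp = zlib.compressobj(level=9)
    else:
        comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(data) + comp.flush()


def inflate(data, zdict=None):
    """Inflate data; the same dictionary must be supplied if one was used."""
    if zdict is None:
        decomp = zlib.decompressobj()
    else:
        decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(data) + decomp.flush()


# Total deflated size with and without the shared dictionary.
plain_total = sum(len(deflate(b)) for b in blobs)
dict_total = sum(len(deflate(b, shared_dict)) for b in blobs)

print("without dictionary:", plain_total, "bytes")
print("with dictionary:   ", dict_total, "bytes (plus",
      len(shared_dict), "bytes for the dictionary, stored once)")
```

The dictionary has to be available again at inflate time, so in a real
pack it would need to be stored (once) alongside the objects, and its
size counts against whatever is saved per object. Whether that is a net
win on something like linux-2.6.git is exactly what a prototype would
have to measure.
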
The pack v4 work was about applying that basic dictionary principle
to trees and commits, and I think it pays off nicely there. It just
needs to be cleaned up, rebased onto current master, and submitted to
the list for wider testing. ;-)

-- 
Shawn.