Jakub Narebski <jnareb@xxxxxxxxx> wrote:
> Nicolas Pitre <nico@xxxxxxx> writes:
> > On Tue, 12 Aug 2008, Geert Bosch wrote:
> > >
> > > One nice optimization we could do for those pesky binary large objects
> > > (like PDF, JPG and GZIP-ed data), is to detect such files and revert
> > > to compression level 0. This should be especially beneficial
> > > since already compressed data takes most time to compress again.
> >
> > That would be a good thing indeed.
>
> Perhaps take a sample of some given size and calculate entropy in it?
> Or just simply add gitattribute for per file compression ratio...

Estimating the entropy would make it "just magic".  Most of Git is
"just magic", so that's a good direction to take.

I'm not familiar enough with the PDF/JPG/GZIP/ZIP stream formats to
know what the first 4-8k looks like, or whether it would give a good
indication that the data is already compressed.

Though I'd imagine looking at the first 4k should be sufficient for
any compressed file.  Having a header composed of 4k of _text_ before
binary compressed data would be nuts.  Or a git-bundle with a large
refs listing.  ;-)

Using a gitattribute inside of pack-objects is not "simple".  If I
recall correctly, we currently only support reading attributes from
the working directory, and pack-objects may not have a working
directory.  Hence, "just magic" is probably the better route.

-- 
Shawn.
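
For illustration, a rough entropy check over the first 4k could look
something like the untested sketch below.  This is not anything in
pack-objects today; the function names, the 4096-byte sample size and
the 7.9 bits/byte threshold are just placeholders for the idea being
discussed (compile with -lm for log2):

	#include <math.h>
	#include <stddef.h>

	/*
	 * Estimate Shannon entropy (bits per byte) of a sample.
	 * Already-compressed data (gzip, JPEG, most PDF streams) tends
	 * to sit close to 8 bits/byte; text is usually well below that.
	 */
	static double sample_entropy(const unsigned char *buf, size_t len)
	{
		size_t count[256] = { 0 };
		double entropy = 0.0;
		size_t i;

		if (!len)
			return 0.0;

		for (i = 0; i < len; i++)
			count[buf[i]]++;

		for (i = 0; i < 256; i++) {
			double p;
			if (!count[i])
				continue;
			p = (double)count[i] / len;
			entropy -= p * log2(p);
		}
		return entropy;
	}

	/*
	 * Hypothetical caller: sample at most the first 4k of the blob
	 * and fall back to compression level 0 if it already looks
	 * compressed.  The 7.9 threshold is a guess, not a tuned value.
	 */
	static int looks_already_compressed(const unsigned char *buf, size_t size)
	{
		size_t sample = size < 4096 ? size : 4096;
		return sample_entropy(buf, sample) > 7.9;
	}

The appeal of a check like this is that it needs no attributes lookup
at all, so it works even when pack-objects has no working directory.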