Jakub Narebski <jnareb@xxxxxxxxx> wrote:
> Nicolas Pitre <nico@xxxxxxx> writes:
> > On Tue, 12 Aug 2008, Geert Bosch wrote:
> > >
> > > One nice optimization we could do for those pesky binary large objects
> > > (like PDF, JPG and GZIP-ed data), is to detect such files and revert
> > > to compression level 0. This should be especially beneficial
> > > since already compressed data takes most time to compress again.
> >
> > That would be a good thing indeed.
>
> Perhaps take a sample of some given size and calculate entropy in it?
> Or just simply add gitattribute for per file compression ratio...

Estimating the entropy would make it "just magic".  Most of Git is
"just magic", so that's a good direction to take.

I'm not familiar enough with the PDF/JPG/GZIP/ZIP stream formats to
know what the first 4-8k looks like, or whether it would give a good
indication that the data is already compressed.

Though I'd imagine looking at the first 4k should be sufficient for
any compressed file.  Having a header composed of 4k of _text_ before
binary compressed data would be nuts.  Or a git-bundle with a large
refs listing.  ;-)

Using a gitattribute inside of pack-objects is not "simple".  If I
recall correctly, we currently only support reading attributes from
the working directory, and pack-objects may not have a working
directory.  Hence, "just magic" is probably the better route.

-- 
Shawn.
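
For illustration, a rough entropy check over the first 4k could look
something like the untested sketch below.  This is not anything in
pack-objects today; the function names, the 4096-byte sample size and
the 7.9 bits/byte threshold are just placeholders for the idea being
discussed (compile with -lm for log2):

	#include <math.h>
	#include <stddef.h>

	/*
	 * Estimate Shannon entropy (bits per byte) of a sample.
	 * Already-compressed data (gzip, JPEG, most PDF streams) tends
	 * to sit close to 8 bits/byte; text is usually well below that.
	 */
	static double sample_entropy(const unsigned char *buf, size_t len)
	{
		size_t count[256] = { 0 };
		double entropy = 0.0;
		size_t i;

		if (!len)
			return 0.0;

		for (i = 0; i < len; i++)
			count[buf[i]]++;

		for (i = 0; i < 256; i++) {
			double p;
			if (!count[i])
				continue;
			p = (double)count[i] / len;
			entropy -= p * log2(p);
		}
		return entropy;
	}

	/*
	 * Hypothetical caller: sample at most the first 4k of the blob
	 * and fall back to compression level 0 if it already looks
	 * compressed.  The 7.9 threshold is a guess, not a tuned value.
	 */
	static int looks_already_compressed(const unsigned char *buf, size_t size)
	{
		size_t sample = size < 4096 ? size : 4096;
		return sample_entropy(buf, sample) > 7.9;
	}

The appeal of a check like this is that it needs no attributes lookup
at all, so it works even when pack-objects has no working directory.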