On Tue, 12 Aug 2008, Geert Bosch wrote:

> I've always felt that keeping largish objects (say anything >1MB)
> loose makes perfect sense. These objects are accessed infrequently,
> often binary or otherwise poor candidates for the delta algorithm.

Or, as I suggested in the past, they can be grouped into a separate
pack, or even occupy a pack of their own (see the first sketch below).

As soon as you have more than one revision of such largish objects,
you lose again by keeping them loose.

> Many repositories are mostly well-behaved, with a large number of text
> files that aren't overly large and compress/diff well. However, a few
> huge files often creep in. These might be 30 MB Word or PDF documents
> (with lots of images, of course), a bunch of artwork, or some random
> .tgz files with required tools, or the like.
>
> Regardless of their origin, the presence of such files in real-world
> SCMs is a given, and they can ruin performance even if they're hardly
> ever accessed or updated. If we were to leave such oddball objects
> loose, the pack would be much smaller, easier to generate, and faster
> to use, and there should be no memory usage issues.

You'll have memory usage issues whenever such objects are accessed,
loose or not.

However, once those big objects have been packed, they can be repacked
(or streamed over the net) without really "accessing" them. Packed
object data is simply copied into the new pack in that case, which puts
far less pressure on memory, irrespective of the original pack size
(see the second sketch below).

Nicolas
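
For concreteness, here is a rough sketch of how such big objects could
be segregated into a pack of their own using stock plumbing. This is
only an illustration, not something measured in this thread: the 1 MB
cutoff and the /tmp/big-objects path are arbitrary, and it assumes
git rev-list, git cat-file --batch-check and git pack-objects behave
as their documentation describes.

    # List every reachable object, look up its type and size, and keep
    # only the blobs larger than 1 MB.
    git rev-list --objects --all | cut -d' ' -f1 |
        git cat-file --batch-check |
        awk '$2 == "blob" && $3 > 1048576 { print $1 }' >/tmp/big-objects

    # Write those objects into a pack of their own; pack-objects prints
    # the new pack's SHA-1 name on stdout.
    name=$(git pack-objects --non-empty pack </tmp/big-objects)
    mv "pack-$name.pack" "pack-$name.idx" .git/objects/pack/

    # A .keep file tells later repacks (and git gc) to leave this pack
    # alone; prune-packed then drops any now-redundant loose copies.
    touch ".git/objects/pack/pack-$name.keep"
    git prune-packed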
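
As for the "copy instead of re-deflate" point, that is what pack-objects
does by default when an object is already packed. A crude way to see the
difference (again just a sketch; the actual cost depends entirely on the
repository) is to compare a normal full repack with one that forbids
delta reuse:

    # Default: data for already-packed objects is copied straight out of
    # the existing packs, deltas included, without inflating them.
    git repack -a -d

    # -f passes --no-reuse-delta to pack-objects, so deltas are searched
    # and computed from scratch, which means inflating the objects again.
    git repack -a -d -f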