Linus Torvalds wrote:
>
> On Fri, 11 Jan 2008, Sam Vilain wrote:
>> The difference seems only barely measurable;
>
> Ok.
>
> It may be that it might help other cases, but that seems unlikely.
>
> The more likely answer is that it's either of:
>
>  - yes, zlib uncompression is noticeable in profiles, but that the
>    cold-cache access is simply the bigger problem, and getting rid of zlib
>    just moves the expense to whatever other thing that needs to access it
>    (memcpy, xdelta apply, whatever)
>
> or
>
>  - I don't know exactly which patch you used (did you just do the
>    "core.deltacompression=0" thing?), and maybe zlib is fairly expensive
>    even for just the setup crud, even when it doesn't really need to be.
>
> but who knows..

Well, my figures agree with Pierre's, I think - 6-10% time savings for
'git annotate'.  I think Pierre has hit the nail on the head - that
skipping compression for small objects is a clear win.  He saw the
obvious criterion, really.  I've knocked it up below as a config option
that doesn't change the default behaviour.

I can't help but speculate about what the benefits would be, in
general, of having one or two of the better compression algorithms (eg,
lzop, or even lzma for the larger blobs) available.  Eg, if gzip needs
a stream longer than X kb before it offers substantial benefits over
lzop, then lzop the ones shorter than that.

If the uncompressed objects are clustered in the pack, then they might
stream compress a lot better, should they be transmitted over an HTTP
transport with gzip encoding.  In packs which should be as small as
possible, with a format change they could be distributed as one
compressed resource.  The ordering of the objects would ideally be
selected such that it results in optimum compression - which could add
a savings akin to bzip2 vs gzip, at the expense of having to scan the
small objects for mini-deltas and arrange them so that objects which
share these mini-deltas are clustered together.

Well, interesting ideas anyway :)

Subject: [PATCH] pack-objects: add compressionMinSize option

Objects smaller than a page don't save much space when compressed, and
cause some overhead.  Allow the user to specify a minimum size for
objects before they are compressed.

Credit: Pierre Habouzit <madcoder@xxxxxxxxxx>
Signed-off-by: Sam Vilain <sam.vilain@xxxxxxxxxxxxxxx>
---
 Documentation/config.txt |    5 +++++
 builtin-pack-objects.c   |    7 ++++++-
 2 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1b6d6d6..245121e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -734,6 +734,11 @@ pack.compression::
 	compromise between speed and compression (currently equivalent
 	to level 6)."
 
+pack.compressionMinSize::
+	Objects smaller than this are not compressed.  This can make
+	operations that deal with many small objects (such as log)
+	faster.
+
 pack.deltaCacheSize::
 	The maximum memory in bytes used for caching deltas in
 	linkgit:git-pack-objects[1].
diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index a39cb82..316b809 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -76,6 +76,7 @@ static int num_preferred_base;
 static struct progress *progress_state;
 static int pack_compression_level = Z_DEFAULT_COMPRESSION;
 static int pack_compression_seen;
+static int compression_min_size = 0;
 
 static unsigned long delta_cache_size = 0;
 static unsigned long max_delta_cache_size = 0;
@@ -433,7 +434,7 @@ static unsigned long write_object(struct sha1file *f,
 		}
 		/* compress the data to store and put compressed length in datalen */
 		memset(&stream, 0, sizeof(stream));
-		deflateInit(&stream, pack_compression_level);
+		deflateInit(&stream, size >= compression_min_size ? pack_compression_level : 0);
 		maxsize = deflateBound(&stream, size);
 		out = xmalloc(maxsize);
 		/* Compress it */
@@ -1841,6 +1842,10 @@ static int git_pack_config(const char *k, const char *v)
 		pack_compression_seen = 1;
 		return 0;
 	}
+	if (!strcmp(k, "pack.compressionminsize")) {
+		compression_min_size = git_config_int(k, v);
+		return 0;
+	}
 	if (!strcmp(k, "pack.deltacachesize")) {
 		max_delta_cache_size = git_config_int(k, v);
 		return 0;
-- 
1.5.3.7.2095.gb2448-dirty
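
A note on the level selection in the write_object() hunk, as a minimal sketch rather than anything from the patch itself. It assumes `size` there is an unsigned long (as the function's return type suggests), in which case comparing it directly against a negative int threshold would convert the int to a very large unsigned value and turn compression off for every object. `effective_level` is a hypothetical helper name used only for illustration; it is not in the patch or in git.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of the patch): choose the zlib level
 * for one object, given the configured minimum size and pack level.
 */
int effective_level(unsigned long size, int min_size, int level)
{
	/* zlib accepts -1 (Z_DEFAULT_COMPRESSION) through 9 */
	assert(level >= -1 && level <= 9);

	/*
	 * Assumption: a zero or negative threshold means "no threshold".
	 * Checking the sign first keeps the int from being converted to
	 * a huge unsigned long in the size comparison.
	 */
	if (min_size > 0 && size < (unsigned long)min_size)
		return 0;	/* Z_NO_COMPRESSION: data is stored, not deflated */

	return level;
}
```

The result would be passed as the second argument to deflateInit(). Level 0 still produces a valid zlib stream made of stored blocks, so the reading side does not need to change.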