Re: Decompression speed: zip vs lzo

Linus Torvalds wrote:
> 
> On Fri, 11 Jan 2008, Sam Vilain wrote:
>> The difference seems only barely measurable;
> 
> Ok. 
> 
> It may be that it might help other cases, but that seems unlikely.
> 
> The more likely answer is that it's either of:
> 
>  - yes, zlib uncompression is noticeable in profiles, but that the 
>    cold-cache access is simply the bigger problem, and getting rid of zlib 
>    just moves the expense to whatever other thing that needs to access it 
>    (memcpy, xdelta apply, whatever)
> 
> or
> 
>  - I don't know exactly which patch you used (did you just do the 
>    "core.deltacompression=0" thing?), and maybe zlib is fairly expensive 
>    even for just the setup crud, even when it doesn't really need to be.
> 
> but who knows..

Well, my figures agree with Pierre's, I think - a 6-10% time saving for
'git annotate'.

I think Pierre has hit the nail on the head: skipping compression for
small objects is a clear win, and object size is the obvious criterion.
I've knocked it up below as a config option that doesn't change the
default behaviour.
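To try it out, you'd set something like "git config
pack.compressionMinSize 4096" before packing - the 4096 is just an
arbitrary example on my part, not a tuned value.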

I can't help speculating what the general benefit would be of having
one or two of the better-suited compression algorithms available (eg,
lzop for speed, or even lzma for the larger blobs).  eg, if gzip only
offers a substantial benefit over lzop once a stream is longer than
X kb, then lzop the ones shorter than that.
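Just to make that concrete, the selection could be as simple as
something like the sketch below - entirely hypothetical: the
thresholds, the enum and choose_compression() are all invented for
illustration, none of it exists in git:

/*
 * Hypothetical sketch only: pick a compression method for a pack
 * entry based on its uncompressed size.  The cut-off points are
 * made-up examples, not measured values.
 */
enum pack_compression {
	COMPRESS_NONE,	/* tiny objects: not worth compressing at all */
	COMPRESS_LZO,	/* small streams: cheap and fast to inflate */
	COMPRESS_ZLIB,	/* medium streams: today's default */
	COMPRESS_LZMA	/* large blobs: best ratio, more CPU */
};

static enum pack_compression choose_compression(unsigned long size)
{
	if (size < 4096)		/* under a page */
		return COMPRESS_NONE;
	if (size < 64 * 1024)		/* gzip gains little over lzo here */
		return COMPRESS_LZO;
	if (size < 8 * 1024 * 1024)
		return COMPRESS_ZLIB;
	return COMPRESS_LZMA;		/* worth the extra CPU for big blobs */
}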

If the uncompressed objects are clustered in the pack, then they might
stream-compress a lot better should they be transmitted over an http
transport with gzip encoding.  For packs which should be as small as
possible, a format change would let them be distributed as a single
compressed resource.  Ideally the ordering of the objects would be
chosen for optimum compression - which could add a saving akin to
bzip2 vs gzip, at the expense of having to scan the small objects for
mini-deltas and cluster together the objects which share them.

Well, interesting ideas anyway :)
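One detail about the patch below - this is my reading of zlib, not
something the patch itself spells out: deflateInit() with level 0
still produces a valid zlib stream, just made of stored (uncompressed)
blocks, so the existing inflate path on the reading side needs no
change.  A standalone demo of that zlib behaviour:

/* Standalone demo (not part of the patch): a level-0 deflate stream
 * is still valid zlib data made of stored blocks, and plain inflate()
 * reads it back as usual. */
#include <stdio.h>
#include <string.h>
#include <zlib.h>

int main(void)
{
	const char *data = "a small object payload";
	unsigned char out[128], back[128];
	unsigned long packed;
	z_stream s;

	memset(&s, 0, sizeof(s));
	deflateInit(&s, 0);		/* level 0: stored blocks only */
	s.next_in = (unsigned char *)data;
	s.avail_in = (uInt)strlen(data);
	s.next_out = out;
	s.avail_out = sizeof(out);
	deflate(&s, Z_FINISH);
	packed = s.total_out;
	printf("%lu bytes in, %lu bytes out\n",
	       (unsigned long)s.total_in, packed);
	deflateEnd(&s);

	memset(&s, 0, sizeof(s));
	inflateInit(&s);		/* the normal read path */
	s.next_in = out;
	s.avail_in = (uInt)packed;
	s.next_out = back;
	s.avail_out = sizeof(back);
	inflate(&s, Z_FINISH);
	printf("round-trip: %.*s\n", (int)s.total_out, (char *)back);
	inflateEnd(&s);
	return 0;
}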

Subject: [PATCH] pack-objects: add compressionMinSize option

Objects smaller than a page don't save much space when compressed, and
compressing them adds some overhead.  Allow the user to specify a
minimum size below which objects are stored uncompressed.

Credit: Pierre Habouzit <madcoder@xxxxxxxxxx>
Signed-off-by: Sam Vilain <sam.vilain@xxxxxxxxxxxxxxx>
---
 Documentation/config.txt |    5 +++++
 builtin-pack-objects.c   |    7 ++++++-
 2 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1b6d6d6..245121e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -734,6 +734,11 @@ pack.compression::
 	compromise between speed and compression (currently equivalent
 	to level 6)."
 
+pack.compressionMinSize::
+	Objects smaller than this are not compressed.  This can make
+	operations that deal with many small objects (such as log)
+	faster.
+
 pack.deltaCacheSize::
 	The maximum memory in bytes used for caching deltas in
 	linkgit:git-pack-objects[1].
diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index a39cb82..316b809 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -76,6 +76,7 @@ static int num_preferred_base;
 static struct progress *progress_state;
 static int pack_compression_level = Z_DEFAULT_COMPRESSION;
 static int pack_compression_seen;
+static int compression_min_size = 0;
 
 static unsigned long delta_cache_size = 0;
 static unsigned long max_delta_cache_size = 0;
@@ -433,7 +434,7 @@ static unsigned long write_object(struct sha1file *f,
 		}
 		/* compress the data to store and put compressed length in datalen */
 		memset(&stream, 0, sizeof(stream));
-		deflateInit(&stream, pack_compression_level);
+		deflateInit(&stream, size >= compression_min_size ? pack_compression_level : 0);
 		maxsize = deflateBound(&stream, size);
 		out = xmalloc(maxsize);
 		/* Compress it */
@@ -1841,6 +1842,10 @@ static int git_pack_config(const char *k, const char *v)
 		pack_compression_seen = 1;
 		return 0;
 	}
+	if (!strcmp(k, "pack.compressionminsize")) {
+		compression_min_size = git_config_int(k, v);
+		return 0;
+	}
 	if (!strcmp(k, "pack.deltacachesize")) {
 		max_delta_cache_size = git_config_int(k, v);
 		return 0;
-- 
1.5.3.7.2095.gb2448-dirty
