revisiting zstd timings

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



You may recall some "what if" timings I did recently, where I replaced
zlib with zstd:

  http://public-inbox.org/git/20160914235843.nacr54ekvl6rjipk@xxxxxxxxxxxxxxxxxxxxx/

The aim was that it would give us about the same compression level, but
much faster inflating and deflating. My numbers there showed that yes,
inflate is faster (which speeds up normal operations like traversals and
diffs), but that compression seemed to be much slower.

Since then, zstd has released a new version of their zlib wrapper which
performs much better for our case. Basically, the prior version's
transparent zstd setup was not optimized well for our case of doing tons
of small deflates. I'm cc-ing Przemyslaw Skibinski, who wrote the
wrapper code, and who pointed me in the right direction (thanks!).

I'll include the new patch at the end, but it's basically the same as
the old with one minor symbol change:

	diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
	index 7da2fd44b0..07f45aa171 100644
	--- a/builtin/pack-objects.c
	+++ b/builtin/pack-objects.c
	@@ -2287,7 +2287,7 @@ static int git_pack_config(const char *k, const char *v, void *cb)
	 			return config_error_nonbool(k);
	 #ifdef USE_ZSTD
	 		if (skip_prefix(v, "zstd", &v)) {
	-			useZSTD(1);
	+			ZWRAP_useZSTDcompression(1);
	 			/* XXX doesn't seem to be a constant? */
	 			pack_compression_max = 22;
	 		}

Here are the numbers I get with the newer version (see the prior email
for descriptions of each test, but I did run them all from scratch
again).

    zlib
     - pack
     1127899484
     - repack
     real    4m55.099s
     user    17m46.796s
     sys     0m6.164s
     - rev-list
     real    0m27.402s
     user    0m27.080s
     sys     0m0.320s
     - log -Sfoo
     real    5m16.326s
     user    5m13.860s
     sys     0m2.448s

That gives us a baseline for the time of each operation, and the size of
the compressed pack. The rest of the values will be expressed as
percentages from this baseline.

    zstd=disabled
     - pack
     1127884778
     (+0%)
     - repack
     real    4m49.443s
     user    17m39.468s
     sys     0m5.996s
     (-1.9%)
     - rev-list
     real    0m27.527s
     user    0m27.316s
     sys     0m0.208s
     (+0.4%)
     - log -Sfoo
     real    5m21.029s
     user    5m17.704s
     sys     0m3.308s
     (+1.4%)

Timing the wrapper making use of zlib, it produces the same output and
takes a slightly larger amount of time (the repack is quicker, but
there's quite a bit of run-to-run noise in that measurement). Nothing
surprising.

    zstd=1
     - pack
     1191357029
     (+5.6%)
     - repack
     real    4m8.323s
     user    16m24.664s
     sys     0m6.816s
     (-15.8%)
     - rev-list
     real    0m25.780s
     user    0m25.488s
     sys     0m0.288s
     (-5.9%)
     - log -Sfoo
     real    4m19.301s
     user    4m16.488s
     sys     0m2.796s
     (-18%)

Here's where we get interesting; zstd turned down to its lowest setting.
As before, the inflate step shows a nice speedup for some common
read-only operations. The repack is much faster, and the resulting pack
is slightly larger. We'd probably want to tune the cpu/size tradeoff for
deflate a bit higher.

    zstd=3
     - pack
     1163632679
     (+3.1%)
     - repack
     real    4m11.800s
     user    16m34.260s
     sys     0m6.788s
     (-14.6%)
     - rev-list
     real    0m25.678s
     user    0m25.376s
     sys     0m0.296s
     (-6.2%)
     - log -Sfoo
     real    4m19.902s
     user    4m16.604s
     sys     0m3.280s
     (-17.8%)

And here that is. Without spending much more CPU on deflate, we shaved
off a few percent. The reading side remains fast (I wondered if it might
get faster as we shrink the pack just because there are fewer bytes to
deal with on the input side, but it doesn't seem to really matter. And
anyway, we're only talking a few percent of the input size).

    zstd=5
     - pack
     1137697830
     (+0.8%)
     - repack
     real    4m20.930s
     user    16m45.844s
     sys     0m7.100s
     (-11.5%)

Turning it up higher, we're starting to reach parity with zlib for the
pack size. And the deflate process is still faster. I stopped measuring
the read side, since it seemed to be remaining constant.

    zstd=7
     - pack
     1130545314
     (+0.2%)
     - repack
     real    4m28.960s
     user    16m50.028s
     sys     0m6.884s
     (-8.8%)

    zstd=9
     - pack
     1128847500
     (+0%)
     - repack
     real    4m52.234s
     user    17m22.644s
     sys     0m7.080s
     (-0.9%)

    zstd=11
     - pack
     1128415787
     (+0%)
     - repack
     real    5m33.187s
     user    18m12.792s
     sys     0m7.436s
     (+12.9%)

And these numbers show we can pretty much dial in the cpu/size tradeoff
wherever we want.  Going past 9 doesn't seem to really help. I'd
probably leave it at 5, as 0.8% is close enough.

Note that this repack scenario _isn't_ especially realistic. It's
finding new deltas for the whole pack from scratch, which is something
that isn't generally done. Instead, we'd generally deflate a small
number of new objects during a repack, or perhaps when packing objects
to serve a fetch or push. And in that case, compression would be a much
smaller part of the whole task (because we'd spend a lot of our time
writing out the already-compressed objects verbatim).

So saving 10% here really _isn't_ that interesting. I mostly wanted to
confirm that we could use zstd without increasing the CPU time used for
deflating, so that we could reap the benefits on the inflate side. Which
is definitely the case. With these numbers, there's basically no
downside at all to using zstd. It's just faster to read the objects
later.

If we were designing git today, it seems like a no-brainer to use zstd
over zlib. But given backwards-compatibility issues, I'm not sure.
10-20% speedup on reading is awfully nice, but I don't think there's a
good way to gracefully transition, because zlib is part of the
on-the-wire format for serving objects. We could re-compress on the fly,
but that gets expensive (in existing cases, we can quite often serve the
zlib content straight from disk, but this would require an extra
inflate/deflate. At least we wouldn't have to reconstitute objects from
deltas, though).

A transition would probably look something like:

  0. The patch below, or something like it, to teach git to read both
     zlib and zstd, and optionally write zstd. We'd probably want to
     make this an unconditional requirement like zlib, because the point
     is for it to be available everywhere (I assume the zstd code is
     pretty portable, but I haven't put it to the test).

  1. Another patch to add a "zstd" capability to the protocol. This
     would require teaching pack-objects an option to convert zstd back
     to zlib on the fly.

     Servers which handle a limited number of updated clients can switch
     to zstd immediately to get the benefit, and their clients can
     handle it directly. Likewise, clients of older servers may wish to
     repack locally using zstd to get the benefit. They'll have to
     recompress on the fly during push, but pushes are rare than other
     operations (and often limited by bandwidth anyway).

  2. After a while, eventually flip the default to zstd=5.

  3. If "a while" is long enough, perhaps add a patch to let servers
     tell clients "go upgrade" rather than recompressing on the fly.

I don't have immediate plans for any of that, but maybe something to
think about.

Here's the patch which produce the numbers above. Build it with:

  make ZSTD_PATH=/path/to/zstd

where /path/to/zstd is a built clone of
https://github.com/facebook/zstd.

---
diff --git a/Makefile b/Makefile
index 1aad150b34..7e7c8c73e1 100644
--- a/Makefile
+++ b/Makefile
@@ -1140,6 +1140,15 @@ else
 endif
 IMAP_SEND_LDFLAGS += $(OPENSSL_LINK) $(OPENSSL_LIBSSL) $(LIB_4_CRYPTO)
 
+# for linking straight out of zstd build directory
+ifdef ZSTD_PATH
+	LIB_OBJS += $(ZSTD_PATH)/zlibWrapper/zstd_zlibwrapper.o
+	BASIC_CFLAGS += -I$(ZSTD_PATH)/zlibWrapper
+	BASIC_CFLAGS += -I$(ZSTD_PATH)/lib -I$(ZSTD_PATH)/lib/common
+	BASIC_CFLAGS += -DUSE_ZSTD
+	EXTLIBS += $(ZSTD_PATH)/lib/libzstd.a
+endif
+
 ifdef ZLIB_PATH
 	BASIC_CFLAGS += -I$(ZLIB_PATH)/include
 	EXTLIBS += -L$(ZLIB_PATH)/$(lib) $(CC_LD_DYNPATH)$(ZLIB_PATH)/$(lib)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 166e52c700..07f45aa171 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -62,6 +62,7 @@ static int num_preferred_base;
 static struct progress *progress_state;
 static int pack_compression_level = Z_DEFAULT_COMPRESSION;
 static int pack_compression_seen;
+static int pack_compression_max = Z_BEST_COMPRESSION;
 
 static struct packed_git *reuse_packfile;
 static uint32_t reuse_packfile_objects;
@@ -2281,11 +2282,17 @@ static int git_pack_config(const char *k, const char *v, void *cb)
 		return 0;
 	}
 	if (!strcmp(k, "pack.compression")) {
-		int level = git_config_int(k, v);
-		if (level == -1)
-			level = Z_DEFAULT_COMPRESSION;
-		else if (level < 0 || level > Z_BEST_COMPRESSION)
-			die("bad pack compression level %d", level);
+		int level;
+		if (!v)
+			return config_error_nonbool(k);
+#ifdef USE_ZSTD
+		if (skip_prefix(v, "zstd", &v)) {
+			ZWRAP_useZSTDcompression(1);
+			/* XXX doesn't seem to be a constant? */
+			pack_compression_max = 22;
+		}
+#endif
+		level = git_config_int(k, v);
 		pack_compression_level = level;
 		pack_compression_seen = 1;
 		return 0;
@@ -2826,7 +2833,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 		reuse_delta = 0;
 	if (pack_compression_level == -1)
 		pack_compression_level = Z_DEFAULT_COMPRESSION;
-	else if (pack_compression_level < 0 || pack_compression_level > Z_BEST_COMPRESSION)
+	else if (pack_compression_level < 0 || pack_compression_level > pack_compression_max)
 		die("bad pack compression level %d", pack_compression_level);
 
 	if (!delta_search_threads)	/* --threads=0 means autodetect */
diff --git a/cache.h b/cache.h
index ed3d5dfce1..acc05720a0 100644
--- a/cache.h
+++ b/cache.h
@@ -37,7 +37,11 @@
 #define git_SHA1_Update		git_SHA1_Update_Chunked
 #endif
 
+#ifdef USE_ZSTD
+#include "zstd_zlibwrapper.h"
+#else
 #include <zlib.h>
+#endif
 typedef struct git_zstream {
 	z_stream z;
 	unsigned long avail_in;



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]