Re: [PATCH] hash: reduce size of algo member of object ID

On Sun, Oct 03, 2021 at 07:51:40AM +0200, René Scharfe wrote:

> cf0983213c (hash: add an algo member to struct object_id, 2021-04-26)
> introduced the algo member as an int.  This increased the size of struct
> object_id by 4 bytes (12.5%) and imposed a 4-byte alignment.  Currently
> we only need to store the values 0, 1 and 2 in it.  Let's use an
> unsigned char instead to reduce the size overhead and lift the alignment
> requirement.
> 
> Signed-off-by: René Scharfe <l.s.r@xxxxxx>
> ---
> Not sure how to measure the performance impact of this change.  The perf
> tests mentioned by cf0983213c don't show much of a difference with
> GIT_PERF_REPEAT_COUNT=10 for me:

Not surprising. If going from nothing to an int didn't have an impact on
those tests, then going from an int to char isn't likely to, either.

> 0001.1: rev-list --all                           0.11(0.08+0.02)     0.11(0.08+0.02) +0.0%
> 0001.2: rev-list --all --objects                 3.04(2.98+0.05)     3.04(2.98+0.05) +0.0%
> 0001.3: rev-list --parents                       0.05(0.04+0.01)     0.05(0.03+0.01) +0.0%
> 0001.5: rev-list -- dummy                        0.21(0.20+0.01)     0.21(0.19+0.01) +0.0%
> 0001.6: rev-list --parents -- dummy              0.22(0.20+0.01)     0.22(0.20+0.01) +0.0%
> 0001.8: rev-list $commit --not --all             0.06(0.05+0.00)     0.06(0.05+0.00) +0.0%
> 0001.9: rev-list --objects $commit --not --all   0.06(0.05+0.00)     0.06(0.05+0.00) +0.0%

I do think these probably aren't very good tests. They're going to
mostly be dealing with actual object structs. Your patch changes the
size of "struct object_id", but the object structs themselves still need
4-byte alignment. And they have a bunch of padding in the first place.
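
For reference, the change boils down to this (paraphrasing hash.h, where
GIT_MAX_RAWSZ is 32):

  struct object_id {
  	unsigned char hash[GIT_MAX_RAWSZ];
  -	int algo;		/* 36 bytes total, 4-byte alignment */
  +	unsigned char algo;	/* 33 bytes total, 1-byte alignment */
  };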

If we instrument "git version --build-options" like this:

diff --git a/help.c b/help.c
index 3c3bdec213..718e32cadf 100644
--- a/help.c
+++ b/help.c
@@ -11,6 +11,8 @@
 #include "version.h"
 #include "refs.h"
 #include "parse-options.h"
+#include "blob.h"
+#include "tag.h"
 
 struct category_description {
 	uint32_t category;
@@ -663,6 +665,11 @@ void get_version_info(struct strbuf *buf, int show_build_options)
 		strbuf_addf(buf, "sizeof-long: %d\n", (int)sizeof(long));
 		strbuf_addf(buf, "sizeof-size_t: %d\n", (int)sizeof(size_t));
 		strbuf_addf(buf, "shell-path: %s\n", SHELL_PATH);
+		strbuf_addf(buf, "sizeof-object_id: %d\n", (int)sizeof(struct object_id));
+		strbuf_addf(buf, "sizeof-commit: %d\n", (int)sizeof(struct commit));
+		strbuf_addf(buf, "sizeof-blob: %d\n", (int)sizeof(struct blob));
+		strbuf_addf(buf, "sizeof-tree: %d\n", (int)sizeof(struct tree));
+		strbuf_addf(buf, "sizeof-tag: %d\n", (int)sizeof(struct tag));
 		/* NEEDSWORK: also save and output GIT-BUILD_OPTIONS? */
 	}
 }

then we can see that your patch doesn't change the size of any of the
structs at all! Even the original cf0983213c, going from nothing to an
int, changed only "struct blob" (it went from 32 to 36 bytes).
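
As a stand-alone illustration of why the containing structs don't
shrink, here's a toy you can compile anywhere (these are simplified
stand-ins, not git's real definitions; sizes assume a typical 64-bit
ABI):

  #include <stdio.h>

  /* simplified stand-ins for struct object_id */
  struct oid_int  { unsigned char hash[32]; int algo; };
  struct oid_char { unsigned char hash[32]; unsigned char algo; };

  /* something object-struct-like: an oid next to a pointer-aligned member */
  struct obj_int  { struct oid_int oid;  void *util; };
  struct obj_char { struct oid_char oid; void *util; };

  int main(void)
  {
  	/* 36 vs 33: dropping to a char saves 3 bytes and the 4-byte alignment */
  	printf("oid: %zu vs %zu\n",
  	       sizeof(struct oid_int), sizeof(struct oid_char));
  	/* 48 vs 48: padding before the pointer swallows the savings */
  	printf("obj: %zu vs %zu\n",
  	       sizeof(struct obj_int), sizeof(struct obj_char));
  	return 0;
  }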

A more interesting test is something that stores a bunch of oids. A good
candidate is "cat-file --batch-all-objects". In the default mode, it
uses an oid_array to sort the set of objects. In unordered mode, it uses
an oidset to mark seen ones.
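
Roughly, those two structures are used like this (a sketch against git's
oid-array.h and oidset.h APIs rather than the actual cat-file code;
show_one() is a made-up callback, and the array is assumed to have been
filled with oid_array_append()):

  #include "cache.h"      /* oid_to_hex() */
  #include "oid-array.h"
  #include "oidset.h"

  /* made-up callback matching git's for_each_oid_fn */
  static int show_one(const struct object_id *oid, void *data)
  {
  	puts(oid_to_hex(oid));
  	return 0;
  }

  /* default mode: one flat array of object_ids, sorted before output */
  static void emit_sorted(struct oid_array *arr)
  {
  	/* sorts the array, then calls show_one once per unique entry */
  	oid_array_for_each_unique(arr, show_one, NULL);
  }

  /* --unordered: a khash-backed set marks what has already been shown */
  static void emit_unordered(struct oidset *seen, const struct object_id *oid)
  {
  	if (!oidset_insert(seen, oid))  /* returns 1 if oid was already there */
  		show_one(oid, NULL);
  }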

Here are hyperfine results. The three versions are:

  - none: cf0983213c^
  - int: cf0983213c
  - char: cf0983213c with the equivalent of your patch on top

All compiled with "gcc -O2", and run in a fully packed clone of
linux.git.
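
Reproducing is just a matter of building the three binaries and pointing
them at the same repo (here char.patch stands in for the equivalent of
your patch applied on top of cf0983213c; adjust the ./git.{v} paths as
needed):

  $ git checkout cf0983213c^ && make && cp git git.none
  $ git checkout cf0983213c  && make && cp git git.int
  $ git am char.patch && make && cp git git.char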

It looks like adding the "algo" field did make a big difference for the
oid_array case, but changing it to a char doesn't seem to help at all:

  $ hyperfine -L v none,int,char './git.{v} cat-file --batch-all-objects --batch-check="%(objectname)"'
  Benchmark #1: ./git.none cat-file --batch-all-objects --batch-check="%(objectname)"
    Time (mean ± σ):      1.653 s ±  0.009 s    [User: 1.607 s, System: 0.046 s]
    Range (min … max):    1.640 s …  1.670 s    10 runs
   
  Benchmark #2: ./git.int cat-file --batch-all-objects --batch-check="%(objectname)"
    Time (mean ± σ):      1.067 s ±  0.012 s    [User: 1.017 s, System: 0.050 s]
    Range (min … max):    1.053 s …  1.089 s    10 runs
   
  Benchmark #3: ./git.char cat-file --batch-all-objects --batch-check="%(objectname)"
    Time (mean ± σ):      1.092 s ±  0.013 s    [User: 1.046 s, System: 0.046 s]
    Range (min … max):    1.080 s …  1.116 s    10 runs
   
  Summary
    './git.int cat-file --batch-all-objects --batch-check="%(objectname)"' ran
      1.02 ± 0.02 times faster than './git.char cat-file --batch-all-objects --batch-check="%(objectname)"'
      1.55 ± 0.02 times faster than './git.none cat-file --batch-all-objects --batch-check="%(objectname)"'

I'm actually surprised it had this much of an impact. But I guess this
benchmark really is mostly just memcpy-ing oids into a big array,
sorting it, and printing the result. A 12% difference in the size of
that array only accounts for about 12% on its own, but non-linear
elements like growing the array (though I guess that is amortized
linear) and sorting (which picks up an extra log(n) term) push the
difference further.
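
Back of the envelope: each oid_array entry is a full "struct object_id"
stored by value, i.e. 32 bytes with no algo member, 36 with an int and
33 with an unsigned char, which is where that ~12% figure comes from.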

It's _kind of_ a silly case, since usually you'd ask for other parts
of the object, which makes the relative speed difference smaller.
But just dumping a bunch of oids is actually not an unreasonable thing
to do. I suspect it got a lot slower with 32-byte GIT_MAX_RAWSZ, too
(even when you're using 20-byte sha1), but I don't think there's an easy
way to get out of that.

Using --unordered switches us to using a khash-backed oidset. This
doesn't seem to show any meaningful difference in the three cases (the
averages follow the pattern I'd expect, but we're within the noise).

  $ hyperfine -L v none,int,char './git.{v} cat-file --batch-all-objects --unordered --batch-check="%(objectname)"'
  Benchmark #1: ./git.none cat-file --batch-all-objects --unordered --batch-check="%(objectname)"
    Time (mean ± σ):      2.125 s ±  0.024 s    [User: 2.071 s, System: 0.054 s]
    Range (min … max):    2.092 s …  2.182 s    10 runs
   
  Benchmark #2: ./git.int cat-file --batch-all-objects --unordered --batch-check="%(objectname)"
    Time (mean ± σ):      2.152 s ±  0.016 s    [User: 2.083 s, System: 0.068 s]
    Range (min … max):    2.123 s …  2.175 s    10 runs
   
  Benchmark #3: ./git.char cat-file --batch-all-objects --unordered --batch-check="%(objectname)"
    Time (mean ± σ):      2.137 s ±  0.013 s    [User: 2.079 s, System: 0.058 s]
    Range (min … max):    2.127 s …  2.167 s    10 runs
   
  Summary
    './git.none cat-file --batch-all-objects --unordered --batch-check="%(objectname)"' ran
      1.01 ± 0.01 times faster than './git.char cat-file --batch-all-objects --unordered --batch-check="%(objectname)"'
      1.01 ± 0.01 times faster than './git.int cat-file --batch-all-objects --unordered --batch-check="%(objectname)"'

What surprises me most here is that --unordered is so much slower than
its sorted cousin. Its purpose is to make things faster (though it
likely still does so for queries that actually look at the object
properties, since hitting them in pack order is beneficial there).

-Peff


