Re: [ANNOUNCE] Git v2.19.0-rc0

Jeff King <peff@xxxxxxxx> · Thu, 23 Aug 2018 01:02:25 -0400

On Wed, Aug 22, 2018 at 07:27:56PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> 
> > FWIW, it's not 10%. The best I measured was ~4% on a very
> > hashcmp-limited operation, and I suspect even that may be highly
> > dependent on the compiler. We might be able to improve more by
> > sprinkling more asserts around, but there are 75 mentions of
> > the_hash_algo->rawsz. I wouldn't want to an assert at each one.
> >
> > I don't mind doing one or a handful of these asserts as part of v2.19 if
> > we want to try to reclaim those few percent. But I suspect the very
> > first commit in any further hash-transition work is just going to be to
> > rip them all out.
> 
> I was thinking just hashcmp and hashcpy.
> 
> Ideally such a change would come with a performance test to help the
> person writing that very first commit.  Except we already have
> performance tests that capture this. ;-)
> 
> For further hash-transition work, I agree someone may want to revert
> this, and I don't mind such a revert appearing right away in "next".
> And it's possible that we might have to do the equivalent of manual
> template expansion to recover the performance in some
> performance-sensitive areas.  Maybe we can get the compiler to
> cooperate with us in that and maybe we can't.  That's okay with me.
> 
> Anyway, I'll resend your patch with a commit message added some time
> this evening.

Here's the patch. For some reason my numbers aren't quite as large as
they were yesterday (I was very careful to keep the system unloaded
today, whereas yesterday I was doing a few other things, so perhaps that
is the difference).

-- >8 --
Subject: [PATCH] hashcmp: assert constant hash size

Prior to 509f6f62a4 (cache: update object ID functions for
the_hash_algo, 2018-07-16), hashcmp() called memcmp() with a
constant size of 20 bytes. Some compilers were able to turn
that into a few quad-word comparisons, which is faster than
actually calling memcmp().

In 509f6f62a4, we started using the_hash_algo->rawsz
instead. Even though this will always be 20, the compiler
doesn't know that while inlining hashcmp() and ends up just
generating a call to memcmp().

Eventually we'll have to deal with multiple hash sizes, but
for the upcoming v2.19, we can restore some of the original
performance by asserting on the size. That gives the
compiler enough information to know that the memcmp will
always be called with a length of 20, and it performs the
same optimization.

Here are numbers for p0001.2 run against linux.git on a few
versions. This is using -O2 with gcc 8.2.0.

  Test     v2.18.0             v2.19.0-rc0               HEAD
  ------------------------------------------------------------------------------
  0001.2:  34.24(33.81+0.43)   34.83(34.42+0.40) +1.7%   33.90(33.47+0.42) -1.0%

You can see that v2.19 is a little slower than v2.18. This
commit ended up slightly faster than v2.18, but there's a
fair bit of run-to-run noise (the generated code in the two
cases is basically the same). This patch does seem to be
consistently 1-2% faster than v2.19.

I tried changing hashcpy(), which was also touched by
509f6f62a4, in the same way, but couldn't measure any
speedup. Which makes sense, at least for this workload. A
traversal of the whole commit graph requires looking up
every entry of every tree via lookup_object(). That's many
multiples of the numbers of objects in the repository (most
of the lookups just return "yes, we already saw that
object").

Reported-by: Derrick Stolee <stolee@xxxxxxxxx>
Signed-off-by: Jeff King <peff@xxxxxxxx>
---
 cache.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/cache.h b/cache.h
index b1fd3d58ab..4d014541ab 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,6 +1023,16 @@ extern const struct object_id null_oid;
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
+	/*
+	 * This is a temporary optimization hack. By asserting the size here,
+	 * we let the compiler know that it's always going to be 20, which lets
+	 * it turn this fixed-size memcmp into a few inline instructions.
+	 *
+	 * This will need to be extended or ripped out when we learn about
+	 * hashes of different sizes.
+	 */
+	if (the_hash_algo->rawsz != 20)
+		BUG("hash size not yet supported by hashcmp");
 	return memcmp(sha1, sha2, the_hash_algo->rawsz);
 }
 
-- 
2.19.0.rc0.412.g7005db4e88