Re: [ANNOUNCE] Git v2.19.0-rc0

Jeff King <peff@xxxxxxxx> · Wed, 22 Aug 2018 22:16:18 -0400

On Wed, Aug 22, 2018 at 06:23:43PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> >> On Tue, 2018-08-21 at 23:03 -0400, Jeff King wrote:
> 
> >>>  static inline int hashcmp(const unsigned char *sha1, const unsigned
> >>> char *sha2)
> >>>  {
> >>> +       assert(the_hash_algo->rawsz == 20);
> >>>         return memcmp(sha1, sha2, the_hash_algo->rawsz);
> >>>  }
> [...]
> >              The bigger questions are:
> >
> >   - are we OK with such an assertion; and
> >
> >   - does the assertion still give us the desired behavior when we add in
> >     a branch for rawsz==32?
> >
> > And I think the answers for those are both "probably not".
> 
> At this point in the release process, I think the answer to the first
> question is a pretty clear "yes".
> 
> A ~10% increase in latency of some operations is quite significant, in
> exchange for no user benefit yet.  We can continue to try to figure
> out how to convince compilers to generate good code for this (and
> that's useful), but in the meantime we should also do the simple thing
> to avoid the regression for users.

FWIW, it's not 10%. The best I measured was ~4% on a very
hashcmp-limited operation, and I suspect even that may be highly
dependent on the compiler. We might be able to improve more by
sprinkling more asserts around, but there are 75 mentions of
the_hash_algo->rawsz. I wouldn't want to an assert at each one.

I don't mind doing one or a handful of these asserts as part of v2.19 if
we want to try to reclaim those few percent. But I suspect the very
first commit in any further hash-transition work is just going to be to
rip them all out.

-Peff