On Sat, Apr 10 2021, brian m. carlson wrote:

> Currently, when we compare two object IDs, we have to take a branch to
> determine what the hash size is supposed to be. The compiler can
> optimize well for a single length, but has trouble when there are two
> possible lengths.

This would benefit from some performance/perf numbers. When this code
was first changed like this in 183a638b7da (hashcmp: assert constant
hash size, 2018-08-23) we had:

    Test     v2.18.0             v2.19.0-rc0               HEAD
    ------------------------------------------------------------------------------
    0001.2:  34.24(33.81+0.43)   34.83(34.42+0.40) +1.7%   33.90(33.47+0.42) -1.0%

Then it was later modified in 0dab7129ab1 (cache: make hashcmp and
hasheq work with larger hashes, 2018-11-14).

> @@ -205,7 +205,7 @@ static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>
>  static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)
>  {
> -	return hashcmp(oid1->hash, oid2->hash);
> +	return memcmp(oid1->hash, oid2->hash, GIT_MAX_RAWSZ);
>  }

hashcmp is now:

    if (the_hash_algo->rawsz == GIT_MAX_RAWSZ)
    	return memcmp(sha1, sha2, GIT_MAX_RAWSZ);
    return memcmp(sha1, sha2, GIT_SHA1_RAWSZ);

Wouldn't it make more sense to amend it to just be a memcmp
wrapper/macro if we're not going to make this conditional on the hash
algorithm, or are there other callsites where we still want the old way
of doing it?

>
>  static inline int hasheq(const unsigned char *sha1, const unsigned char *sha2)
> @@ -221,7 +221,7 @@ static inline int hasheq(const unsigned char *sha1, const unsigned char *sha2)
>
>  static inline int oideq(const struct object_id *oid1, const struct object_id *oid2)
>  {
> -	return hasheq(oid1->hash, oid2->hash);
> +	return !memcmp(oid1->hash, oid2->hash, GIT_MAX_RAWSZ);
>  }

Ditto hasheq vs. !memcmp:

    if (the_hash_algo->rawsz == GIT_MAX_RAWSZ)
    	return !memcmp(sha1, sha2, GIT_MAX_RAWSZ);
    return !memcmp(sha1, sha2, GIT_SHA1_RAWSZ);
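
For illustration only, here is a minimal sketch of what such a
memcmp-only wrapper might look like. This is an assumption about the
suggestion, not code from the patch; it presumes both hash buffers are
always zero-padded out to GIT_MAX_RAWSZ so the unconditional compare is
safe:

    static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
    {
    	/*
    	 * Sketch: compare the full GIT_MAX_RAWSZ buffer unconditionally,
    	 * assuming unused trailing bytes are zero for both inputs.
    	 */
    	return memcmp(sha1, sha2, GIT_MAX_RAWSZ);
    }

    static inline int hasheq(const unsigned char *sha1, const unsigned char *sha2)
    {
    	/* Sketch: equality via the same unconditional memcmp. */
    	return !memcmp(sha1, sha2, GIT_MAX_RAWSZ);
    }

With wrappers like these, oidcmp() and oideq() could keep calling
hashcmp()/hasheq() rather than open-coding memcmp(), which is
essentially the question raised above.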