Re: [ANNOUNCE] Git v2.19.0-rc0

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Aug 21, 2018 at 05:29:24PM -0400, Jeff King wrote:
> 0001.2: rev-list --all --objects  37.07(36.62+0.45)   39.11(38.58+0.51) +5.5%
> 
> Less change, but my overall times were smaller, too, so clearly our
> hardware or exact repos are a little bit different. Those numbers seem
> pretty consistent in further runs.
> 
> It bisects to 509f6f62a4 (cache: update object ID functions for
> the_hash_algo, 2018-07-16). Which make sense. An "--objects" traversal
> spends a huge amount of time checking each tree entry to see if we've
> processed that object yet, which ends up as hashcmp() in the hash table.
> I expect that a fixed 20-byte memcmp() can be optimized a lot more than
> one with an arbitrary value.
> 
> Even if _we_ know the value can only take on one of a few values, I
> don't know that we have an easy way to tell the compiler that. Possibly
> we could improve things by jumping directly to an optimized code path.
> Sort of a poor-man's JIT. ;)
> 
> Doing this:
> 
> diff --git a/cache.h b/cache.h
> index b1fd3d58ab..9c004a26c9 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1023,7 +1023,10 @@ extern const struct object_id null_oid;
>  
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>  {
> -	return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +	if (the_hash_algo->rawsz == 20)
> +		return memcmp(sha1, sha2, 20);
> +	else
> +		return memcmp(sha1, sha1, the_hash_algo->rawsz);
>  }
>  
>  static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)
> on top of v2.19-rc0 seems to give me about a 3% speedup (though I might
> be imaging it, as there's a bit of noise). A function pointer in
> the_hash_algo might make even more sense.

It's possible that might be a better solution.  I looked into a GCC
assertion that the value was either 20 or 32, and that in itself didn't
seem to help, at least in the generated code.  Your solution is likely
better in that regard.

We could wire it up to be either 20 or 32 and let people experimenting
with other sizes of algorithms just add another branch.  I haven't
tested how that performs, though.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux