Re: [PATCH] git gc: Speed it up by 18% via faster hash comparisons

On Thu, Apr 28, 2011 at 11:42 AM, Andreas Ericsson <ae@xxxxxx> wrote:
> On 04/28/2011 10:18 AM, Bernhard R. Link wrote:
>> * Ralf Baechle<ralf@xxxxxxxxxxxxxx>  [110428 02:35]:
>>> On Wed, Apr 27, 2011 at 04:32:12PM -0700, Junio C Hamano wrote:
>>>
>>>>> +static inline int is_null_sha1(const unsigned char *sha1)
>>>>>   {
>>>>> -  return memcmp(sha1, sha2, 20);
>>>>> +  const unsigned long long *sha1_64 = (void *)sha1;
>>>>> +  const unsigned int *sha1_32 = (void *)sha1;
>>>>
>>>> Can everybody do unaligned accesses just fine?
>>>
>>> Misaligned accesses cause exceptions on some architectures, which are
>>> then fixed up in software, making these accesses _very_ slow.  You can
>>> use __attribute__((packed)) to work around that, but on the affected
>>> architectures that makes gcc generate pessimistic code which is slower
>>> than not using __attribute__((packed)) when the data is in fact
>>> properly aligned.  And __attribute__((packed)) only works with GCC.
>>
>> Even __attribute__((packed)) usually does not allow arbitrarily aligned
>> data; rather, it instructs the compiler to generate code that accesses
>> the misaligned members in a special way.  (I have already seen code
>> where such an access to a properly aligned long caused a SIGBUS,
>> because the long lived inside a misaligned packed struct.)
>>
>> In short: misaligning stuff works on x86; everywhere else it is a
>> disaster waiting to happen.  (And then people claim compiler bugs or
>> broken architectures, just because they do not know the basic rules
>> of C.)
>>
>
> Given that the vast majority of user systems are x86-style ones, it's
> probably worth using this patch on such systems and sticking to a
> partially unrolled byte-by-byte comparison that finishes early on
> the rest of them.

I disagree. We have no guarantee that the SHA-1s are aligned on x86
either, and unaligned accesses are slow on x86.
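
To make the alignment point concrete: the casts in the patch are only
well-defined when the 20-byte hash happens to start on an 8-byte
boundary, and nothing about hashes embedded in structs or mmap'd pack
data guarantees that. A rough, purely illustrative sketch of the kind
of guard the word-at-a-time path would need; the function name is made
up, it only tests equality, and it is not something I am proposing:

#include <stdint.h>
#include <string.h>

/*
 * Illustrative only: the fast path below reads the hash through
 * 64-bit and 32-bit pointers, which is only safe when both buffers
 * are suitably aligned, so it has to check at run time and fall back
 * to the portable byte-wise compare otherwise.
 */
static int hashes_equal_sketch(const unsigned char *a, const unsigned char *b)
{
	if ((((uintptr_t)a | (uintptr_t)b) & 7) == 0) {
		/* both 8-byte aligned: two 64-bit words plus a 32-bit tail */
		const unsigned long long *a64 = (const void *)a;
		const unsigned long long *b64 = (const void *)b;
		const unsigned int *a32 = (const void *)(a + 16);
		const unsigned int *b32 = (const void *)(b + 16);
		return a64[0] == b64[0] && a64[1] == b64[1] && *a32 == *b32;
	}
	/* unaligned (or mixed): portable compare of all 20 bytes */
	return !memcmp(a, b, 20);
}

And even with such a check, the extra branch and the fallback path eat
into whatever the word-at-a-time compare saves.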

I think it's much much cleaner to add an early-out on the first byte,
and hope that memcmp is optimized properly. If it's not, those
platforms can add an override to memcmp in git-compat-util and/or
compat/*.
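
Roughly what I have in mind, as an untested sketch (presumably in
hashcmp(); where exactly it lands is beside the point):

#include <string.h>

/*
 * Sketch of the early-out idea: compare the first byte by hand and
 * only call memcmp() when it matches.  When the leading bytes differ
 * we return immediately, without the call overhead, and the code
 * stays alignment-agnostic and portable.  The sign convention matches
 * memcmp(), which compares bytes as unsigned char.
 */
static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
{
	if (*sha1 != *sha2)
		return *sha1 < *sha2 ? -1 : +1;
	return memcmp(sha1, sha2, 20);
}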

