Re: [PATCH] git gc: Speed it up by 18% via faster hash comparisons

On 04/28/2011 10:18 AM, Bernhard R. Link wrote:
> * Ralf Baechle<ralf@xxxxxxxxxxxxxx>  [110428 02:35]:
>> On Wed, Apr 27, 2011 at 04:32:12PM -0700, Junio C Hamano wrote:
>>
>>>> +static inline int is_null_sha1(const unsigned char *sha1)
>>>>   {
>>>> -	return memcmp(sha1, sha2, 20);
>>>> +	const unsigned long long *sha1_64 = (void *)sha1;
>>>> +	const unsigned int *sha1_32 = (void *)sha1;
>>>
>>> Can everybody do unaligned accesses just fine?
>>
>> Misaligned accesses cause exceptions on some architectures, which are
>> then fixed up in software, making these accesses _very_ slow.  You can
>> use __attribute__((packed)) to work around that, but on the affected
>> architectures it makes gcc pessimistically generate code that is slower
>> than the unpacked version even when the data happens to be properly
>> aligned.  And __attribute__((packed)) only works with GCC.
> 
> Even __attribute__((packed)) usually does not allow arbitrarily aligned
> data; it instructs the compiler to generate code that accesses data
> misaligned in one specific, known way.  (I have already seen code where
> accessing a properly aligned long caused a SIGBUS, because the alignment
> the compiler assumed for it came from its position in a misaligned
> packed struct.)
> 
> In short: misaligned accesses work on x86; everywhere else they are a
> disaster waiting to happen.  (And then people claim compiler bugs or
> broken architectures, just because they do not know the basic rules
> of C.)
> 
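
For reference, the packed-struct workaround mentioned above looks
roughly like this (GCC-specific; una_u32 and load_u32 are names made up
here for illustration, not anything in the patch):

/* Wrapping the word in a packed struct tells GCC the pointer may be
 * unaligned, so it emits whatever byte-wise access the target needs.
 * On strict-alignment targets that code is slower than a plain load. */
struct una_u32 {
	unsigned int v;
} __attribute__((packed));

static inline unsigned int load_u32(const unsigned char *p)
{
	return ((const struct una_u32 *)p)->v;
}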

Given that the vast majority of user systems are x86-style ones, it's
probably worth using this patch on such systems and sticking to a
partially unrolled byte-by-byte comparison that returns early on the
rest of them.  Properly pipelined, the early return merely throws away
the loads for the 3-4 unrolled bytes fetched in advance, so if the
difference shows up within the first 10-12 bytes it is still a win.
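
Something along these lines is what I mean (a sketch only, untested;
hashcmp_bytes is an invented name, and unlike memcmp it only reports
equal/not-equal, not ordering):

static inline int hashcmp_bytes(const unsigned char *sha1,
                                const unsigned char *sha2)
{
	int i;

	/* Four compares per iteration; loads fetched ahead of an early
	 * return are cheap to discard on a pipelined CPU. */
	for (i = 0; i < 20; i += 4) {
		if (sha1[i]     != sha2[i] ||
		    sha1[i + 1] != sha2[i + 1] ||
		    sha1[i + 2] != sha2[i + 2] ||
		    sha1[i + 3] != sha2[i + 3])
			return 1;
	}
	return 0;
}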

For bonus points, check whether both byte strings are equally
(un)aligned first and, if they are, half-Duff it with a fallthrough
switch statement (without the while() loop) to compare byte by byte
first and then word by word over the rest.  The setup and complexity
are probably not worth it for our meager 20-byte strings, though.
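
For the record, that would look roughly like the following (pure
sketch, untested; assumes a 4-byte unsigned int, uses invented names,
and like the sketch above it only reports equal/not-equal):

#include <stdint.h>
#include <string.h>

static int hashcmp_aligned(const unsigned char *sha1,
                           const unsigned char *sha2)
{
	size_t skew = (uintptr_t)sha1 % 4;
	size_t head = skew ? 4 - skew : 0;	/* bytes until alignment */
	size_t left = 20 - head;
	const unsigned int *a, *b;

	/* Mixed alignment: the trick doesn't apply, just punt. */
	if (skew != (uintptr_t)sha2 % 4)
		return memcmp(sha1, sha2, 20);

	/* Half a Duff's device: burn off the unaligned head bytes. */
	switch (head) {
	case 3: if (*sha1++ != *sha2++) return 1; /* fall through */
	case 2: if (*sha1++ != *sha2++) return 1; /* fall through */
	case 1: if (*sha1++ != *sha2++) return 1; /* fall through */
	case 0: break;
	}

	/* Word by word over the aligned middle... */
	a = (const unsigned int *)sha1;
	b = (const unsigned int *)sha2;
	while (left >= 4) {
		if (*a++ != *b++)
			return 1;
		left -= 4;
	}

	/* ...and byte by byte over whatever tail is left. */
	sha1 = (const unsigned char *)a;
	sha2 = (const unsigned char *)b;
	while (left--)
		if (*sha1++ != *sha2++)
			return 1;
	return 0;
}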

-- 
Andreas Ericsson                   andreas.ericsson@xxxxxx
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
--

