Re: [PATCH 0/7] block-sha1: improved SHA1 hashing

Linus Torvalds wrote:
> 
> On Thu, 6 Aug 2009, Artur Skawina wrote:
>> For those curious just how close the C version is to the various
>> asm and C implementations, the q&d microbenchmark is at 
>> http://www.src.multimo.pl/YDpqIo7Li27O0L0h/sha1bench.tar.gz
> 
> Hmm. That thing doesn't work at all on x86-64. Even apart from the asm 
> sources, your timing thing does some really odd things (why do you do that 
> odd "iret" in GETCYCLES and GETTIME?). You're better off using 
> lfence/mfence/cpuid, and I think you could make it work on 64-bit that 
> way too.

yes, it's 32-bit only; i should have mentioned that. The timing
code was written more than a decade ago for the p2, where it really
does work. i haven't updated it since, it has just been c&p'ed
around. All of it can be safely disabled: on p2 you could account
for every cycle, but nowadays gettimeofday is more than enough.
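
For anyone who wants to replace the old timing code, here is a minimal
sketch of both approaches: a cycle read fenced with lfence, as suggested
above, and a plain gettimeofday wall clock. This is not the code from
sha1bench; the names read_cycles, wall_seconds and mb_per_sec are made
up for illustration, and MB is assumed to mean 10^6 bytes, which may not
match what the benchmark prints.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/*
 * Read the time stamp counter with an lfence in front, so that earlier
 * instructions have completed before the counter is sampled. The same
 * asm assembles for 32-bit and 64-bit x86 (lfence needs an SSE2-capable
 * cpu at run time). On other architectures this sketch returns 0.
 */
static inline uint64_t read_cycles(void)
{
#if defined(__i386__) || defined(__x86_64__)
	uint32_t lo, hi;

	__asm__ __volatile__("lfence\n\trdtsc"
			     : "=a"(lo), "=d"(hi)
			     :
			     : "memory");
	return ((uint64_t)hi << 32) | lo;
#else
	return 0;
#endif
}

/* Wall-clock time in seconds, with microsecond resolution. */
static inline double wall_seconds(void)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Throughput in MB/s (assumption: 1 MB = 10^6 bytes). */
static inline double mb_per_sec(size_t bytes, double seconds)
{
	assert(seconds > 0.0);
	return (double)bytes / 1e6 / seconds;
}
```

Typical use would be to call wall_seconds() before and after hashing a
buffer and pass the byte count and the difference to mb_per_sec(). For
runs that last seconds, the microsecond resolution of gettimeofday is
far below the measurement noise.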

> I just hacked it away for testing.
> 
>> In short: 88% of openssl speed on P3, 42% on P4, 66% on Atom.
> 
> I'll use this to see if I can improve the 32-bit case.
> 
> On Nehalem, with your benchmark, I get:
> 
> 	#             TIME[s] SPEED[MB/s]
> 	rfc3174         5.122       119.2
> 	# New hash result: d829b9e028e64840094ab6702f9acdf11bec3937
> 	rfc3174         5.153       118.5
> 	linus           2.092       291.8
> 	linusas         2.056       296.8
> 	linusas2        1.909       319.8
> 	mozilla         5.139       118.8
> 	mozillaas       5.775       105.7
> 	openssl         1.627       375.1
> 	spelvin         1.678       363.7
> 	spelvina        1.603       380.8
> 	nettle          1.592       383.4
> 
> And with the hacked version to get some 64-bit numbers:
> 
> 	#             TIME[s] SPEED[MB/s]
> 	rfc3174         3.992       152.9
> 	# New hash result: b78fd74c0033a4dfe0ededccb85ab00cb56880ab
> 	rfc3174         3.991       152.9
> 	linus            1.54       396.3
> 	linusas         1.533       398.1
> 	linusas2        1.603       380.9
> 	mozilla         4.352       140.3
> 	mozillaas       4.227       144.4
> 
> so as you can see, your improvements in 32-bit mode are actually 
> de-provements in 64-bit mode (ok, your first one seems to be a tiny 
> improvement, but I think it's in the noise).
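
To put numbers on that comparison, a small sketch that computes the
relative change in speed between two rows of the tables above. The
helper name pct_change is made up for illustration; the inputs are the
SPEED[MB/s] values quoted above.

```c
#include <assert.h>

/* Percent change of `speed` relative to `base`, both in MB/s. */
static inline double pct_change(double speed, double base)
{
	assert(base > 0.0);
	return (speed - base) / base * 100.0;
}
```

With the quoted figures, pct_change(319.8, 291.8) puts linusas2 about
9.6% ahead of linus in 32-bit mode, while pct_change(380.9, 396.3) puts
it about 3.9% behind in 64-bit mode. For linusas in 64-bit mode,
pct_change(398.1, 396.3) is under 0.5%.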

Actually i didn't keep anything that wasn't a win. One reason
linusas2 stayed is that it really surprised me: i'd have expected
gcc to do a lot worse w/ the many temporaries, but the compiler
came up w/ a 70% gain. gcc really must have improved when i wasn't
looking.

> But you're right, I need to try to improve the 32-bit case.

I never said anything like that. :) There probably isn't all that
much that can be done. I tried a few things, but never saw any
improvement above measurement noise (a few percent). Would have
thought that overlapping the iterations a bit would be a gain, but
that didn't do much (-20%..0); maybe on 64 bit, with more registers...

Oh, i noticed that '-mtune' makes quite a difference: it can change
the relative performance of the functions significantly, in unobvious
ways. Depending on which cpu gcc tunes for (build config or -mtune),
some implementations slow down, others become a bit faster.

artur