Re: x86 SHA1: Faster than OpenSSL

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Wed, 5 Aug 2009 21:27:05 -0700 (PDT)

On Thu, 6 Aug 2009, Artur Skawina wrote:
> > 
> > The way it's written, I can easily make it do one or the other by just 
> > turning the macro inside a loop (and we can have a preprocessor flag to 
> > choose one or the other), but let me work on it a bit more first.
> 
> that's of course how i measured it.. :)

Well, with my "rolling 512-bit array" I can't do that easily any more.

Now it actually depends on the compiler being able to do that circular 
list index calculation statically, at compile time. If I were to turn it 
back into chunks of loops, my new code would suck, because it would have 
all those nasty dynamic address calculations.
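
To illustrate the point about static versus dynamic indexing, here is a
minimal sketch. It is not the code from this thread: the names rol32, W,
MIX, mix4_unrolled and mix4_looped are hypothetical, and only the SHA-1
message-schedule recurrence itself is standard. The 512-bit block lives in
a 16-word circular buffer, and the masked index is a compile-time constant
only when the round number is a literal.

```c
#include <stdint.h>

/* Rotate left by n bits (0 < n < 32). */
static inline uint32_t rol32(uint32_t x, unsigned n)
{
	return (x << n) | (x >> (32 - n));
}

/* Slot t of a 16-word (512-bit) circular buffer named w (assumed name). */
#define W(t) (w[(t) & 15])

/*
 * SHA-1 schedule word for round t >= 16, written back into the slot
 * that held word t-16, so the buffer "rolls" through the block.
 */
#define MIX(t) \
	(W(t) = rol32(W((t) - 3) ^ W((t) - 8) ^ W((t) - 14) ^ W((t) - 16), 1))

/* Unrolled: every t is a literal, so each W() is a fixed array slot. */
static uint32_t mix4_unrolled(uint32_t w[16])
{
	uint32_t sum = 0;

	sum += MIX(16);
	sum += MIX(17);
	sum += MIX(18);
	sum += MIX(19);
	return sum;
}

/*
 * Looped: t is a run-time value, so unless the compiler unrolls the
 * loop itself, each W() needs a subtract, a mask and an indexed load.
 */
static uint32_t mix4_looped(uint32_t w[16])
{
	uint32_t sum = 0;

	for (int t = 16; t < 20; t++)
		sum += MIX(t);
	return sum;
}
```

Both functions compute the same values; the difference is only in what
the compiler has to emit for the indexing, which is why the rolling array
favours the fully unrolled form.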

> I've only tested on p4 and there the winner so far is still:

Yeah, well, I refuse to touch that crappy micro-architecture any more. I 
complained to Intel people for years that their best CPU was only 
available as a laptop chip (Pentium-M), and I'm really happy to have 
gotten rid of all my horrid P4's.

(Ok, so it was great when the P4 ran at 2x the frequency of the 
competition, and then it smoked them all. Except on OS loads, where the P4 
exception handling took ten times longer than anything else).

So I'm a bit biased against P4.

I'll try it on my Atom's, though. They're pretty crappy CPU's, but they 
have a fairly good _reason_ to be crappy.

			Linus