Re: x86 SHA1: Faster than OpenSSL

"George Spelvin" <linux@xxxxxxxxxxx> · 6 Aug 2009 03:03:12 -0400

> On Thu, 6 Aug 2009, Artur Skawina wrote:
>> #             TIME[s] SPEED[MB/s]
>> rfc3174         1.357       44.99
>> rfc3174         1.352       45.13
>> mozilla         1.509       40.44
>> mozillaas       1.133       53.87
>> linus          0.5818       104.9

> #Initializing... Rounds: 1000000, size: 62500K, time: 1.421s, speed: 42.97MB/s
> #             TIME[s] SPEED[MB/s]
> rfc3174         1.403        43.5
> # New hash result: b747042d9f4f1fdabd2ac53076f8f830dea7fe0f
> rfc3174         1.403       43.51
> linus          0.5891       103.6
> linusas        0.5337       114.4
> mozilla         1.535       39.76
> mozillaas       1.128       54.13

I'm trying to absorb what you're learning about P4 performance, but
I'm getting confused... what is what in these benchmarks?

The major architectural decisions I see are:

1) Three possible ways to compute the W[] array for rounds 16..79:
	1a) Compute W[16..79] in a loop beforehand (you noted that unrolling
	    two copies helped significantly.)
	1b) Compute W[16..79] as part of hash rounds 16..79.
	1c) Compute W[0..15] in place (used as a 16-word circular buffer)
	    as part of hash rounds 16..79.
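
To make the difference between 1a and 1c concrete, here is a minimal sketch
of just the W[] expansion, not anyone's posted code. The names rol32,
expand_full and next_w are made up for illustration. It assumes next_w is
called once per round, in order, for t = 16..79.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rol32(uint32_t x, int n)
{
    return (x << n) | (x >> (32 - n));
}

/* Option 1a: expand the whole 80-word schedule before any hash round. */
static void expand_full(const uint32_t in[16], uint32_t W[80])
{
    int t;

    for (t = 0; t < 16; t++)
        W[t] = in[t];
    for (t = 16; t < 80; t++)
        W[t] = rol32(W[t - 3] ^ W[t - 8] ^ W[t - 14] ^ W[t - 16], 1);
}

/*
 * Option 1c: keep only 16 words and overwrite the oldest one in place.
 * W[t-3], W[t-8], W[t-14] and W[t-16] live at (t+13), (t+8), (t+2) and t,
 * all taken mod 16.  Returns the schedule word for round t (t >= 16).
 */
static uint32_t next_w(uint32_t W[16], int t)
{
    uint32_t w = rol32(W[(t + 13) & 15] ^ W[(t + 8) & 15] ^
                       W[(t + 2) & 15] ^ W[t & 15], 1);

    W[t & 15] = w;
    return w;
}
```

Option 1b would be the expand_full recurrence with the second loop body moved
into hash rounds 16..79, still writing to an 80-word array.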

2) The main hashing can be rolled up or unrolled:
	2a) Four 20-round loops.  (In case of options 1b and 1c, the
	    first one might be split into a 16 and a 4.)
	2b) Four 4-round loops, each unrolled 5x.  (See the ARM assembly.)
	2c) All 80 rounds unrolled.
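
For reference, a minimal sketch of what I mean by 1a+2a: the schedule is
expanded up front and the rounds run as four 20-round loops. This is plain
textbook SHA-1 written for illustration, not any of the benchmarked
implementations; rotl, ROUND and sha1_block_rolled are hypothetical names.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rotl(uint32_t x, int n)
{
    return (x << n) | (x >> (32 - n));
}

/* One SHA-1 round: f is the round function value, k the round constant. */
#define ROUND(f, k) do { \
    uint32_t tmp = rotl(a, 5) + (f) + e + (k) + W[t]; \
    e = d; d = c; c = rotl(b, 30); b = a; a = tmp; \
} while (0)

/* Options 1a + 2a: hash one 64-byte block p into the 5-word state h. */
static void sha1_block_rolled(uint32_t h[5], const unsigned char *p)
{
    uint32_t W[80], a, b, c, d, e;
    int t;

    /* Load the block as 16 big-endian words, then expand to 80 (1a). */
    for (t = 0; t < 16; t++)
        W[t] = (uint32_t)p[4 * t] << 24 | (uint32_t)p[4 * t + 1] << 16 |
               (uint32_t)p[4 * t + 2] << 8 | p[4 * t + 3];
    for (t = 16; t < 80; t++)
        W[t] = rotl(W[t - 3] ^ W[t - 8] ^ W[t - 14] ^ W[t - 16], 1);

    a = h[0]; b = h[1]; c = h[2]; d = h[3]; e = h[4];

    /* Four 20-round loops, one per round function (2a). */
    for (t = 0; t < 20; t++)
        ROUND((b & c) | (~b & d), 0x5a827999);
    for (t = 20; t < 40; t++)
        ROUND(b ^ c ^ d, 0x6ed9eba1);
    for (t = 40; t < 60; t++)
        ROUND((b & c) | (b & d) | (c & d), 0x8f1bbcdc);
    for (t = 60; t < 80; t++)
        ROUND(b ^ c ^ d, 0xca62c1d6);

    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}
```

Option 2c would replace each loop with 20 literal ROUND invocations; 2b sits
in between, with a short loop around an unrolled group of five rounds so the
a..e rotation can be done by renaming instead of moves.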

As Linus noted, 1c is not friends with options 2a and 2b, because the
W() indexing math is no longer a compile-time constant.
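
A small sketch of the indexing issue as I understand it. The macro names
(ROL, W, MIX) and both functions are hypothetical, and "ring" stands for the
16-word in-place buffer of option 1c.

```c
#include <stdint.h>

#define ROL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

/* Circular-buffer access: slot for schedule word t is t mod 16. */
#define W(t) (ring[(t) & 15])

/* In-place update of the schedule word for round t (t >= 16). */
#define MIX(t) (W(t) = ROL(W((t) + 13) ^ W((t) + 8) ^ W((t) + 2) ^ W(t), 1))

/*
 * Rolled: t is a loop variable, so each (t + k) & 15 has to be computed
 * at run time unless the compiler unrolls the loop itself.
 */
static void mix_rolled(uint32_t ring[16], int from, int to)
{
    int t;

    for (t = from; t < to; t++)
        MIX(t);
}

/*
 * Unrolled: every t is a literal, so every index folds to a constant and
 * each access can become a fixed offset from the buffer base.
 */
static void mix_unrolled4(uint32_t ring[16])
{
    MIX(16); MIX(17); MIX(18); MIX(19);
}
```

Both functions produce the same buffer contents for rounds 16..19; the only
difference is whether the masked indices are known when the code is compiled.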

Linus has posted 1a+2c and 1c+2c.  You posted some code that could be
2a or 2c depending on an UNROLL preprocessor #define.  Which combinations
are your "linus" and "linusas" code?

You talk about "and my atom seems to like the compact loops too", but
I'm not sure which loops those are.

Thanks.