Re: [PATCH 0/7] block-sha1: improved SHA1 hashing

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Thu, 6 Aug 2009 13:53:11 -0700 (PDT)

On Thu, 6 Aug 2009, Artur Skawina wrote:
> 
> it's a bit slower (P4):
> 
> before: linus          0.6288       97.06
> after:  linus          0.6604       92.42

Hmm. Ok, I just tested with your harness, and I get

	#             TIME[s] SPEED[MB/s]
	rfc3174           5.1       119.7
	rfc3174         5.097       119.7
	linus           1.836       332.5
	linusas         2.006       304.3
	linusas2        1.879       324.9
	mozilla         5.562       109.7
	mozillaas       5.913       103.2
	openssl         1.613       378.5
	spelvin         1.698       359.5
	spelvina        1.602         381
	nettle          1.594       382.9

with it, so it is faster for me. So your slowdown seems to be yet another 
P4 thing. Dang crazy micro-architecture.

Of course, it might be a compiler version difference too. I'm using 
gcc-4.4.0.

With the cpp variable renaming, the compiler really has less to be smart 
about, but spill decisions will still matter a lot.

(My old 32-bit numbers were 

        linus           2.092       291.8

so it's a clear improvement on my machine and with my compiler).

It also seems to improve the 64-bit numbers a small bit, I'm getting

	#             TIME[s] SPEED[MB/s]
	rfc3174          3.98       153.3
	rfc3174         3.972       153.7
	linus           1.514       403.1
	linusas         1.555       392.6
	linusas2        1.599       381.7
	mozilla          4.34       140.6
	mozillaas       4.223       144.5

with my 64-bit compile, so on a Nehalem it's the best one of the C ones by 
a noticeable margin. (My original 64-bit numbers were

        linus            1.54       396.3

and while the numbers seem to fluctuate a bit, the fluctuation is roughly 
in the 1% range, so that improvement seems to be statistically 
significant.)
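
For reference, the size of the two improvements can be worked out from the 
TIME[s] columns quoted above. This is only a back-of-the-envelope sketch: 
the helper names are made up for illustration, and the 1% noise figure is 
the rough estimate from the text, not a measured variance.

```c
#include <assert.h>

/* Relative improvement in percent when the run time drops
 * from 'before' seconds to 'after' seconds. */
static double speedup_percent(double before, double after)
{
	return (before - after) / before * 100.0;
}

/* Hypothetical helper: is the improvement larger than the
 * assumed run-to-run noise (also given in percent)? */
static int exceeds_noise(double before, double after, double noise_percent)
{
	return speedup_percent(before, after) > noise_percent;
}

static void check_quoted_numbers(void)
{
	/* 64-bit "linus": 1.54 s -> 1.514 s, about 1.7% */
	assert(exceeds_noise(1.54, 1.514, 1.0));
	/* 32-bit "linus": 2.092 s -> 1.836 s, about 12% */
	assert(exceeds_noise(2.092, 1.836, 1.0));
}
```

So the 32-bit gain is far outside the noise, while the 64-bit gain is only 
somewhat above it.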

Oh, I did make a small change, but I doubt it matters. Instead of doing

	TEMP += E + SHA_ROL(A,5) + (fn) + (constant); \
	B = SHA_ROR(B, 2); E = TEMP; } while (0)

I now do

	E += TEMP + SHA_ROL(A,5) + (fn) + (constant); \
	B = SHA_ROR(B, 2); } while (0)

which is a bit more logical (the old TEMP usage was just due to a fairly 
mindless conversion). That _might_ have lower register pressure if the 
compiler is silly enough to not notice that it can do it. Maybe that 
matters.
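
To make the equivalence of the two forms concrete, here is a minimal 
standalone sketch. It is not the actual block-sha1 macro: ROUND_OLD, 
ROUND_NEW and rounds_agree are names invented for this illustration, the 
round is reduced to the two statements quoted above, and TEMP is assumed 
to hold the message word for the round before the addition.

```c
#include <assert.h>
#include <stdint.h>

#define SHA_ROL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
#define SHA_ROR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))

/* Old form: accumulate into TEMP, then copy TEMP into E. */
#define ROUND_OLD(w, fn, constant, A, B, E) do { \
	uint32_t TEMP = (w); \
	TEMP += E + SHA_ROL(A,5) + (fn) + (constant); \
	B = SHA_ROR(B, 2); E = TEMP; } while (0)

/* New form: accumulate straight into E, no copy at the end. */
#define ROUND_NEW(w, fn, constant, A, B, E) do { \
	uint32_t TEMP = (w); \
	E += TEMP + SHA_ROL(A,5) + (fn) + (constant); \
	B = SHA_ROR(B, 2); } while (0)

/* Returns 1 if both forms leave B and E in the same state. */
static int rounds_agree(uint32_t a, uint32_t b, uint32_t c,
			uint32_t d, uint32_t e, uint32_t w)
{
	const uint32_t k = 0x5a827999;	/* SHA-1 constant for rounds 0-19 */
	uint32_t b1 = b, e1 = e, b2 = b, e2 = e;

	ROUND_OLD(w, (((c ^ d) & b1) ^ d), k, a, b1, e1);
	ROUND_NEW(w, (((c ^ d) & b2) ^ d), k, a, b2, e2);
	return b1 == b2 && e1 == e2;
}

/* Try the SHA-1 initial state against a spread of message words. */
static void self_check(void)
{
	uint32_t i;

	for (i = 0; i < 1000; i++)
		assert(rounds_agree(0x67452301, 0xefcdab89, 0x98badcfe,
				    0x10325476, 0xc3d2e1f0, i * 0x9e3779b9u));
}
```

Since unsigned 32-bit addition is commutative and wraps, both forms compute 
the same E; the only difference is which variable carries the sum, which is 
where any register-pressure effect would come from.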

			Linus