Linus Torvalds wrote:
>
> On Thu, 6 Aug 2009, Artur Skawina wrote:
>>     #             TIME[s] SPEED[MB/s]
>>     rfc3174         1.357       44.99
>>     rfc3174         1.352       45.13
>>     mozilla         1.509       40.44
>>     mozillaas       1.133       53.87
>>     linus          0.5818       104.9
>>
>> so it's more than twice as fast as the mozilla implementation.
>
> So that's some general SHA1 benchmark you have?
>
> I hope it tests correctness too.

Yep, sort of: I just check that all versions return the same result
when hashing some pseudorandom data.

> As to my atom testing: my poor little atom is a sad little thing, and
> it's almost painful to benchmark that thing. But it's worth it to look at
> how the 32-bit code compares to the openssl asm code too:
>
>  - BLK_SHA1:
>	real	2m27.160s
>  - OpenSSL:
>	real	2m12.580s
>  - Mozilla-SHA1:
>	real	3m21.836s
>
> As expected, the hand-tuned assembly does better (and by a bigger margin).
> Probably partly because scheduling is important when in-order, and partly
> because gcc will have a harder time with the small register set.
>
> But it's still a big improvement over mozilla one.
>
> (This is, as always, 'git fsck --full'. It spends about 50% on that SHA1
> calculation, so the SHA1 speedup is larger than you see from just the
> numbers)

I'll start looking at other CPUs once I integrate the asm versions into my
benchmark. P4s really are "special".
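Linus's remark that the SHA1 speedup is larger than the raw 'git fsck --full' times suggest is Amdahl's law run backwards: if part of the run does not change, the part that did change must have sped up more than the total. A minimal sketch, assuming (this is not stated precisely in the thread) that the 50% figure refers to the Mozilla-SHA1 baseline run and that all non-SHA1 work costs the same in every run; the helper names sha1_only_speedup() and minsec() are hypothetical:

```c
#include <assert.h>

/* Back out the SHA1-only speedup from two whole-run wall-clock times.
 * ASSUMPTIONS (not measured, taken on faith):
 *  - sha1_fraction is the share of the *baseline* run spent in SHA1;
 *  - everything that is not SHA1 costs exactly the same in both runs. */
static double sha1_only_speedup(double base_total_s, double new_total_s,
                                double sha1_fraction)
{
    assert(sha1_fraction > 0.0 && sha1_fraction <= 1.0);
    double other_s = base_total_s * (1.0 - sha1_fraction); /* fixed non-SHA1 work */
    double base_sha1_s = base_total_s - other_s;
    double new_sha1_s = new_total_s - other_s;
    if (new_sha1_s <= 0.0)
        return 0.0; /* the assumptions cannot hold for these numbers */
    return base_sha1_s / new_sha1_s;
}

/* Convert the "3m21.836s" style printed by `time` into seconds. */
static double minsec(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}
```

With the Atom numbers above, minsec(3, 21.836) / minsec(2, 27.160) gives about 1.37x for the whole fsck, while sha1_only_speedup(minsec(3, 21.836), minsec(2, 27.160), 0.5) gives about 2.18x for the SHA1 part alone under these assumptions. Attributing the 50% to a different run changes the estimate, but it stays above the whole-run ratio either way.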
Even something as simple as this on top of your version:

@@ -129,8 +133,8 @@
 
 #define T_20_39(t) \
 	SHA_XOR(t); \
-	TEMP += SHA_ROL(A,5) + (B^C^D) + E + 0x6ed9eba1; \
-	E = D; D = C; C = SHA_ROR(B, 2); B = A; A = TEMP;
+	TEMP += SHA_ROL(A,5) + (B^C^D) + E; \
+	E = D; D = C; C = SHA_ROR(B, 2); B = A; A = TEMP + 0x6ed9eba1;
 
 	T_20_39(20); T_20_39(21); T_20_39(22); T_20_39(23); T_20_39(24);
 	T_20_39(25); T_20_39(26); T_20_39(27); T_20_39(28); T_20_39(29);
@@ -139,8 +143,8 @@
 
 #define T_40_59(t) \
 	SHA_XOR(t); \
-	TEMP += SHA_ROL(A,5) + ((B&C)|(D&(B|C))) + E + 0x8f1bbcdc; \
-	E = D; D = C; C = SHA_ROR(B, 2); B = A; A = TEMP;
+	TEMP += SHA_ROL(A,5) + ((B&C)|(D&(B|C))) + E; \
+	E = D; D = C; C = SHA_ROR(B, 2); B = A; A = TEMP + 0x8f1bbcdc;
 
 	T_40_59(40); T_40_59(41); T_40_59(42); T_40_59(43); T_40_59(44);
 	T_40_59(45); T_40_59(46); T_40_59(47); T_40_59(48); T_40_59(49);

saves another 10% or so:

#Initializing... Rounds: 1000000, size: 62500K, time: 1.421s, speed: 42.97MB/s
#             TIME[s] SPEED[MB/s]
rfc3174         1.403        43.5
# New hash result: b747042d9f4f1fdabd2ac53076f8f830dea7fe0f
rfc3174         1.403       43.51
linus          0.5891       103.6
linusas        0.5337       114.4
mozilla         1.535       39.76
mozillaas       1.128       54.13

artur
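The patch above only moves where the round constant is added: instead of folding 0x6ed9eba1 / 0x8f1bbcdc into TEMP, it adds the constant when TEMP is assigned to A. Because all of this is unsigned 32-bit arithmetic (mod 2^32), the result cannot change; any gain comes from the code the compiler generates, not from doing less work. A minimal sketch in the spirit of the "same result on pseudorandom data" check: a plain loop-based SHA-1 (not the unrolled BLK_SHA1 macros; rol32, sha1_block, sha1, sha1_hex and variants_agree are hypothetical names) where a late_k flag selects either placement:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint32_t rol32(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

/* One SHA-1 compression of a 64-byte block.
 * late_k == 0: K is added into TEMP (original macros).
 * late_k != 0: K is added only when TEMP is assigned to A (patched macros). */
static void sha1_block(uint32_t h[5], const unsigned char *p, int late_k)
{
    uint32_t W[80], A = h[0], B = h[1], C = h[2], D = h[3], E = h[4];
    for (int t = 0; t < 16; t++)
        W[t] = (uint32_t)p[4*t] << 24 | (uint32_t)p[4*t+1] << 16 |
               (uint32_t)p[4*t+2] << 8 | (uint32_t)p[4*t+3];
    for (int t = 16; t < 80; t++)
        W[t] = rol32(W[t-3] ^ W[t-8] ^ W[t-14] ^ W[t-16], 1);
    for (int t = 0; t < 80; t++) {
        uint32_t f, k;
        if (t < 20)      { f = (B & C) | (~B & D);      k = 0x5a827999; }
        else if (t < 40) { f = B ^ C ^ D;               k = 0x6ed9eba1; }
        else if (t < 60) { f = (B & C) | (D & (B | C)); k = 0x8f1bbcdc; }
        else             { f = B ^ C ^ D;               k = 0xca62c1d6; }
        uint32_t temp = W[t] + rol32(A, 5) + f + E;
        if (!late_k)
            temp += k;                      /* constant folded into TEMP */
        E = D; D = C; C = rol32(B, 30);     /* rol 30 == SHA_ROR(B, 2) */
        B = A;
        A = late_k ? temp + k : temp;       /* constant added at the end */
    }
    h[0] += A; h[1] += B; h[2] += C; h[3] += D; h[4] += E;
}

/* Full SHA-1 with standard padding; out receives the 20-byte digest. */
static void sha1(const void *data, size_t len, unsigned char out[20], int late_k)
{
    assert(data != NULL || len == 0);
    uint32_t h[5] = {0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476, 0xc3d2e1f0};
    const unsigned char *p = data;
    unsigned char tail[128] = {0};
    size_t full = len / 64 * 64, rest = len - full;
    for (size_t i = 0; i < full; i += 64)
        sha1_block(h, p + i, late_k);
    if (rest)
        memcpy(tail, p + full, rest);
    tail[rest] = 0x80;                           /* the mandatory 1 bit */
    size_t tail_len = rest < 56 ? 64 : 128;      /* room for 64-bit length? */
    uint64_t bits = (uint64_t)len * 8;
    for (int i = 0; i < 8; i++)                  /* big-endian bit count */
        tail[tail_len - 1 - i] = (unsigned char)(bits >> (8 * i));
    for (size_t i = 0; i < tail_len; i += 64)
        sha1_block(h, tail + i, late_k);
    for (int i = 0; i < 20; i++)
        out[i] = (unsigned char)(h[i / 4] >> (24 - 8 * (i % 4)));
}

/* Hex digest into hex[41] (NUL-terminated). */
static void sha1_hex(const void *data, size_t len, char hex[41], int late_k)
{
    unsigned char d[20];
    sha1(data, len, d, late_k);
    for (int i = 0; i < 20; i++)
        sprintf(hex + 2 * i, "%02x", d[i]);
}

/* Hash pseudorandom buffers of every length 0..1000 with both placements;
 * returns 1 if all digests agree. */
static int variants_agree(void)
{
    static unsigned char buf[1000];
    uint32_t x = 12345;                          /* arbitrary xorshift32 seed */
    for (size_t i = 0; i < sizeof buf; i++) {
        x ^= x << 13; x ^= x >> 17; x ^= x << 5;
        buf[i] = (unsigned char)x;
    }
    for (size_t len = 0; len <= sizeof buf; len++) {
        unsigned char a[20], b[20];
        sha1(buf, len, a, 0);
        sha1(buf, len, b, 1);
        if (memcmp(a, b, 20) != 0)
            return 0;
    }
    return 1;
}
```

Both settings should produce the standard digest for "abc" (a9993e364706816aba3e25717850c26c9cd0d89d), and variants_agree() should return 1. Checking against a known test vector as well as against each other guards against two implementations agreeing on the same wrong answer, which a pure cross-check cannot catch.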