On Thu, 6 Aug 2009, Artur Skawina wrote:
>
> it's a bit slower (P4):
>
> before: linus          0.6288       97.06
> after:  linus          0.6604       92.42

Hmm. Ok, I just tested with your harness, and I get

    #             TIME[s] SPEED[MB/s]
    rfc3174           5.1       119.7
    rfc3174         5.097       119.7
    linus           1.836       332.5
    linusas         2.006       304.3
    linusas2        1.879       324.9
    mozilla         5.562       109.7
    mozillaas       5.913       103.2
    openssl         1.613       378.5
    spelvin         1.698       359.5
    spelvina        1.602         381
    nettle          1.594       382.9

with it, so it is faster for me. So your slowdown seems to be yet another
P4 thing. Dang crazy micro-architecture.

Of course, it might be a compiler version difference too. I'm using
gcc-4.4.0. With the cpp variable renaming, the compiler really has less to
be smart about, but spill decisions will still matter a lot.

(My old 32-bit numbers were

    linus           2.092       291.8

so it's a clear improvement on my machine and with my compiler).

It also seems to improve the 64-bit numbers a small bit, I'm getting

    #             TIME[s] SPEED[MB/s]
    rfc3174          3.98       153.3
    rfc3174         3.972       153.7
    linus           1.514       403.1
    linusas         1.555       392.6
    linusas2        1.599       381.7
    mozilla          4.34       140.6
    mozillaas       4.223       144.5

with my 64-bit compile, so on a Nehalem it's the best one of the C ones by
a noticeable margin.

(My original 64-bit numbers were

    linus            1.54       396.3

and while the numbers seem to fluctuate a bit, the fluctuation is roughly
in the 1% range, so that improvement seems to be statistically
significant).

Oh, I did make a small change, but I doubt it matters. Instead of doing

    TEMP += E + SHA_ROL(A,5) + (fn) + (constant); \
    B = SHA_ROR(B, 2); E = TEMP; } while (0)

I now do

    E += TEMP + SHA_ROL(A,5) + (fn) + (constant); \
    B = SHA_ROR(B, 2); } while (0)

which is a bit more logical (the old TEMP usage was just due to a fairly
mindless conversion). That _might_ have lower register pressure if the
compiler is silly enough to not notice that it can do it. Maybe that
matters.
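For what it's worth, the two forms of the round tail can be checked against each other in isolation. This is only a sketch: the rotate macro bodies, the ROUND_OLD/ROUND_NEW names, the struct and the helper functions are all made up for illustration, and it assumes TEMP holds the round's input word on entry. It says nothing about what the compiler does with registers; it only shows that both forms leave the same values in B and E.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-ins for the SHA_ROL/SHA_ROR macros: 32-bit rotates, 0 < n < 32. */
#define SHA_ROL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
#define SHA_ROR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))

/* Old tail: accumulate into TEMP, then copy TEMP into E. */
#define ROUND_OLD(A, B, E, TEMP, fn, constant) do { \
    TEMP += E + SHA_ROL(A, 5) + (fn) + (constant); \
    B = SHA_ROR(B, 2); E = TEMP; } while (0)

/* New tail: accumulate straight into E, TEMP is only read. */
#define ROUND_NEW(A, B, E, TEMP, fn, constant) do { \
    E += TEMP + SHA_ROL(A, 5) + (fn) + (constant); \
    B = SHA_ROR(B, 2); } while (0)

/* Hypothetical container for the three words the tail touches. */
struct round_state { uint32_t a, b, e; };

static struct round_state round_old(struct round_state s, uint32_t w,
                                    uint32_t fn, uint32_t k)
{
    uint32_t TEMP = w;  /* assumption: TEMP starts as the input word */
    ROUND_OLD(s.a, s.b, s.e, TEMP, fn, k);
    return s;
}

static struct round_state round_new(struct round_state s, uint32_t w,
                                    uint32_t fn, uint32_t k)
{
    uint32_t TEMP = w;  /* same assumption as above */
    ROUND_NEW(s.a, s.b, s.e, TEMP, fn, k);
    return s;
}

/* Returns 1 if both tails produce identical A, B and E for the given inputs. */
static int variants_agree(struct round_state in, uint32_t w,
                          uint32_t fn, uint32_t k)
{
    struct round_state o = round_old(in, w, fn, k);
    struct round_state n = round_new(in, w, fn, k);
    return o.a == n.a && o.b == n.b && o.e == n.e;
}

/* Small self-check with arbitrary inputs; the values carry no special meaning here. */
static void self_check(void)
{
    struct round_state s = { 0x67452301u, 0xefcdab89u, 0xc3d2e1f0u };
    assert(variants_agree(s, 0x61626380u, 0x98badcfeu, 0x5a827999u));
}
```

Calling self_check() from a test main is enough to exercise it. Since unsigned addition is commutative and wraps modulo 2^32, the agreement holds for any inputs; the only difference is which variable the sum lands in.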
		Linus