Artur Skawina wrote:
> Artur Skawina wrote:
> -#define setW(x, val) (*(volatile unsigned int *)&W(x) = (val))
> +#define setW(x, val) W(x) = (val); __asm__ volatile ("": "+m" (W(x)))

and w/ this on top:

diff --git a/block-sha1/sha1vol.c b/block-sha1/sha1vol.c
--- a/block-sha1/sha1vol.c
+++ b/block-sha1/sha1vol.c
@@ -103,9 +103,9 @@ void blk_SHA1_Finalv(unsigned char hashout[20], blk_SHA_CTX *ctx)
 #define SHA_MIX(t) SHA_ROL(W(t+13) ^ W(t+8) ^ W(t+2) ^ W(t), 1)
 
 #define SHA_ROUND(t, input, fn, constant, A, B, C, D, E) do { \
-	unsigned int TEMP = input(t); setW(t, TEMP); \
-	E += TEMP + SHA_ROL(A,5) + (fn) + (constant); \
-	B = SHA_ROR(B, 2); } while (0)
+	unsigned int TEMP = SHA_ROL(A,5); E+= (fn); \
+	E += (constant) + TEMP; TEMP = input(t); setW(t, TEMP); \
+	B = SHA_ROR(B, 2); E += TEMP; } while (0)
 
 #define T_0_15(t, A, B, C, D, E)  SHA_ROUND(t, SHA_SRC, (((C^D)&B)^D) , 0x5a827999, A, B, C, D, E )
 #define T_16_19(t, A, B, C, D, E) SHA_ROUND(t, SHA_MIX, (((C^D)&B)^D) , 0x5a827999, A, B, C, D, E )

I see an improvement on atom and reach ~200M/s on P4 (i686).

When compiled w/ '-mtune=prescott':

rfc3174         1.459       41.84
linus           0.6574      92.85
linusph         0.6613      92.29
linusv          0.2682      227.6
linusvph        0.2681      227.7
linusasm        0.5868      104
linusp4         0.3586      170.2
linusas         0.3795      160.8
linusas2        0.3583      170.3
mozilla         1.171       52.11
mozillaas       1.381       44.2
openssl         0.2623      232.7
opensslb        0.2404      253.9
spelvin         0.2659      229.6
spelvina        0.2492      244.9
nettle          0.4362      139.9
nettle-ror      0.436       140
nettle-p4sch    0.4204      145.2

it's now just 2% slower than the openssl assembler version.

artur
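The SHA_ROUND change only reorders when each term is added to E, so the old and new forms should leave the same state and the same W() slot behind. Below is a minimal sketch that checks this on pseudo-random inputs. It assumes GCC or Clang (for the inline asm) and a 16-word circular buffer behind W(x). The names state5, lcg, round_old, round_new, count_mismatches and self_test are hypothetical and not part of sha1vol.c; setW is wrapped in do/while here for statement safety, which the posted macro does not do; and only the T_16_19 form (SHA_MIX input, 0x5a827999) is exercised.

```c
#include <assert.h>
#include <stdint.h>

#define SHA_ROL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
#define SHA_ROR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))

/* Assumption: W(x) indexes a 16-word circular buffer named "array". */
#define W(x) (array[(x) & 15])

/* Plain store followed by an empty asm with a "+m" operand: the compiler
 * must treat the slot as read and written here, so the store is emitted
 * at this point without making the whole access volatile. */
#define setW(x, val) \
	do { W(x) = (val); __asm__ volatile ("" : "+m" (W(x))); } while (0)

#define SHA_MIX(t) SHA_ROL(W((t)+13) ^ W((t)+8) ^ W((t)+2) ^ W(t), 1)

typedef struct { uint32_t a, b, c, d, e; } state5;	/* hypothetical */

/* Round as it was before the patch: load/store first, one big sum. */
static void round_old(uint32_t *array, unsigned t, state5 *s)
{
	uint32_t fn = ((s->c ^ s->d) & s->b) ^ s->d;
	uint32_t TEMP = SHA_MIX(t); setW(t, TEMP);
	s->e += TEMP + SHA_ROL(s->a, 5) + fn + 0x5a827999u;
	s->b = SHA_ROR(s->b, 2);
}

/* Round as posted: E is accumulated in steps around the W() store. */
static void round_new(uint32_t *array, unsigned t, state5 *s)
{
	uint32_t fn = ((s->c ^ s->d) & s->b) ^ s->d;
	uint32_t TEMP = SHA_ROL(s->a, 5); s->e += fn;
	s->e += 0x5a827999u + TEMP; TEMP = SHA_MIX(t); setW(t, TEMP);
	s->b = SHA_ROR(s->b, 2); s->e += TEMP;
}

/* Small linear congruential generator, only to get varied test inputs. */
static uint32_t lcg(uint32_t *seed)
{
	*seed = *seed * 1664525u + 1013904223u;
	return *seed;
}

/* Runs both round variants on identical inputs; returns how many trials
 * ended with a different state or a different schedule buffer. */
int count_mismatches(unsigned trials)
{
	uint32_t seed = 1;
	int bad = 0;

	for (unsigned n = 0; n < trials; n++) {
		uint32_t w1[16], w2[16];
		state5 s1, s2;
		int differs = 0;

		for (int i = 0; i < 16; i++)
			w1[i] = w2[i] = lcg(&seed);
		s1.a = lcg(&seed); s1.b = lcg(&seed); s1.c = lcg(&seed);
		s1.d = lcg(&seed); s1.e = lcg(&seed);
		s2 = s1;

		round_old(w1, 16 + (n & 3), &s1);
		round_new(w2, 16 + (n & 3), &s2);

		if (s1.a != s2.a || s1.b != s2.b || s1.c != s2.c ||
		    s1.d != s2.d || s1.e != s2.e)
			differs = 1;
		for (int i = 0; i < 16; i++)
			if (w1[i] != w2[i])
				differs = 1;
		bad += differs;
	}
	return bad;
}

void self_test(void)
{
	assert(count_mismatches(1000) == 0);
}
```

Calling self_test() from a small main() is enough to run the check. It says nothing about speed; the timings in the table come from the real sha1vol.c build, not from this sketch.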