Linus Torvalds wrote:
>
> The bigger issue seems to be that it's shifter-limited, or that's what I
> take away from my profiles. I suspect it's even _more_ shifter-limited on
> some other micro-architectures, because gcc is being stupid, and generates
>
> 	ror $31,%eax
>
> from the "left shift + right shift" combination. It seems to -always-
> generate a "ror", rather than trying to generate 'rol' if the shift count
> would be smaller that way.
>
> And I know _some_ old micro-architectures will literally internally loop
> on the rol/ror counts, so "ror $31" can be _much_ more expensive than "rol
> $1".
>
> That isn't the case on my Nehalem, though. But I can't seem to get gcc to
> generate better code without actually using inline asm..

The compiler does the right thing w/ something like this:

+#if __GNUC__>1 && defined(__i386)
+#define SHA_ROT(data,bits) ({ \
+  unsigned d = (data); \
+  if (bits<16) \
+    __asm__ ("roll %1,%0" : "=r" (d) : "I" (bits), "0" (d)); \
+  else \
+    __asm__ ("rorl %1,%0" : "=r" (d) : "I" (32-bits), "0" (d)); \
+  d; \
+  })
+#else
 #define SHA_ROT(X,n) (((X) << (n)) | ((X) >> (32-(n))))
+#endif

which doesn't obfuscate the code as much.
(I needed the asm on p4 anyway, as w/o it the mozilla version is even slower
than an rfc3174 one. rol vs ror makes no measurable difference)

>  static void blk_SHA1Block(blk_SHA_CTX *ctx, const unsigned int *data)
>  {
> @@ -93,7 +105,7 @@ static void blk_SHA1Block(blk_SHA_CTX *ctx, const unsigned int *data)
>
>  	/* Unroll it? */
>  	for (t = 16; t <= 79; t++)
> -		W[t] = SHA_ROT(W[t-3] ^ W[t-8] ^ W[t-14] ^ W[t-16], 1);
> +		W[t] = SHA_ROL(W[t-3] ^ W[t-8] ^ W[t-14] ^ W[t-16], 1);

unrolling this once (but not more) is a win, at least on p4.
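
For reference, a minimal sketch of what "unrolling once" could look like for
the message-schedule loop quoted above. It is only an illustration under
stated assumptions: it uses a portable SHA_ROL rather than the asm variant,
and the function names (expand_rolled, expand_unrolled_once, check_schedules)
and the filler input are hypothetical, not part of the patch. The second
statement in the loop body reads W[t-2], never W[t], so the two statements do
not depend on each other.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable rotate-left (assumption: 0 < n < 32). */
#define SHA_ROL(X, n) (((X) << (n)) | ((X) >> (32 - (n))))

/* The rolled loop, as in the quoted hunk. */
static void expand_rolled(uint32_t W[80])
{
	int t;

	for (t = 16; t <= 79; t++)
		W[t] = SHA_ROL(W[t-3] ^ W[t-8] ^ W[t-14] ^ W[t-16], 1);
}

/* Unrolled once: two words per iteration. 64 words are produced,
 * so the step of 2 leaves no remainder. */
static void expand_unrolled_once(uint32_t W[80])
{
	int t;

	for (t = 16; t <= 78; t += 2) {
		W[t]   = SHA_ROL(W[t-3] ^ W[t-8] ^ W[t-14] ^ W[t-16], 1);
		W[t+1] = SHA_ROL(W[t-2] ^ W[t-7] ^ W[t-13] ^ W[t-15], 1);
	}
}

/* Hypothetical self-check: both variants must produce the same schedule. */
static void check_schedules(void)
{
	uint32_t a[80], b[80];
	int i;

	/* Arbitrary deterministic filler for the first 16 words. */
	for (i = 0; i < 16; i++)
		a[i] = b[i] = 0x9e3779b9u * (uint32_t)(i + 1);

	expand_rolled(a);
	expand_unrolled_once(b);
	assert(memcmp(a, b, sizeof(a)) == 0);
}
```

Calling check_schedules() from a test main() only confirms the two loops are
equivalent; whether the unrolled form is actually faster depends on the
compiler and CPU and has to be measured.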

>  #define T_0_19(t) \
> -	TEMP = SHA_ROT(A,5) + (((C^D)&B)^D) + E + W[t] + 0x5a827999; \
> -	E = D; D = C; C = SHA_ROT(B, 30); B = A; A = TEMP;
> +	TEMP = SHA_ROL(A,5) + (((C^D)&B)^D) + E + W[t] + 0x5a827999; \
> +	E = D; D = C; C = SHA_ROR(B, 2); B = A; A = TEMP;
>
>  	T_0_19( 0); T_0_19( 1); T_0_19( 2); T_0_19( 3); T_0_19( 4);
>  	T_0_19( 5); T_0_19( 6); T_0_19( 7); T_0_19( 8); T_0_19( 9);

unrolling these otoh is a clear loss (iirc ~10%).

artur