Linus Torvalds wrote:
>
> On Thu, 6 Aug 2009, Artur Skawina wrote:
>> Oh, i noticed that '-mtune' makes quite a difference, it can change
>> the relative performance of the functions significantly, in unobvious
>> ways; depending on which cpu gcc tunes for (build config or -mtune);
>> some implementations slow down, others become a bit faster.
>
> That probably is mainly true for P4, although it's quite possible that it
> has an effect for just what the register allocator does, and then for
> spilling.
>
> And it looks like _all_ the tweakability is in the spilling. Nothing else
> matters.
>
> How does this patch work for you? It avoids doing that C-level register
> rotation, and instead rotates the register names with the preprocessor.
>
> I realize it's ugly as hell, but it does make it easier for gcc to see
> what's going on.
>
> The patch is against my git patches, but I think it should apply pretty
> much as-is to your sha1bench sources too. Does it make any difference for
> you?

it's a bit slower (P4):

before: linus          0.6288       97.06
after:  linus          0.6604       92.42

i was trying similar things, like the example below, too, but
it wasn't a win on 32 bit...

artur

[the iteration below is functionally correct, but scheduling is most
 likely fubared as it wasn't a win and i was checking how much a
 difference it made on P4 -- ~-20..~0%, but never faster (relative
 to linusas2; it _is_ faster than 'linus'). Dropped this version when
 merging your new preprocessor macros.]

@@ -125,6 +127,8 @@
 #define W(x) (array[(x)&15])
 #define SHA_XOR(t) \
 	TEMP = SHA_ROL(W(t+13) ^ W(t+8) ^ W(t+2) ^ W(t), 1); W(t) = TEMP;
+#define SHA_XOR2(t) \
+	SHA_ROL(W(t+13) ^ W(t+8) ^ W(t+2) ^ W(t), 1)
 
 #define T_16_19(t) \
 	{ unsigned TEMP;\
@@ -139,10 +143,27 @@
 #endif
 
 #define T_20_39(t) \
-	{ unsigned TEMP;\
-	  SHA_XOR(t); \
-	  TEMP += (B^C^D) + E + 0x6ed9eba1; \
-	  E = D; D = C; C = SHA_ROR(B, 2); B = A; TEMP += SHA_ROL(A,5); A = TEMP; }
+	if (t%2==0) {\
+	  unsigned TEMP;\
+	  unsigned TEMP2;\
+	  \
+	  TEMP  = SHA_XOR2(t); \
+	  TEMP2 = SHA_XOR2(t+1); \
+	  W(t) = TEMP;\
+	  W(t+1) = TEMP2;\
+	  TEMP += E + 0x6ed9eba1; \
+	  E = C;\
+	  TEMP += (B^E^D); \
+	  TEMP2 += D + 0x6ed9eba1; \
+	  D = SHA_ROR(B, 2);\
+	  B = SHA_ROL(A, 5);\
+	  B += TEMP;\
+	  C = SHA_ROR(A, 2);\
+	  A ^= E; \
+	  A ^= D; \
+	  A += TEMP2;\
+	  A += SHA_ROL(B, 5);\
+	}
 
 #if UNROLL
 	T_20_39(20); T_20_39(21); T_20_39(22); T_20_39(23); T_20_39(24);
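
For anyone who wants to convince themselves that the fused T_20_39 above really computes the same thing as two back-to-back single rounds, here is a minimal standalone sketch. It is not part of the patch or of sha1bench. The names (struct sha_state, sched, round_single, round_pair, count_mismatches) and the small LCG used to generate inputs are made up for illustration. round_single follows the removed lines of the diff, round_pair follows the added lines statement by statement, and the two results are compared for even t in 20..38.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define K_20_39 0x6ed9eba1u

/* Hypothetical container for the five working registers and the W ring. */
struct sha_state { uint32_t a, b, c, d, e; uint32_t w[16]; };

static uint32_t sha_rol(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }
static uint32_t sha_ror(uint32_t x, int n) { return (x >> n) | (x << (32 - n)); }

/* Same expression as SHA_XOR2(t): next schedule word from the 16-entry ring. */
static uint32_t sched(const uint32_t *w, int t)
{
	return sha_rol(w[(t + 13) & 15] ^ w[(t + 8) & 15] ^
		       w[(t + 2) & 15] ^ w[t & 15], 1);
}

/* One round with the C-level register rotation (the removed macro body). */
static void round_single(struct sha_state *s, int t)
{
	uint32_t temp = sched(s->w, t);

	s->w[t & 15] = temp;
	temp += (s->b ^ s->c ^ s->d) + s->e + K_20_39;
	s->e = s->d; s->d = s->c; s->c = sha_ror(s->b, 2); s->b = s->a;
	temp += sha_rol(s->a, 5);
	s->a = temp;
}

/* Two rounds fused, in the statement order of the added macro body. */
static void round_pair(struct sha_state *s, int t)
{
	uint32_t a = s->a, b = s->b, c = s->c, d = s->d, e = s->e;
	uint32_t temp = sched(s->w, t);
	uint32_t temp2 = sched(s->w, t + 1);	/* does not read W(t) */

	s->w[t & 15] = temp;
	s->w[(t + 1) & 15] = temp2;
	temp += e + K_20_39;
	e = c;
	temp += (b ^ e ^ d);
	temp2 += d + K_20_39;
	d = sha_ror(b, 2);
	b = sha_rol(a, 5);
	b += temp;			/* b is now A after round t */
	c = sha_ror(a, 2);
	a ^= e;
	a ^= d;
	a += temp2;
	a += sha_rol(b, 5);		/* a is now A after round t+1 */
	s->a = a; s->b = b; s->c = c; s->d = d; s->e = e;
}

/* Simple LCG, only there to produce varied inputs (assumption, not from the thread). */
static uint32_t next_rand(uint32_t *seed)
{
	*seed = *seed * 1664525u + 1013904223u;
	return *seed;
}

/* Returns how many random states disagree between the two variants. */
int count_mismatches(int trials)
{
	uint32_t seed = 1;
	int bad = 0;

	for (int i = 0; i < trials; i++) {
		struct sha_state ref, fused;
		int t = 20 + 2 * (i % 10);	/* even t in 20..38 */

		ref.a = next_rand(&seed); ref.b = next_rand(&seed);
		ref.c = next_rand(&seed); ref.d = next_rand(&seed);
		ref.e = next_rand(&seed);
		for (int j = 0; j < 16; j++)
			ref.w[j] = next_rand(&seed);
		fused = ref;

		round_single(&ref, t);
		round_single(&ref, t + 1);
		round_pair(&fused, t);
		if (memcmp(&ref, &fused, sizeof(ref)) != 0)
			bad++;
	}
	return bad;
}
```

Calling count_mismatches(1000) from a small main() and checking for a zero return is enough to exercise it. Note this only checks equivalence; it says nothing about speed, which, as the numbers above show, depends heavily on the cpu and on what gcc does with spilling.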