Re: [PATCH 0/7] block-sha1: improved SHA1 hashing

Linus Torvalds wrote:
> 
> On Thu, 6 Aug 2009, Artur Skawina wrote:
>> Oh, i noticed that '-mtune' makes quite a difference, it can change
>> the relative performance of the functions significantly, in unobvious
>> ways; depending on which cpu gcc tunes for (build config or -mtune);
>> some implementations slow down, others become a bit faster.
> 
> That probably is mainly true for P4, although it's quite possible that it 
> has an effect for just what the register allocator does, and then for 
> spilling.
> 
> And it looks like _all_ the tweakability is in the spilling. Nothing else 
> matters.
> 
> How does this patch work for you? It avoids doing that C-level register 
> rotation, and instead rotates the register names with the preprocessor.
> 
> I realize it's ugly as hell, but it does make it easier for gcc to see 
> what's going on.
> 
> The patch is against my git patches, but I think it should apply pretty 
> much as-is to your sha1bench sources too. Does it make any difference for 
> you?
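
For readers without the patch at hand, the difference between C-level register rotation and rotating the register names with the preprocessor can be sketched roughly as follows. This is not the patch itself: ROUND_20_39, rounds_shuffled and rounds_renamed are hypothetical names made up for this illustration, and the schedule words are passed in directly instead of going through W(). The first function moves values between variables every round; the second keeps each value where it is and passes the names in rotated order.

```c
#include <stdint.h>

#define SHA_ROL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
#define SHA_ROR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))

/* Hypothetical macro: one round of the 20..39 group. The caller decides
 * which variable plays which role, so no values move between variables. */
#define ROUND_20_39(w, A, B, C, D, E) do { \
        E += (w) + SHA_ROL(A, 5) + ((B) ^ (C) ^ (D)) + 0x6ed9eba1; \
        B = SHA_ROR(B, 2); \
} while (0)

/* C-level rotation: every round shuffles the values through the variables. */
static void rounds_shuffled(uint32_t s[5], const uint32_t w[5])
{
        uint32_t A = s[0], B = s[1], C = s[2], D = s[3], E = s[4];
        int i;

        for (i = 0; i < 5; i++) {
                uint32_t TEMP = w[i] + SHA_ROL(A, 5) + (B ^ C ^ D) + E + 0x6ed9eba1;
                E = D; D = C; C = SHA_ROR(B, 2); B = A; A = TEMP;
        }
        s[0] = A; s[1] = B; s[2] = C; s[3] = D; s[4] = E;
}

/* Name rotation: the argument order rotates by one each round, and after
 * five rounds every name is back in its original role. */
static void rounds_renamed(uint32_t s[5], const uint32_t w[5])
{
        uint32_t A = s[0], B = s[1], C = s[2], D = s[3], E = s[4];

        ROUND_20_39(w[0], A, B, C, D, E);
        ROUND_20_39(w[1], E, A, B, C, D);
        ROUND_20_39(w[2], D, E, A, B, C);
        ROUND_20_39(w[3], C, D, E, A, B);
        ROUND_20_39(w[4], B, C, D, E, A);
        s[0] = A; s[1] = B; s[2] = C; s[3] = D; s[4] = E;
}
```

Both functions should leave the same state behind for the same input; the renamed form just has no E = D; D = C; ... moves left for the compiler to untangle.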

it's a bit slower (about 5%, on P4):

before: linus          0.6288       97.06
after:  linus          0.6604       92.42

i was trying similar things, like the example below, too, but it wasn't a
win on 32 bit...

artur

[the iteration below is functionally correct, but the scheduling is most likely
 fubared, as it wasn't a win and i was only checking how much of a difference it
 made on P4: ~-20..~0%, but never faster (relative to linusas2; it _is_ faster
 than 'linus'). Dropped this version when merging your new preprocessor macros.]

@@ -125,6 +127,8 @@
 #define W(x) (array[(x)&15])
 #define SHA_XOR(t) \
        TEMP = SHA_ROL(W(t+13) ^ W(t+8) ^ W(t+2) ^ W(t), 1); W(t) = TEMP;
+#define SHA_XOR2(t) \
+       SHA_ROL(W(t+13) ^ W(t+8) ^ W(t+2) ^ W(t), 1)
 
 #define T_16_19(t) \
         { unsigned TEMP;\
@@ -139,10 +143,27 @@
 #endif
 
 #define T_20_39(t) \
-        { unsigned TEMP;\
-       SHA_XOR(t); \
-       TEMP += (B^C^D) + E + 0x6ed9eba1; \
-       E = D; D = C; C = SHA_ROR(B, 2); B = A; TEMP += SHA_ROL(A,5); A = TEMP; }
+        if (t%2==0) {\
+               unsigned TEMP;\
+               unsigned TEMP2;\
+               \
+               TEMP   = SHA_XOR2(t); \
+               TEMP2  = SHA_XOR2(t+1); \
+               W(t)   = TEMP;\
+               W(t+1) = TEMP2;\
+               TEMP   += E + 0x6ed9eba1; \
+               E      = C;\
+               TEMP   += (B^E^D); \
+               TEMP2  += D + 0x6ed9eba1; \
+               D      = SHA_ROR(B, 2);\
+               B      = SHA_ROL(A, 5);\
+               B      += TEMP;\
+               C      = SHA_ROR(A, 2);\
+               A      ^= E; \
+               A      ^= D; \
+               A      += TEMP2;\
+               A      += SHA_ROL(B, 5);\
+       }
 
 #if UNROLL
        T_20_39(20); T_20_39(21); T_20_39(22); T_20_39(23); T_20_39(24);
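
A small standalone check of the merged T_20_39 body above, as a sketch only: struct sha_state, round_single and round_pair are hypothetical names for this illustration, and the two schedule words are passed in as plain arguments instead of being computed through W() and SHA_XOR2(). round_single follows the old one-round body, round_pair follows the statement order of the new two-round body.

```c
#include <stdint.h>

#define SHA_ROL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
#define SHA_ROR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))

/* Hypothetical container for the five working variables. */
struct sha_state { uint32_t a, b, c, d, e; };

/* One round, same steps as the old T_20_39 body; w is the schedule word. */
static void round_single(struct sha_state *s, uint32_t w)
{
        uint32_t TEMP = w + (s->b ^ s->c ^ s->d) + s->e + 0x6ed9eba1;

        s->e = s->d;
        s->d = s->c;
        s->c = SHA_ROR(s->b, 2);
        s->b = s->a;
        TEMP += SHA_ROL(s->a, 5);
        s->a = TEMP;
}

/* Two rounds at once, same statement order as the new T_20_39 body. */
static void round_pair(struct sha_state *s, uint32_t w0, uint32_t w1)
{
        uint32_t A = s->a, B = s->b, C = s->c, D = s->d, E = s->e;
        uint32_t TEMP = w0, TEMP2 = w1;

        TEMP  += E + 0x6ed9eba1;
        E      = C;                 /* E now holds the old C */
        TEMP  += (B ^ E ^ D);       /* same as old B^C^D */
        TEMP2 += D + 0x6ed9eba1;
        D      = SHA_ROR(B, 2);
        B      = SHA_ROL(A, 5);
        B     += TEMP;              /* B is the result of the first round */
        C      = SHA_ROR(A, 2);
        A     ^= E;
        A     ^= D;                 /* old A ^ old C ^ ROR(old B, 2) */
        A     += TEMP2;
        A     += SHA_ROL(B, 5);     /* A is the result of the second round */

        s->a = A; s->b = B; s->c = C; s->d = D; s->e = E;
}
```

Calling round_single twice and round_pair once on the same starting state should give identical results. Computing both schedule words up front is safe because SHA_XOR2(t+1) reads W(t+14), W(t+9), W(t+3) and W(t+1), none of which is the slot that W(t) = TEMP writes.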