On Mon, 17 Aug 2009, Steven Noonan wrote:
>
> Interesting. I compared Linus' implementation to the public domain one
> by Steve Reid[1]

You _really_ need to talk about what kind of environment you have.

There are three major issues:

 - Netburst vs non-Netburst
 - 32-bit vs 64-bit
 - compiler version

Steve Reid's code looks great, but the way it is coded, gcc makes a mess
of it, which is exactly what my SHA1 tries to avoid.

[ In contrast, gcc does very well on just about _any_ straightforward
  unrolled SHA1 C code if the target architecture is something like PPC
  or ia64 that has enough registers to keep it all in registers.

  I haven't really tested other compilers - a less aggressive compiler
  would actually do _better_ on SHA1, because the problem with gcc is
  that it turns the whole temporary 16-entry word array into register
  accesses, and tries to do register allocation on that _array_.

  That is wonderful for the above-mentioned PPC and ia64, but it makes
  gcc create totally crazy code when there aren't enough registers, and
  then gcc starts spilling randomly (i.e. it starts spilling a-e etc).
  This is why the compiler and version matter so much. ]

> (average of 5 runs)
> Linus' sha1: 283MB/s
> Steve Reid's sha1: 305MB/s

So I get very different results:

    #        TIME[s]   SPEED[MB/s]
    Reid       2.742         222.6
    linus      1.464           417

This is Intel Nehalem, but compiled for 32-bit mode (which is the more
challenging one, because x86-32 only has 7 general-purpose registers),
and with gcc-4.4.0.

		Linus
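To make the "temporary 16-entry word array" concrete: SHA-1 expands every 64-byte block into 80 schedule words, but each new word depends only on the previous 16, so C implementations commonly keep a rolling window W[i & 15]. The sketch below is a generic, deliberately unoptimized illustration of that structure under the FIPS 180 definition of SHA-1; it is neither Linus' nor Steve Reid's code, and the names rol(), sha1_block() and sha1() are hypothetical. W[] plus the five working variables a-e are the values a compiler has to fit into x86-32's handful of registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Rotate a 32-bit word left by n bits; n must be 1..31 to avoid UB. */
static uint32_t rol(uint32_t x, int n)
{
	assert(n > 0 && n < 32);
	return (x << n) | (x >> (32 - n));
}

/* Compress one 64-byte block into the state h[0..4].  W[] is the
 * 16-entry temporary word array: schedule word i is stored in W[i & 15],
 * overwriting the word from 16 rounds earlier. */
static void sha1_block(uint32_t h[5], const unsigned char *p)
{
	uint32_t W[16];
	uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];

	for (int i = 0; i < 80; i++) {
		uint32_t f, k, t;

		if (i < 16)	/* big-endian load of message word i */
			W[i] = (uint32_t)p[4 * i] << 24 | (uint32_t)p[4 * i + 1] << 16 |
			       (uint32_t)p[4 * i + 2] << 8 | (uint32_t)p[4 * i + 3];
		else		/* W[i] = rol1(W[i-3] ^ W[i-8] ^ W[i-14] ^ W[i-16]) */
			W[i & 15] = rol(W[(i + 13) & 15] ^ W[(i + 8) & 15] ^
					W[(i + 2) & 15] ^ W[i & 15], 1);

		if (i < 20) {
			f = (b & c) | (~b & d);
			k = 0x5A827999;
		} else if (i < 40) {
			f = b ^ c ^ d;
			k = 0x6ED9EBA1;
		} else if (i < 60) {
			f = (b & c) | (b & d) | (c & d);
			k = 0x8F1BBCDC;
		} else {
			f = b ^ c ^ d;
			k = 0xCA62C1D6;
		}

		/* Rotate the five working variables a-e. */
		t = rol(a, 5) + f + e + k + W[i & 15];
		e = d;
		d = c;
		c = rol(b, 30);
		b = a;
		a = t;
	}

	h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

/* Hash len bytes of msg into a 20-byte digest, with standard padding. */
void sha1(const unsigned char *msg, size_t len, unsigned char out[20])
{
	uint32_t h[5] = { 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0 };
	unsigned char tail[128] = { 0 };
	size_t full = len / 64, rest = len % 64, tlen;
	uint64_t bits = (uint64_t)len * 8;

	for (size_t i = 0; i < full; i++)
		sha1_block(h, msg + 64 * i);

	/* Leftover bytes, then 0x80, zero padding, and the 64-bit bit length. */
	memcpy(tail, msg + 64 * full, rest);
	tail[rest] = 0x80;
	tlen = rest < 56 ? 64 : 128;
	for (int i = 0; i < 8; i++)
		tail[tlen - 1 - i] = (unsigned char)(bits >> (8 * i));
	for (size_t i = 0; i < tlen; i += 64)
		sha1_block(h, tail + i);

	/* Serialize the state big-endian. */
	for (int i = 0; i < 20; i++)
		out[i] = (unsigned char)(h[i / 4] >> (24 - 8 * (i % 4)));
}
```

Hashing the three bytes "abc" with sha1() should produce the standard test digest a9993e364706816aba3e25717850c26c9cd0d89d. Because the 80 rounds here sit in a loop rather than being unrolled, this sketch does not reproduce the gcc register-allocation behaviour described above; it only shows where the 16-word array and a-e come from.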