Linus Torvalds wrote:
> Just out of curiosity, does anything change if you change the
>
>     B = SHA_ROR(B,2)
>
> into a
>
>     B = SHA_ROR(SHA_ROR(B,1),1)
>
> instead? It's very possible that it becomes _much_ worse, but I guess it's

Did try that yesterday, didn't help. Will recheck now.. yep:

before: linus 0.3554 171.7
after:  linus 0.407  150

still true for the current version.

> So optimizing for P4 is often the wrong thing.
>
> Secondly, P4's are going away. You may have one, but they are getting
> rare. So optimizing for them is a losing proposition in the long run.

Sure, no argument; it's just that avoiding the P4 pitfalls is usually
not that hard and the impact on other, non-netburst, archs is low.
There are a lot of P4s out there and they're not going away soon.
(i'm still keeping most of my git trees on a P3...)

For generic C code such as this the difference for your i7 was -2% and
+70% for my P4; all the other (but one, i think) optimizations which
worked on P4 also applied to 32-bit i7.

As i happen to have a p4 i can just as well test the code on it, many
improvements will likely apply to other cpus too. That's all, i doubt
anybody seriously considered "optimizing for P4"; there is a reason
intel discontinued them :)

The atom is a more important target, but only the asm versions did well
there so far.

artur
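For reference, a minimal sketch of the two variants compared above. The SHA_ROR definition here is an assumed portable C form (shift and or), not necessarily the macro in the tree, which may use inline asm on x86. The helper names ror2_single, ror2_double and check_ror_equivalence are hypothetical and only there for illustration. Both variants give the same result; only the generated instruction sequence, and so the timing, can differ.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed portable 32-bit rotate right; valid for 0 < n < 32. */
#define SHA_ROR(X, n) ((uint32_t)(((X) >> (n)) | ((X) << (32 - (n)))))

/* Variant 1: a single rotate by 2, as in B = SHA_ROR(B,2). */
static uint32_t ror2_single(uint32_t b)
{
	return SHA_ROR(b, 2);
}

/* Variant 2: two rotates by 1, as in B = SHA_ROR(SHA_ROR(B,1),1). */
static uint32_t ror2_double(uint32_t b)
{
	uint32_t t = SHA_ROR(b, 1);
	return SHA_ROR(t, 1);
}

/* Sanity check: both variants must agree on a few sample values. */
static void check_ror_equivalence(void)
{
	static const uint32_t samples[] = {
		0x00000000u, 0x00000001u, 0x80000001u,
		0x67452301u, 0xffffffffu
	};
	unsigned int i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		assert(ror2_single(samples[i]) == ror2_double(samples[i]));
}
```

Calling check_ror_equivalence() from a test harness confirms the two forms are interchangeable as far as the result goes, so any difference in the numbers comes from how the cpu handles the rotate instructions.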