On Tue, 11 Aug 2009, Nicolas Pitre wrote:
>
> Well... gcc is really strange in this case (and similar other ones) with
> ARM compilation. A good indicator of the quality of the code is the
> size of the stack frame. When using the "+m" then gcc creates a 816
> byte stack frame, the generated binary grows by approx 3000 bytes, and
> performances is almost halved (7.600s). Looking at the assembly result
> I just can't figure out all the crazy moves taking place. Even the
> version with no barrier what so ever produces better assembly with a
> stack frame of 560 bytes.

Ok, that's just crazy. That function has a required stack size of exactly
64 bytes, and anything more than that is just spilling. And if you end up
with a stack frame of 560 bytes, that means that gcc is doing some _crazy_
spilling.

One thing that strikes me is that I've been just testing with gcc-4.4, and
BenH (who did some tests on PPC where SHA1 is just _trivial_ because it
all fits in the normal register space) noticed that older versions of gcc
that he tested did much worse on this. I think Artur also posted (x86)
numbers with older gcc versions doing worse.

Maybe you're seeing some of that?

		Linus
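
For reference, the 64-byte figure is the size of a 16-word circular window over the SHA-1 message schedule (16 x 4 bytes). Below is a minimal sketch of that idea, not the actual code under discussion: the names W, setW, rol32, sha1_schedule_last and sha1_schedule_last_full are hypothetical, and the exact form of the "+m" barrier is an assumption based on the constraint named in the quoted text. The second function keeps all 80 words (320 bytes) purely for comparison.

```c
#include <stdint.h>

/* Hypothetical names throughout; a sketch, not the git source. */

/* 16-word circular window: index t maps to slot t mod 16. */
#define W(x) (array[(x) & 15])

#if defined(__GNUC__)
/* Assumed form of the "+m" variant: an empty asm that claims to read
 * and write W(x) in memory, so the compiler must actually store it. */
#define setW(x, val) do { W(x) = (val); __asm__("" : "+m" (W(x))); } while (0)
#else
#define setW(x, val) (W(x) = (val))
#endif

static inline uint32_t rol32(uint32_t v, unsigned n)
{
    return (v << n) | (v >> (32 - n));
}

/* Message schedule using only a 64-byte window; returns word 79. */
uint32_t sha1_schedule_last(const uint32_t block[16])
{
    uint32_t array[16];          /* the 64 bytes that are really needed */
    uint32_t last = 0;
    int t;

    for (t = 0; t < 16; t++)
        setW(t, block[t]);
    for (t = 16; t < 80; t++) {
        /* t-3, t-8, t-14, t-16 taken mod 16 are t+13, t+8, t+2, t */
        last = rol32(W(t + 13) ^ W(t + 8) ^ W(t + 2) ^ W(t), 1);
        setW(t, last);
    }
    return last;
}

/* Same schedule with all 80 words kept (320 bytes), for comparison. */
uint32_t sha1_schedule_last_full(const uint32_t block[16])
{
    uint32_t w[80];
    int t;

    for (t = 0; t < 16; t++)
        w[t] = block[t];
    for (t = 16; t < 80; t++)
        w[t] = rol32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1);
    return w[79];
}
```

Comparing the stack frame gcc emits for the first function with and without the barrier (for example via gcc -S) is one way to look for the kind of spilling described above; results will depend on the gcc version and target.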