Re: block-sha1: improve code on large-register-set machines

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Tue, 11 Aug 2009, Nicolas Pitre wrote:
> 
> Well... gcc is really strange in this case (and similar other ones) with 
> ARM compilation.  A good indicator of the quality of the code is the 
> size of the stack frame.  When using the "+m" then gcc creates a 816 
> byte stack frame, the generated binary grows by approx 3000 bytes, and 
> performances is almost halved (7.600s).  Looking at the assembly result 
> I just can't figure out all the crazy moves taking place.  Even the 
> version with no barrier what so ever produces better assembly with a 
> stack frame of 560 bytes.

Ok, that's just crazy. That function has a required stack size of exactly 
64 bytes, and anything more than that is just spilling. And if you end up 
with a stack frame of 560 bytes, that means that gcc is doing some _crazy_ 
spilling.

One thing that strikes me is that I've been just testing with gcc-4.4, and 
BenH (who did some tests on PPC where SHA1 is just _trivial_ because it 
all fits in the normal register space) noticed that older versions of gcc 
that he tested did much worse on this.

I think Artur also posted (x86) numbers with older gcc versions doing 
worse. Maybe you're seeing some of that?

			Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]