On Thu, Sep 18, 2014 at 07:04:29AM +0100, Gioh Kim wrote:
> I disassemble the SHA1 and SHA2 test code (sha1.o and sha2.o),
> found they were using floating-point registers often even though SHA does
> only integer operation like followings:
> (compiler is gcc-linaro-aarch64-linux-gnu-4.9-2014.08_linux.tar.bz2 from http://releases.linaro.org/14.08/components/toolchain/binaries)
>
>     fmov    s21, w5
>     fmov    s20, w9
>     add     w9, w6, w4
>     fmov    w4, s0
>     ror     w5, w15, 25
>     ror     w19, w15, 11
>     fmov    s0, w5
>     eor     w19, w19, w4
>     fmov    w5, s20
>     fmov    w4, s21
>     add     w14, w9, w14
>     fmov    w9, s0
>
> I think the fp regs were using to backup register.
>
> I'd read an article,
> http://www.informit.com/articles/article.aspx?p=1620207&seqNum=4, so I
> guessed the poor performance might be caused by accessing floating-point reg.
> I added -mgeneral-regs-only option and got better performance, almost 200%
> better.
>
> I'm wondering that the accessing floating-point register really can be slower
> than accessing cache or ddr memory? If so, why is gcc generating code using
> floating-pointer register for non-floating-point calculation?

Hi Gioh Kim,

You are correct, GCC should not be generating the code above. This is an
issue we have seen, and one we have hopefully gone some way towards fixing
for GCC 5.0.

There was a set of patches contributed to GCC recently by Wilco Dijkstra
(trunk revisions 215205 -> 215208) which modify the cost models used by
the compiler to choose when to spill to the SIMD and floating-point
registers. This should make the situation you describe much rarer when
using the trunk compiler.

It is still possible that you will see code generation like the above,
but this should be considered a bug in GCC and should be reported through
the usual channels (https://gcc.gnu.org/bugs/).

Note that the Linaro toolchain you are using is not based on the trunk
compiler. If you are looking for support for the Linaro toolchain, you
should contact Linaro through their preferred channels.

Thanks,
James Greenhalgh
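
For anyone wanting to check their own toolchain, a minimal sketch of the kind of integer-only code involved may help. The rotate amounts 11 and 25 in the quoted disassembly match the SHA-256 "big sigma 1" function (rotate right by 6, 11 and 25), so the sketch below uses that; the names rotr32 and sha256_Sigma1 and the file name sigma.c are illustrative assumptions, not taken from the original test code.

```c
#include <stdint.h>

/* Rotate a 32-bit word right by n bits; valid for 0 < n < 32.
 * Compilers for AArch64 normally turn this idiom into a single ROR. */
static inline uint32_t rotr32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32 - n));
}

/* SHA-256 "big sigma 1": ROTR(e, 6) ^ ROTR(e, 11) ^ ROTR(e, 25).
 * Integer-only, so no floating-point or SIMD register is needed. */
uint32_t sha256_Sigma1(uint32_t e)
{
    return rotr32(e, 6) ^ rotr32(e, 11) ^ rotr32(e, 25);
}
```

Assuming a cross compiler installed as aarch64-linux-gnu-gcc, comparing the output of "aarch64-linux-gnu-gcc -O2 -S sigma.c" with and without -mgeneral-regs-only shows whether any fmov moves between general and floating-point registers are emitted. A function this small has very little register pressure, so it may well not reproduce the spilling on its own; the full round loop from the real test code is the better reproducer for a bug report.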