Hello. I'm working with a Cortex-A9-based platform which has four ARMv8 cores. I ran Geekbench and found that the SHA1 and SHA2 tests have very poor performance. I disassembled the SHA1 and SHA2 test code (sha1.o and sha2.o) and found that it uses floating-point registers frequently, even though SHA performs only integer operations. For example:

    fmov s21, w5
    fmov s20, w9
    add  w9, w6, w4
    fmov w4, s0
    ror  w5, w15, 25
    ror  w19, w15, 11
    fmov s0, w5
    eor  w19, w19, w4
    fmov w5, s20
    fmov w4, s21
    add  w14, w9, w14
    fmov w9, s0

(The compiler is gcc-linaro-aarch64-linux-gnu-4.9-2014.08_linux.tar.bz2 from http://releases.linaro.org/14.08/components/toolchain/binaries.)

I think the FP registers are being used to back up (spill) general-purpose registers. I had read an article, http://www.informit.com/articles/article.aspx?p=1620207&seqNum=4, so I guessed that the poor performance might be caused by accessing the floating-point registers. I added the -mgeneral-regs-only option and got much better performance, almost 200% better.

Can accessing a floating-point register really be slower than accessing the cache or DDR memory? If so, why does gcc generate code that uses floating-point registers for non-floating-point calculations?
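For context, SHA-256's compression loop is pure 32-bit integer arithmetic, and the rotate amounts in the disassembly (25 and 11 applied to the same register) look like part of SHA-256's Σ1(e) = ROTR6(e) ^ ROTR11(e) ^ ROTR25(e). Below is a minimal, portable SHA-256 sketch in C to make that concrete. It is a hypothetical stand-in, not Geekbench's actual sha2 code (which isn't shown here), and the names `rotr32`, `sha256_block` and `sha256` are illustrative. The point it illustrates: eight working variables (a..h) plus temporaries, the round-constant pointer and the message schedule are all live in every round, so once the rounds are unrolled the register allocator may run short of general-purpose registers and, as one plausible reading of the `fmov` traffic, use FP/SIMD registers as spill slots instead of the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit rotate right (n is never 0 here); GCC usually emits a single ror. */
static inline uint32_t rotr32(uint32_t x, unsigned n) {
    return (x >> n) | (x << (32 - n));
}

/* SHA-256 round constants (FIPS 180-4). */
static const uint32_t K[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};

/* Compress one 64-byte block: only integer adds, shifts, rotates and logic ops. */
static void sha256_block(uint32_t h[8], const uint8_t *p) {
    uint32_t w[64];
    for (int i = 0; i < 16; i++)
        w[i] = (uint32_t)p[4 * i] << 24 | (uint32_t)p[4 * i + 1] << 16 |
               (uint32_t)p[4 * i + 2] << 8 | (uint32_t)p[4 * i + 3];
    for (int i = 16; i < 64; i++) {
        uint32_t s0 = rotr32(w[i - 15], 7) ^ rotr32(w[i - 15], 18) ^ (w[i - 15] >> 3);
        uint32_t s1 = rotr32(w[i - 2], 17) ^ rotr32(w[i - 2], 19) ^ (w[i - 2] >> 10);
        w[i] = w[i - 16] + s0 + w[i - 7] + s1;
    }
    /* Eight working variables stay live across every round. */
    uint32_t a = h[0], b = h[1], c = h[2], d = h[3];
    uint32_t e = h[4], f = h[5], g = h[6], hh = h[7];
    for (int i = 0; i < 64; i++) {
        uint32_t S1 = rotr32(e, 6) ^ rotr32(e, 11) ^ rotr32(e, 25); /* cf. "ror ..., 11" / "ror ..., 25" */
        uint32_t ch = (e & f) ^ (~e & g);
        uint32_t t1 = hh + S1 + ch + K[i] + w[i];
        uint32_t S0 = rotr32(a, 2) ^ rotr32(a, 13) ^ rotr32(a, 22);
        uint32_t maj = (a & b) ^ (a & c) ^ (b & c);
        uint32_t t2 = S0 + maj;
        hh = g; g = f; f = e; e = d + t1;
        d = c; c = b; b = a; a = t1 + t2;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d;
    h[4] += e; h[5] += f; h[6] += g; h[7] += hh;
}

/* Hash len bytes of msg into a 32-byte big-endian digest. */
void sha256(const uint8_t *msg, size_t len, uint8_t out[32]) {
    assert(msg != NULL || len == 0);
    uint32_t h[8] = {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                     0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};
    size_t i = 0;
    for (; i + 64 <= len; i += 64)
        sha256_block(h, msg + i);
    /* Padding: 0x80, zeros, then the 64-bit bit length; needs one or two blocks. */
    uint8_t tail[128] = {0};
    size_t rem = len - i;
    if (rem > 0)
        memcpy(tail, msg + i, rem);
    tail[rem] = 0x80;
    size_t tlen = rem < 56 ? 64 : 128;
    uint64_t bits = (uint64_t)len * 8;
    for (int j = 0; j < 8; j++)
        tail[tlen - 1 - j] = (uint8_t)(bits >> (8 * j));
    sha256_block(h, tail);
    if (tlen == 128)
        sha256_block(h, tail + 64);
    for (int j = 0; j < 8; j++) {
        out[4 * j] = (uint8_t)(h[j] >> 24);
        out[4 * j + 1] = (uint8_t)(h[j] >> 16);
        out[4 * j + 2] = (uint8_t)(h[j] >> 8);
        out[4 * j + 3] = (uint8_t)h[j];
    }
}
```

One way to check the effect is to compile this file with the AArch64 cross compiler at -O2 using -S, once with and once without -mgeneral-regs-only, and compare the two listings for fmov between w and s registers. Whether such spills actually show up for this small rolled version depends on the GCC version and on how aggressively the rounds get unrolled, so treat it as an experiment rather than a guaranteed reproduction.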