Re: Is accessing floating-point register slower than accessing cache-memory?

James Greenhalgh <james.greenhalgh@xxxxxxx> · Thu, 18 Sep 2014 15:17:15 +0100

On Thu, Sep 18, 2014 at 07:04:29AM +0100, Gioh Kim wrote:
> I disassembled the SHA1 and SHA2 test code (sha1.o and sha2.o) and
> found that it often uses floating-point registers, even though SHA does
> only integer operations, as in the following:
> (compiler is gcc-linaro-aarch64-linux-gnu-4.9-2014.08_linux.tar.bz2 from http://releases.linaro.org/14.08/components/toolchain/binaries)
> 
>    fmov    s21, w5
>    fmov    s20, w9
>    add     w9, w6, w4
>    fmov    w4, s0
>    ror     w5, w15, 25
>    ror     w19, w15, 11
>    fmov    s0, w5
>    eor     w19, w19, w4
>    fmov    w5, s20
>    fmov    w4, s21
>    add     w14, w9, w14
>    fmov    w9, s0
> 
> I think the FP registers were being used to back up other registers.
> 
> I had read an article,
> http://www.informit.com/articles/article.aspx?p=1620207&seqNum=4, so I
> guessed the poor performance might be caused by accessing floating-point
> registers. I added the -mgeneral-regs-only option and got better
> performance, almost 200% better.
> 
> I'm wondering whether accessing a floating-point register can really be
> slower than accessing cache or DDR memory. If so, why is gcc generating code
> that uses floating-point registers for non-floating-point calculations?

Hi Gioh Kim,

You are correct: GCC should not be generating the code above. This is an
issue we have seen, and we have hopefully gone some way towards fixing it
for GCC 5.0.

There was a set of patches contributed to GCC recently by Wilco
Dijkstra (trunk revisions 215205 -> 215208) which modify the cost models
used by the compiler to choose when to spill to the SIMD and floating-point
registers. This should make the situation you describe much rarer when
using the trunk compiler.

It is still possible that you will see code generation like the above, but
this should be considered a bug in GCC and should be reported through the
usual channels ( https://gcc.gnu.org/bugs/ ).
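
As an illustration only, a reduced test case for such a report might look
something like the sketch below. It is not taken from the sha2.c you
compiled: the function names (rotr32, big_sigma1, big_sigma0, ch, maj,
sha256_round) are hypothetical, and there is no guarantee that this exact
snippet triggers the fmov spills shown above. It only mirrors the kind of
integer-only rotate/xor/add code involved.

```c
#include <stdint.h>

/* Rotate a 32-bit word right by n bits (valid for 0 < n < 32). */
static inline uint32_t rotr32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32 - n));
}

/* SHA-256 mixing functions: integer rotates, xors and bitwise logic only. */
static inline uint32_t big_sigma1(uint32_t e)
{
    return rotr32(e, 6) ^ rotr32(e, 11) ^ rotr32(e, 25);
}

static inline uint32_t big_sigma0(uint32_t a)
{
    return rotr32(a, 2) ^ rotr32(a, 13) ^ rotr32(a, 22);
}

static inline uint32_t ch(uint32_t e, uint32_t f, uint32_t g)
{
    return (e & f) ^ (~e & g);
}

static inline uint32_t maj(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & b) ^ (a & c) ^ (b & c);
}

/*
 * One SHA-256-style round over the eight working variables s[0..7].
 * k is the round constant and w the message-schedule word; both are
 * simply parameters here (hypothetical interface for the sketch).
 */
void sha256_round(uint32_t s[8], uint32_t k, uint32_t w)
{
    uint32_t t1 = s[7] + big_sigma1(s[4]) + ch(s[4], s[5], s[6]) + k + w;
    uint32_t t2 = big_sigma0(s[0]) + maj(s[0], s[1], s[2]);

    s[7] = s[6];
    s[6] = s[5];
    s[5] = s[4];
    s[4] = s[3] + t1;
    s[3] = s[2];
    s[2] = s[1];
    s[1] = s[0];
    s[0] = t1 + t2;
}
```

Compiling a file like this to assembly at -O2, once with and once without
-mgeneral-regs-only, and comparing the two outputs for fmov instructions is
one way to check whether a given compiler still shows the behaviour. A real
report would more likely need the fully unrolled rounds from the original
source, where register pressure is higher.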

Note that the Linaro toolchain you are using is not based on the trunk
compiler. If you are looking for support for the Linaro toolchain, you
should contact Linaro through their preferred channels.

Thanks,
James Greenhalgh