Re: ARM inline asm and temp register without extra move

Georg-Johann Lay <avr@xxxxxxxx> · Mon, 27 May 2019 12:25:03 +0200

Jeffrey Walton wrote:
I'm testing some byte swap code from William Hohl's book on ARM
assembly language programming.

According to Hohl (p. 74) the byte swap should be coded as follows:

    eor    r1, r0, r0, ror #16
    bic    r1, r1, #0xff0000
    mov    r0, r0, ror #8
    eor    r0, r0, r1, lsr #8

When I convert to GCC extended ASM:

  unsigned int byteswap(unsigned int value)
  {
    unsigned int temp;
    __asm__ ("eor    %1, %0, %0, ror #16    \n\t"
             "bic    %1, %1, #0xff0000      \n\t"
             "mov    %0, %0, ror #8         \n\t"
             "eor    %0, %0, %1, lsr #8     \n\t"
             : "+r" (value), "+r" (temp));
    return value;
  }

There is a spurious move when a scratch register is used. Instead of 4
instructions there are 5. The mov r3, #0 was introduced with the
scratch register.

    mov r3, #0
    eor r3, r0, r0, ror #16
    bic r3, r3, #0xff0000
    mov r0, r0, ror #8
    eor r0, r0, r3, lsr #8

According to Godbolt (https://godbolt.org/z/jmIya9), the spurious
move is present in GCC 6.4 through GCC 8.3.

I don't want to declare the temp register that way (claiming it is
both an input and an output), but it is the only way to get a scratch
register according to the GCC Extended ASM HowTo.

Huh? temp is not used as an input; at least, it is not initialized
with a value.  With "+r", some versions of GCC just cook up a value
in such a situation, which is where the mov r3, #0 comes from.  Just
try "=r" (temp), which states that temp is only an output value
instead of both output and input.

Early-clobber (&) is not needed here because value is also an output,
and GCC never assigns two output operands to the same register, hence
value cannot overlap with temp.
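
A minimal sketch of the suggested change, reusing the function from the
question with only the temp constraint switched from "+r" to "=r".  The
preprocessor guard and the shift-based fallback are my own additions (not
part of the original code) so that the snippet also builds on non-ARM
hosts and in Thumb mode; it assumes a 32-bit unsigned int.

```c
/* Sketch only: temp is a write-only output ("=r"), so GCC has no
   reason to initialize it before the asm statement. */
unsigned int byteswap(unsigned int value)
{
#if defined(__arm__) && !defined(__thumb__)
    unsigned int temp;  /* scratch register, written before it is read */
    __asm__ ("eor    %1, %0, %0, ror #16    \n\t"
             "bic    %1, %1, #0xff0000      \n\t"
             "mov    %0, %0, ror #8         \n\t"
             "eor    %0, %0, %1, lsr #8     \n\t"
             : "+r" (value), "=r" (temp));
    return value;
#else
    /* Assumption: portable fallback for targets other than ARM mode. */
    return (value >> 24) | ((value >> 8) & 0x0000ff00u)
         | ((value << 8) & 0x00ff0000u) | (value << 24);
#endif
}
```

Whether the extra mov actually disappears depends on the GCC version and
flags, so it is worth checking the generated assembly.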

And if this code is supposed to be fast, you usually implement
the hosting function as static inline.  Moreover, GCC provides
__builtin_bswap32 (value), which accomplishes what you are after.
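
As a rough illustration of both points, the wrapper could look like the
following.  The name byteswap32 and the use of uint32_t are my choices
for the sketch, not something from the original code; __builtin_bswap32
itself is the GCC builtin (Clang accepts it as well).

```c
#include <stdint.h>

/* Hypothetical wrapper: static inline lets the compiler fold the swap
   into each caller instead of emitting a call. */
static inline uint32_t byteswap32(uint32_t value)
{
    /* The compiler picks the instruction sequence for the target. */
    return __builtin_bswap32(value);
}
```

With this, byteswap32(0x11223344) yields 0x44332211, and the compiler is
free to use a single instruction where the target has one.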

Johann

How do I code this up and avoid the extra move?

Jeff