ARM inline asm and temp register without extra move

Jeffrey Walton <noloader@xxxxxxxxx> · Sat, 25 May 2019 23:04:39 -0400

I'm testing some byte swap code from William Hohl's book on ARM
assembly language programming.

According to Hohl (p. 74) the byte swap should be coded as follows:

    eor    r1, r0, r0, ror #16
    bic    r1, r1, #0xff0000
    mov    r0, r0, ror #8
    eor    r0, r0, r1, lsr #8
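
For reference, the same four steps can be sketched in portable C. This
is only an illustration for checking results; the names ror32 and
byteswap_ref are hypothetical and do not come from Hohl's book.

```c
#include <stdint.h>

/* Rotate right by n bits; assumes 0 < n < 32 (hypothetical helper). */
uint32_t ror32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32u - n));
}

/* Portable sketch mirroring the four ARM instructions one by one. */
uint32_t byteswap_ref(uint32_t x)
{
    uint32_t t = x ^ ror32(x, 16);   /* eor r1, r0, r0, ror #16 */
    t &= ~UINT32_C(0x00ff0000);      /* bic r1, r1, #0xff0000   */
    x = ror32(x, 8);                 /* mov r0, r0, ror #8      */
    return x ^ (t >> 8);             /* eor r0, r0, r1, lsr #8  */
}
```

Comparing byteswap_ref against the asm version on a few inputs is a
quick way to confirm the conversion did not change the result.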

When I convert to GCC extended ASM:

  unsigned int byteswap(unsigned int value)
  {
    unsigned int temp;
    __asm__ ("eor    %1, %0, %0, ror #16    \n\t"
             "bic    %1, %1, #0xff0000      \n\t"
             "mov    %0, %0, ror #8         \n\t"
             "eor    %0, %0, %1, lsr #8     \n\t"
             : "+r" (value), "+r" (temp));
    return value;
  }

GCC emits a spurious move when the scratch register is used, so there
are 5 instructions instead of 4. The mov r3, #0 was introduced along
with the scratch register:

    mov r3, #0
    eor r3, r0, r0, ror #16
    bic r3, r3, #0xff0000
    mov r0, r0, ror #8
    eor r0, r0, r3, lsr #8

According to Godbolt (https://godbolt.org/z/jmIya9 ), the spurious
move is present in GCC 6.4 through GCC 8.3.

I don't want to declare the temp register that way (claiming it is
both an input and an output), but according to the GCC Extended ASM
HowTo it is the only way to get a scratch register.

How do I code this up and avoid the extra move?

Jeff