I'm testing some byte-swap code from William Hohl's book on ARM assembly language programming. According to Hohl (p. 74), the byte swap should be coded as follows:

    eor r1, r0, r0, ror #16
    bic r1, r1, #0xff0000
    mov r0, r0, ror #8
    eor r0, r0, r1, lsr #8

When I convert it to GCC extended asm:

    unsigned int byteswap(unsigned int value)
    {
        unsigned int temp;
        __asm__ ("eor %1, %0, %0, ror #16 \n\t"
                 "bic %1, %1, #0xff0000 \n\t"
                 "mov %0, %0, ror #8 \n\t"
                 "eor %0, %0, %1, lsr #8 \n\t"
                 : "+r" (value), "+r" (temp));
        return value;
    }

there is a spurious move when the scratch register is used: instead of 4 instructions there are 5. The mov r3, #0 was introduced along with the scratch register:

    mov r3, #0
    eor r3, r0, r0, ror #16
    bic r3, r3, #0xff0000
    mov r0, r0, ror #8
    eor r0, r0, r3, lsr #8

According to Godbolt (https://godbolt.org/z/jmIya9), the spurious move is present in GCC 6.4 through GCC 8.3.

I don't want to declare the temp register that way (claiming it is both an input and an output), but according to the GCC Extended ASM HowTo it is the only way to get a scratch register.

How do I code this up and avoid the extra move?

Jeff
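A sketch of one possible approach, under stated assumptions: declare temp as an output-only, early-clobber operand ("=&r") instead of "+r". The "+" modifier tells GCC that the asm reads the operand's incoming value, so the compiler has to put some value in that register before the asm runs; an output-only operand carries no such obligation, so the mov r3, #0 should no longer be needed. The "&" (early clobber) is a conservative choice that keeps temp out of any register used for an input. The ARM path assumes ARM (not Thumb) state, hence the __arm__ / __thumb__ guard. The helpers ror32 and byteswap_c are hypothetical: a plain-C transcription of Hohl's four instructions, used as a fallback so the sketch also compiles on non-ARM hosts and can serve as a reference to test the asm against. The generated code has not been checked here for each GCC version; comparing the output on Godbolt with the original is the way to confirm the extra move is gone.

```c
#include <stdint.h>

/* Hypothetical helper: rotate right by n bits. n must be 1..31,
   because a shift by 32 (n == 0) is undefined in C. */
static inline uint32_t ror32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32u - n));
}

/* Hypothetical plain-C transcription of Hohl's sequence (p. 74). */
uint32_t byteswap_c(uint32_t value)
{
    uint32_t temp = value ^ ror32(value, 16); /* eor temp, value, value, ror #16 */
    temp &= ~UINT32_C(0x00ff0000);            /* bic temp, temp, #0xff0000      */
    value = ror32(value, 8);                  /* mov value, value, ror #8       */
    return value ^ (temp >> 8);               /* eor value, value, temp, lsr #8 */
}

uint32_t byteswap(uint32_t value)
{
#if defined(__arm__) && !defined(__thumb__)
    uint32_t temp; /* scratch only: never read before it is written */
    __asm__ ("eor %1, %0, %0, ror #16 \n\t"
             "bic %1, %1, #0xff0000 \n\t"
             "mov %0, %0, ror #8 \n\t"
             "eor %0, %0, %1, lsr #8 \n\t"
             : "+r" (value),  /* read and written */
               "=&r" (temp)); /* write-only, early-clobber scratch */
    return value;
#else
    /* Non-ARM (or Thumb) build: fall back to the C version. */
    return byteswap_c(value);
#endif
}
```

For example, byteswap(0x12345678u) should return 0x78563412u on both paths.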