Jeffrey Walton wrote:
I'm testing some byte swap code from William Hohl's book on ARM assembly language programming. According to Hohl (p. 74) the byte swap should be coded as follows:

    eor r1, r0, r0, ror #16
    bic r1, r1, #0xff0000
    mov r0, r0, ror #8
    eor r0, r0, r1, lsr #8

When I convert it to GCC extended ASM:

    unsigned int byteswap(unsigned int value)
    {
        unsigned int temp;
        __asm__ ("eor %1, %0, %0, ror #16 \n\t"
                 "bic %1, %1, #0xff0000   \n\t"
                 "mov %0, %0, ror #8      \n\t"
                 "eor %0, %0, %1, lsr #8  \n\t"
                 : "+r" (value), "+r" (temp));
        return value;
    }

There is a spurious move when a scratch register is used. Instead of 4 instructions there are 5. The mov r3, #0 was introduced with the scratch register:

    mov r3, #0
    eor r3, r0, r0, ror #16
    bic r3, r3, #0xff0000
    mov r0, r0, ror #8
    eor r0, r0, r3, lsr #8

According to Godbolt (https://godbolt.org/z/jmIya9 ), the spurious move is present in GCC 6.4 through GCC 8.3.

I don't want to declare the temp register that way (claiming it is both an input and an output), but it is the only way to get a scratch register according to the GCC Extended ASM HowTo.
Huh? temp is not used as an input; at least, it is not initialized with a value. Some versions of GCC just cook up a value in such a situation (here, the mov r3, #0).

Just try "=r" (temp), which states that temp is only an output rather than both an output and an input. Early-clobber (&) is not needed here because value is also an output, hence it cannot share a register with temp.

And if this code is supposed to be fast, you usually implement the hosting function as static inline.

Moreover, GCC provides __builtin_bswap32 (value), which accomplishes what you are after.

Johann
How do I code this up and avoid the extra move?

Jeff
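Putting Johann's three suggestions together gives roughly the sketch below: temp is declared write-only with "=r", the function is static inline, and __builtin_bswap32 serves as the fallback. This has not been checked against the compiler versions named in the thread. The preprocessor guard (ARM state only), the fallback branch, and the use of uint32_t instead of unsigned int are assumptions added for illustration and do not come from the original posts.

```c
#include <stdint.h>

/* Sketch only: byte swap using a write-only scratch register ("=r").
   Assumption: the hand-written sequence is used only when compiling
   for 32-bit ARM in ARM state; everywhere else the GCC/Clang builtin
   is used instead. */
static inline uint32_t byteswap(uint32_t value)
{
#if defined(__arm__) && !defined(__thumb__)
    uint32_t temp;  /* scratch; never read before the asm writes it */
    __asm__ ("eor %1, %0, %0, ror #16 \n\t"
             "bic %1, %1, #0xff0000   \n\t"
             "mov %0, %0, ror #8      \n\t"
             "eor %0, %0, %1, lsr #8  \n\t"
             : "+r" (value),  /* read and written */
               "=r" (temp));  /* written only, so no initial value is needed */
    return value;
#else
    return __builtin_bswap32(value);
#endif
}
```

With temp declared as an output only, the compiler has no input value to materialize for it, so the extra mov r3, #0 should no longer be needed. Where the hand-written sequence is not a requirement, calling __builtin_bswap32 directly is the simpler choice.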