On 7/25/23 10:29 AM, Jose E. Marchesi wrote:
Hello Yonghong.

We have noticed that the llvm disassembler uses different notations for
registers in load and store instructions, depending somehow on the width
of the data being loaded or stored.

For example, this is an excerpt from the assembler-disassembler.s test
file in llvm:

  // Note: For the group below w1 is used as a destination for sizes u8, u16, u32.
  //       This is disassembler quirk, but is technically not wrong, as there are
  //       no different encodings for 'r1 = load' vs 'w1 = load'.
  //
  // CHECK: 71 21 2a 00 00 00 00 00 w1 = *(u8 *)(r2 + 0x2a)
  // CHECK: 69 21 2a 00 00 00 00 00 w1 = *(u16 *)(r2 + 0x2a)
  // CHECK: 61 21 2a 00 00 00 00 00 w1 = *(u32 *)(r2 + 0x2a)
  // CHECK: 79 21 2a 00 00 00 00 00 r1 = *(u64 *)(r2 + 0x2a)
  r1 = *(u8*)(r2 + 42)
  r1 = *(u16*)(r2 + 42)
  r1 = *(u32*)(r2 + 42)
  r1 = *(u64*)(r2 + 42)

The comment there clarifies that the usage of wN instead of rN in the
u8, u16 and u32 cases is a "disassembler quirk".

Anyway, the problem is that it seems that `clang -S' actually emits
these forms with wN.

Is that intended?
Yes, this is intended: alu32 mode is enabled, and in that mode w* registers are used for 8/16/32-bit loads. Note that for the newer sign-extending loads, even in alu32 mode, only r* registers are used, since the sign extension extends up to 64 bits for all variants (8/16/32).
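
To make the point concrete, here is a rough Python sketch (not the LLVM code) that decodes the four byte sequences from the test file and picks the destination register name the way described above. The function name disasm_load and its alu32 flag are made up for illustration. The opcode fields follow the usual BPF layout (class in the low 3 bits, size in bits 3-4, mode in the top 3 bits), and the 0x80 mode value for sign-extending loads is my reading of the kernel headers, so please double check it before relying on it.

```python
import struct

BPF_LDX = 0x01      # instruction class: load into register
BPF_MEM = 0x60      # mode: plain (zero-extending) memory load
BPF_MEMSX = 0x80    # mode: sign-extending memory load (assumed value)

# size field (opcode & 0x18) -> access width in bits
SIZE_BITS = {0x00: 32, 0x08: 16, 0x10: 8, 0x18: 64}


def disasm_load(raw: bytes, alu32: bool = True) -> str:
    """Render one 8-byte BPF load instruction as pseudo-C text.

    Hypothetical helper: it only models the wN/rN choice discussed in
    this thread, not the full disassembler.
    """
    opcode, regs, off, _imm = struct.unpack("<BBhi", raw)
    if opcode & 0x07 != BPF_LDX:
        raise ValueError("not a BPF_LDX instruction")

    mode = opcode & 0xE0
    bits = SIZE_BITS[opcode & 0x18]
    dst, src = regs & 0x0F, regs >> 4

    if mode == BPF_MEM:
        # Same encoding either way; only the printed name differs.
        prefix = "w" if (alu32 and bits < 64) else "r"
        ctype = f"u{bits}"
    elif mode == BPF_MEMSX and bits < 64:
        # Sign extension fills all 64 bits, so always print rN.
        prefix = "r"
        ctype = f"s{bits}"
    else:
        raise ValueError("unsupported load mode")

    sign = "+" if off >= 0 else "-"
    return f"{prefix}{dst} = *({ctype} *)(r{src} {sign} {abs(off):#x})"


for text in ("71 21 2a 00 00 00 00 00", "69 21 2a 00 00 00 00 00",
             "61 21 2a 00 00 00 00 00", "79 21 2a 00 00 00 00 00"):
    print(text, disasm_load(bytes.fromhex(text)))
```

With alu32=True the first three lines print w1 and the last prints r1, matching the CHECK lines; with alu32=False the same bytes all print r1, which is why the test comment calls the difference a notation issue rather than an encoding one.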