Re: Register encoding in assembly for load/store instructions

Yonghong Song <yonghong.song@xxxxxxxxx> · Tue, 25 Jul 2023 11:47:35 -0700

On 7/25/23 10:29 AM, Jose E. Marchesi wrote:

Hello Yonghong.

We have noticed that the llvm disassembler uses different notations for
registers in load and store instructions, depending somehow on the width
of the data being loaded or stored.

For example, this is an excerpt from the assembler-disassembler.s test
file in llvm:

   // Note: For the group below w1 is used as a destination for sizes u8, u16, u32.
   //       This is disassembler quirk, but is technically not wrong, as there are
   //       no different encodings for 'r1 = load' vs 'w1 = load'.
   //
   // CHECK: 71 21 2a 00 00 00 00 00	w1 = *(u8 *)(r2 + 0x2a)
   // CHECK: 69 21 2a 00 00 00 00 00	w1 = *(u16 *)(r2 + 0x2a)
   // CHECK: 61 21 2a 00 00 00 00 00	w1 = *(u32 *)(r2 + 0x2a)
   // CHECK: 79 21 2a 00 00 00 00 00	r1 = *(u64 *)(r2 + 0x2a)
   r1 = *(u8*)(r2 + 42)
   r1 = *(u16*)(r2 + 42)
   r1 = *(u32*)(r2 + 42)
   r1 = *(u64*)(r2 + 42)

The comment there clarifies that the usage of wN instead of rN in the
u8, u16 and u32 cases is a "disassembler quirk".

Anyway, the problem is that it seems that `clang -S' actually emits
these forms with wN.

Is that intended?

Yes, this is intended. alu32 mode is enabled, and in that mode
w* registers are used for 8/16/32-bit loads.

Note that for the newer sign-extended loads, even in alu32 mode,
only r* registers are used, since the sign extension extends
up to 64 bits for all variants (8/16/32).
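
To make the point concrete, here is a minimal sketch of a decoder for the
load encodings quoted above. It is not LLVM's implementation: the function
name disasm_ldx and the table SIZE_BITS are made up for illustration, and
the opcode constants and the s8/s16/s32 spelling for sign-extended loads
are assumed to match the BPF instruction set documentation. The destination
is only a 4-bit register number, so nothing in the bytes says "w1" or "r1";
the prefix is chosen purely from the size and mode fields.

```python
import struct

BPF_LDX = 0x01                    # instruction class, opcode bits 0-2
BPF_MEM, BPF_MEMSX = 0x60, 0x80   # load mode, opcode bits 5-7 (assumed values)
SIZE_BITS = {0x00: 32, 0x08: 16, 0x10: 8, 0x18: 64}  # opcode bits 3-4


def disasm_ldx(raw: bytes) -> str:
    """Render one 8-byte little-endian BPF load in alu32-style notation."""
    # Layout: opcode (u8), dst/src nibbles (u8), offset (s16), imm (s32).
    opcode, regs, off, _imm = struct.unpack("<BBhi", raw)
    mode = opcode & 0xE0
    if opcode & 0x07 != BPF_LDX or mode not in (BPF_MEM, BPF_MEMSX):
        raise ValueError("not a BPF_LDX load with MEM or MEMSX mode")

    bits = SIZE_BITS[opcode & 0x18]
    dst, src = regs & 0x0F, regs >> 4
    signed = mode == BPF_MEMSX

    # Presentation choice only: w* for zero-extending 8/16/32-bit loads,
    # r* for 64-bit loads and for sign-extended loads of any width.
    prefix = "r" if signed or bits == 64 else "w"
    kind = "s" if signed else "u"
    sign = "+" if off >= 0 else "-"
    return f"{prefix}{dst} = *({kind}{bits} *)(r{src} {sign} {abs(off):#x})"


if __name__ == "__main__":
    for text in ("71 21 2a 00 00 00 00 00",   # u8 load
                 "79 21 2a 00 00 00 00 00",   # u64 load
                 "91 21 2a 00 00 00 00 00"):  # sign-extended 8-bit load
        print(text, "\t", disasm_ldx(bytes.fromhex(text)))
```

Running it on the first and fourth CHECK lines from the test file gives the
same w1/r1 split shown there, and the 0x91 opcode (MEMSX with byte size)
comes out with r1.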