Re: Register encoding in assembly for load/store instructions

"Jose E. Marchesi" <jose.marchesi@xxxxxxxxxx> · Tue, 25 Jul 2023 21:11:55 +0200

>> On 7/25/23 10:29 AM, Jose E. Marchesi wrote:
>>> Hello Yonghong.
>>> We have noticed that the llvm disassembler uses different notations for
>>> registers in load and store instructions, depending somehow on the width
>>> of the data being loaded or stored.
>>> For example, this is an excerpt from the assembler-disassembler.s test
>>> file in llvm:
>>>    // Note: For the group below w1 is used as a destination for sizes u8, u16, u32.
>>>    //       This is disassembler quirk, but is technically not wrong, as there are
>>>    //       no different encodings for 'r1 = load' vs 'w1 = load'.
>>>    //
>>>    // CHECK: 71 21 2a 00 00 00 00 00	w1 = *(u8 *)(r2 + 0x2a)
>>>    // CHECK: 69 21 2a 00 00 00 00 00	w1 = *(u16 *)(r2 + 0x2a)
>>>    // CHECK: 61 21 2a 00 00 00 00 00	w1 = *(u32 *)(r2 + 0x2a)
>>>    // CHECK: 79 21 2a 00 00 00 00 00	r1 = *(u64 *)(r2 + 0x2a)
>>>    r1 = *(u8*)(r2 + 42)
>>>    r1 = *(u16*)(r2 + 42)
>>>    r1 = *(u32*)(r2 + 42)
>>>    r1 = *(u64*)(r2 + 42)
>>> The comment there clarifies that the usage of wN instead of rN in the
>>> u8, u16 and u32 cases is a "disassembler quirk".
>>> Anyway, the problem is that it seems that `clang -S' actually emits
>>> these forms with wN.
>>> Is that intended?
>>
>> Yes, this is intended: alu32 mode is enabled, and in that mode
>> w* registers are used for 8/16/32-bit loads.
>
> So then why support 'r1 = 8948 8*9r2 + 0x2a)'?  The mode is still
> alu32 mode.  Isn't the u{8,16,32} part enough to discriminate?

Sorry, my keyboard's num-lock activated mid-sentence.

I meant 'r1 = *(u8*)(r2 + 42)'.
Why support that syntax as well as 'w1 = *(u8*)(r2 + 42)'?
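
To make the point concrete, here is a minimal Python sketch (not taken from
llvm or binutils; the helper name decode_ldx and the dict layout are made up
for illustration) that decodes the bytes from the CHECK lines quoted above.
It assumes the usual 8-byte little-endian BPF instruction layout: opcode,
dst/src register nibbles, 16-bit offset, 32-bit immediate.  The destination
is just a 4-bit register number, so nothing in the encoding can tell 'r1'
from 'w1'; only the size bits of the opcode differ between the four loads.

```python
import struct

# Size bits (0x18 mask) of the opcode for BPF load/store instructions.
SIZE_NAMES = {0x00: "u32", 0x08: "u16", 0x10: "u8", 0x18: "u64"}

BPF_LDX = 0x01  # instruction class, low 3 bits of the opcode
BPF_MEM = 0x60  # addressing mode, high 3 bits of the opcode


def decode_ldx(raw: bytes) -> dict:
    """Decode one 8-byte little-endian BPF_LDX | BPF_MEM instruction.

    Hypothetical helper for illustration only.
    """
    opcode, regs, off, _imm = struct.unpack("<BBhi", raw)
    if opcode & 0x07 != BPF_LDX or opcode & 0xE0 != BPF_MEM:
        raise ValueError("not a plain memory load")
    return {
        "size": SIZE_NAMES[opcode & 0x18],
        "dst": regs & 0x0F,  # register number only: no r/w distinction
        "src": regs >> 4,
        "off": off,
    }


if __name__ == "__main__":
    for line in ("71 21 2a 00 00 00 00 00", "69 21 2a 00 00 00 00 00",
                 "61 21 2a 00 00 00 00 00", "79 21 2a 00 00 00 00 00"):
        print(line, decode_ldx(bytes.fromhex(line)))
```

Whether the decoded destination is then printed as 'r1' or 'w1' is purely a
choice made by the disassembler.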

>
>> Note that for newer sign-extended loads, even in alu32 mode,
>> only r* registers are used, since the sign-extension extends
>> up to 64 bits for all variants (8/16/32).
>
> Yes we noticed that :)