Re: Register encoding in assembly for load/store instructions

"Jose E. Marchesi" <jose.marchesi@xxxxxxxxxx> · Tue, 25 Jul 2023 22:09:47 +0200

> On 7/25/23 11:56 AM, Jose E. Marchesi wrote:
>> 
>>> On 7/25/23 10:29 AM, Jose E. Marchesi wrote:
>>>> Hello Yonghong.
>>>> We have noticed that the llvm disassembler uses different notations
>>>> for registers in load and store instructions, depending somehow on the
>>>> width of the data being loaded or stored.
>>>> For example, this is an excerpt from the assembler-disassembler.s
>>>> test file in llvm:
>>>>     // Note: For the group below w1 is used as a destination for sizes u8, u16, u32.
>>>>     //       This is disassembler quirk, but is technically not wrong, as there are
>>>>     //       no different encodings for 'r1 = load' vs 'w1 = load'.
>>>>     //
>>>>     // CHECK: 71 21 2a 00 00 00 00 00	w1 = *(u8 *)(r2 + 0x2a)
>>>>     // CHECK: 69 21 2a 00 00 00 00 00	w1 = *(u16 *)(r2 + 0x2a)
>>>>     // CHECK: 61 21 2a 00 00 00 00 00	w1 = *(u32 *)(r2 + 0x2a)
>>>>     // CHECK: 79 21 2a 00 00 00 00 00	r1 = *(u64 *)(r2 + 0x2a)
>>>>     r1 = *(u8*)(r2 + 42)
>>>>     r1 = *(u16*)(r2 + 42)
>>>>     r1 = *(u32*)(r2 + 42)
>>>>     r1 = *(u64*)(r2 + 42)
>>>> The comment there clarifies that the usage of wN instead of rN in
>>>> the u8, u16 and u32 cases is a "disassembler quirk".
>>>> Anyway, the problem is that it seems that `clang -S' actually emits
>>>> these forms with wN.
>>>> Is that intended?
>>>
>>> Yes, this is intended: alu32 mode is enabled, and in that mode
>>> w* registers are used for 8/16/32-bit loads.
>> So then why support 'r1 = 8948 8*9r2 + 0x2a)'?  The mode is still
>> alu32 mode.  Isn't the u{8,16,32} part enough to discriminate?
>
> What does this 'r1 = 8948 8*9r2 + 0x2a)' mean?
>
> For u8/u16/u32 loads, if objdump is run with the option indicating alu32
> mode, then a w* register is used. Without alu32 mode, an r* register
> is used. Basically it is the same insn; the disasm differs depending on
> whether alu32 mode is on. u8/u16/u32 alone is not enough to differentiate.

Ok, so the llvm objdump has a switch that tells it when to use rN or wN
when printing these particular instructions.  That's the "disassembler
quirk".  To what purpose?  Isn't the person passing the command-line
switch the same person reading the disassembled program?  Is this "alu32
mode" more than a cosmetic thing?
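To make the "same insn, different printing" point concrete, here is a minimal Python sketch (not LLVM's code). It decodes the 8-byte instruction from the test excerpt, assuming the usual little-endian BPF layout (opcode, dst in the low nibble and src in the high nibble of the second byte, 16-bit offset, 32-bit imm). The `alu32` parameter is a hypothetical stand-in for the objdump option: every decoded field is the same either way, and only the destination prefix changes.

```python
import struct

# BPF_LDX | BPF_MEM load opcodes carry the access size in bits 3-4
# (BPF encoding: BPF_W=0x00, BPF_H=0x08, BPF_B=0x10, BPF_DW=0x18).
SIZES = {0x00: "u32", 0x08: "u16", 0x10: "u8", 0x18: "u64"}
BPF_LDX = 0x01
BPF_MEM = 0x60


def disasm_ldx(insn: bytes, alu32: bool) -> str:
    """Print an 8-byte BPF_LDX|BPF_MEM load the way a disassembler might."""
    # Little-endian layout: opcode, dst/src nibbles, 16-bit offset, 32-bit imm.
    opcode, regs, off, _imm = struct.unpack("<BBhi", insn)
    if (opcode & 0x07) != BPF_LDX or (opcode & 0xE0) != BPF_MEM:
        raise ValueError(f"not an LDX|MEM load: {opcode:#04x}")
    size = SIZES[opcode & 0x18]
    dst, src = regs & 0x0F, regs >> 4
    # The decoded fields are identical in both modes; only the printed
    # prefix of the destination register depends on the alu32 flag.
    prefix = "w" if alu32 and size != "u64" else "r"
    sign = "+" if off >= 0 else "-"
    return f"{prefix}{dst} = *({size} *)(r{src} {sign} {abs(off):#x})"
```

With the bytes `71 21 2a 00 00 00 00 00`, `alu32=True` yields `w1 = *(u8 *)(r2 + 0x2a)` and `alu32=False` yields `r1 = *(u8 *)(r2 + 0x2a)`, while the u64 opcode 0x79 prints `r1` in both modes.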

But what concerns us is the assembler, not the disassembler.

clang -S (which is not objdump) seems to generate these instructions
with wN (see https://godbolt.org/z/5G433Yvrb for an example with a store
instruction), and we assume the output of clang -S is intended to be
passed to an assembler, much like with gcc -S.

So, should we support both syntaxes as _input_ syntax in the assembler?
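If the answer is yes, accepting both is cheap, because the prefix never reaches the encoding. The Python sketch below shows one way an assembler could do it. It is a hypothetical parser covering only the `dst = *(size *)(src +/- off)` load form quoted above, not the implementation of GAS or LLVM. As an assumption, it rejects wN for u64 loads, since the excerpt only ever prints those with rN.

```python
import re
import struct

# BPF_LDX | BPF_MEM opcodes per access size.
LDX_MEM = {"u8": 0x71, "u16": 0x69, "u32": 0x61, "u64": 0x79}

# Hypothetical grammar: only the "dst = *(size *)(src +/- off)" load form.
LOAD_RE = re.compile(
    r"^([rw])(\d+)\s*=\s*\*\(\s*(u8|u16|u32|u64)\s*\*\s*\)"
    r"\s*\(\s*r(\d+)\s*([+-])\s*(0x[0-9a-fA-F]+|\d+)\s*\)$"
)


def assemble_ldx(line: str) -> bytes:
    """Assemble a load, accepting wN or rN as destination for u8/u16/u32."""
    m = LOAD_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized load: {line!r}")
    kind, dst, size, src, sign, off = m.groups()
    dst_reg, src_reg = int(dst), int(src)
    if dst_reg > 10 or src_reg > 10:
        raise ValueError("BPF has registers r0-r10 only")
    if kind == "w" and size == "u64":
        # Assumption: a 64-bit load into a 32-bit subregister is an error.
        raise ValueError("wN cannot be the destination of a u64 load")
    value = int(off, 16) if off.startswith("0x") else int(off)
    offset = -value if sign == "-" else value
    # wN vs rN does not reach the encoding: both spellings yield these bytes.
    return struct.pack("<BBhi", LDX_MEM[size], (src_reg << 4) | dst_reg, offset, 0)
```

Both `w1 = *(u8 *)(r2 + 0x2a)` and `r1 = *(u8*)(r2 + 42)` assemble to `71 21 2a 00 00 00 00 00`, which matches the CHECK line in the excerpt.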

>> 
>>> Note that for the newer sign-extended loads, even in alu32 mode,
>>> only r* registers are used, since the sign extension goes
>>> up to 64 bits for all variants (8/16/32).
>> Yes we noticed that :)
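The reason is easy to see numerically. The Python sketch below (the helper names are made up for illustration) widens a loaded value to 64 bits both ways. A zero-extending load of the byte 0x80 leaves the upper 32 bits clear, which is all a wN destination can express in alu32 mode. A sign-extending load sets them (0xFFFFFFFFFFFFFF80). The same happens for a u32 value such as 0x80000000, so even the 32-bit sign-extending variant needs an rN destination.

```python
MASK64 = (1 << 64) - 1


def zero_extend(value: int, bits: int) -> int:
    """Widen a `bits`-wide loaded value to 64 bits as a plain load does."""
    return value & ((1 << bits) - 1)


def sign_extend(value: int, bits: int) -> int:
    """Widen a `bits`-wide loaded value to 64 bits as a sign-extending load does."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits  # top bit set: reinterpret as negative
    return value & MASK64  # two's-complement bit pattern in a 64-bit register


def fits_in_subregister(value: int) -> bool:
    """True when the upper 32 bits are zero, as a wN destination implies."""
    return value >> 32 == 0
```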