Re: Register encoding in assembly for load/store instructions

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Tue, 25 Jul 2023 17:31:15 -0700

On Tue, Jul 25, 2023 at 3:28 PM Jose E. Marchesi
<jose.marchesi@xxxxxxxxxx> wrote:
>
>
> > On 7/25/23 1:09 PM, Jose E. Marchesi wrote:
> >>
> >>> On 7/25/23 11:56 AM, Jose E. Marchesi wrote:
> >>>>
> >>>>> On 7/25/23 10:29 AM, Jose E. Marchesi wrote:
> >>>>>> Hello Yonghong.
> >>>>>> We have noticed that the llvm disassembler uses different notations
> >>>>>> for registers in load and store instructions, depending somehow on
> >>>>>> the width of the data being loaded or stored.
> >>>>>> For example, this is an excerpt from the assembler-disassembler.s
> >>>>>> test file in llvm:
> >>>>>>      // Note: For the group below w1 is used as a destination for
> >>>>>>      //       sizes u8, u16, u32.
> >>>>>>      //       This is disassembler quirk, but is technically not wrong, as there are
> >>>>>>      //       no different encodings for 'r1 = load' vs 'w1 = load'.
> >>>>>>      //
> >>>>>>      // CHECK: 71 21 2a 00 00 00 00 00   w1 = *(u8 *)(r2 + 0x2a)
> >>>>>>      // CHECK: 69 21 2a 00 00 00 00 00   w1 = *(u16 *)(r2 + 0x2a)
> >>>>>>      // CHECK: 61 21 2a 00 00 00 00 00   w1 = *(u32 *)(r2 + 0x2a)
> >>>>>>      // CHECK: 79 21 2a 00 00 00 00 00   r1 = *(u64 *)(r2 + 0x2a)
> >>>>>>      r1 = *(u8*)(r2 + 42)
> >>>>>>      r1 = *(u16*)(r2 + 42)
> >>>>>>      r1 = *(u32*)(r2 + 42)
> >>>>>>      r1 = *(u64*)(r2 + 42)
> >>>>>> The comment there clarifies that the usage of wN instead of rN in
> >>>>>> the u8, u16 and u32 cases is a "disassembler quirk".
> >>>>>> Anyway, the problem is that it seems that `clang -S' actually emits
> >>>>>> these forms with wN.
> >>>>>> Is that intended?
> >>>>>
> >>>>> Yes, this is intended since alu32 mode is enabled where
> >>>>> w* registers are used for 8/16/32 bit load.
> >>>> So then why support 'r1 = 8948 8*9r2 + 0x2a)'?  The mode is still
> >>>> alu32 mode.  Isn't the u{8,16,32} part enough to discriminate?
> >>>
> >>> What does this 'r1 = 8948 8*9r2 + 0x2a)' mean?
> >>>
> >>> For u8/u16/u32 loads, if objdump with option to indicate alu32 mode,
> >>> then w* register is used. If no alu32 mode for objdump, then r* register
> >>> is used. Basically the same insn, disasm is different depending on
> >>> alu32 mode or not. u8/u16/u32 is not enough to differentiate.
> >> Ok, so the llvm objdump has a switch that tells when to use rN or wN
> >> when printing these particular instructions.  That's the "disassembler
> >> quirk".  To what purpose?  Isn't the person passing the command line
> >> switch the same person reading the disassembled program?  Is this "alu32
> >> mode" more than a cosmetic thing?
> >> But what concerns us is the assembler, not the disassembler.
> >> clang -S (which is not objdump) seems to generate these instructions
> >> with wN (see https://godbolt.org/z/5G433Yvrb for a store instruction for
> >> example) and we assume the output of clang -S is intended to be passed
> >> to an assembler, much like with gcc -S.
> >> So, should we support both syntaxes as _input_ syntax in the
> >> assembler?
> >
> > Since -mcpu=v3 is the recommended cpu flavor (at least on the bpf
> > mailing list), and -mcpu=v3 has alu32 enabled by default, I think
> > gcc can start to emit insns assuming alu32 mode is on by default.
> > So
> >    w1 = *(u8 *)(r2 + 42)
> > is preferred.
>
> We have V4 by default now.  So we can emit
>
>   w1 = *(u8 *)(r2 + 42)
>
> when -mcpu is v3 or higher, or if -malu32 is specified, and
>
>   r1 = *(u8 *)(r2 + 42)
>
> when -mcpu is v2 or lower, or if -mnoalu32 is specified.
>
> Sounds good?
>
> However, this implies that the assembler should indeed recognize both
> forms of the instructions.  But note that it will assemble them to
> exactly the same encoded instruction.  This includes inline asm (remember
> GCC does not have an integrated assembler.)
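To make the "same encoded instruction" point concrete, here is a minimal Python sketch of an assembler fragment that accepts either spelling of a BPF_LDX|BPF_MEM load and emits the same 8 bytes. It assumes the usual struct bpf_insn layout (8-bit opcode; dst_reg in the low nibble and src_reg in the high nibble of the second byte; 16-bit offset; 32-bit immediate; little-endian). The names assemble_load and LOAD_RE are hypothetical, the regex covers only this one instruction form, and rejecting "w1 = *(u64 *)..." is a choice made by the sketch, not something stated in the thread.

```python
import re
import struct

# BPF_LDX | BPF_MEM opcodes by access size (class 0x01 | mode 0x60 | size bits).
LDX_MEM_OPCODE = {"u8": 0x71, "u16": 0x69, "u32": 0x61, "u64": 0x79}

# Hypothetical pattern: accepts "rN = *(uS *)(rM +/- off)" and the "wN = ..." spelling.
LOAD_RE = re.compile(
    r"^\s*([rw])(\d+)\s*=\s*\*\(\s*(u8|u16|u32|u64)\s*\*\s*\)"
    r"\(\s*r(\d+)\s*([+-])\s*(0x[0-9a-fA-F]+|\d+)\s*\)\s*$"
)


def assemble_load(line: str) -> bytes:
    """Encode one load into an 8-byte struct bpf_insn (little-endian)."""
    m = LOAD_RE.match(line)
    if m is None:
        raise ValueError(f"not a load: {line!r}")
    prefix, dst, size, src, sign, off = m.groups()
    dst_reg, src_reg = int(dst), int(src)
    if dst_reg > 10 or src_reg > 10:
        raise ValueError("BPF only has registers r0..r10")
    if prefix == "w" and size == "u64":
        # Choice made by this sketch: a 64-bit load has no 32-bit spelling.
        raise ValueError("u64 load cannot target a 32-bit subregister")
    offset = int(off, 0) * (-1 if sign == "-" else 1)
    # The r/w prefix is dropped here: both spellings yield identical bytes.
    regs = (src_reg << 4) | dst_reg
    return struct.pack("<BBhi", LDX_MEM_OPCODE[size], regs, offset, 0)
```

With this sketch, both "w1 = *(u8 *)(r2 + 42)" and "r1 = *(u8 *)(r2 + 42)" produce 71 21 2a 00 00 00 00 00, the bytes in the quoted CHECK line.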

Good point.
I think we made a mistake in clang.
We shouldn't be printing
w1 = *(u8 *)(r2 + 42)
since such instruction doesn't exist in BPF ISA
and it's confusing.
There is only one instruction:
r1 = *(u8 *)(r2 + 42)
which is an 8-bit load that zero-extends into the 64-bit register.
The x86 JIT actually implements it as an 8-bit load that stores
into a 32-bit subregister, so it kinda matches w1,
but that's an implementation detail of the JIT.
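A small Python model (an illustration only, not how any JIT is written) of why the r1 and w1 readings cannot be told apart: the loaded value is already zero-extended, and a write to a 32-bit subregister clears bits 32..63, so both views leave the same 64-bit value in r1. The register-file dict and the helpers load, write_r and write_w are hypothetical.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def load(mem: bytes, addr: int, size: int) -> int:
    # Little-endian load of `size` bytes; an int built this way is
    # already zero-extended (no sign bits are propagated).
    return int.from_bytes(mem[addr:addr + size], "little")


def write_r(regs: dict, reg: int, value: int) -> None:
    # ISA view of "r1 = *(uN *)(r2 + off)": write the full 64-bit register.
    regs[reg] = value & MASK64


def write_w(regs: dict, reg: int, value: int) -> None:
    # JIT-style view: write the 32-bit subregister. As with BPF ALU32
    # results (and x86-64 32-bit moves), bits 32..63 are cleared.
    regs[reg] = value & MASK32


# r1 starts with every bit set, so a model that preserved the upper
# half on a 32-bit write would give a different answer here.
mem = bytes(range(0x80, 0x90))
for size in (1, 2, 4):
    a, b = {1: MASK64}, {1: MASK64}
    write_r(a, 1, load(mem, 0, size))
    write_w(b, 1, load(mem, 0, size))
    assert a == b  # both readings leave the same 64-bit value in r1
```

The equivalence depends entirely on the subregister write clearing the upper half; a u64 load has no such counterpart, which is why only the u8/u16/u32 forms ever get the w spelling.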

I think both gcc and clang should always print r1 = *(u8 *)(r2 + 42)
regardless of alu32 or not.
In gas and the clang assembler we can support both w1= and r1=
flavors for backward compat.
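A minimal Python sketch of the printing side of this proposal, assuming the same struct bpf_insn layout as above. disasm_load is a hypothetical helper that decodes one BPF_LDX|BPF_MEM instruction and prints the rN form by default; alu32_names=True mimics the wN output that clang currently produces for u8/u16/u32 loads.

```python
import struct

# BPF_LDX | BPF_MEM opcodes mapped to their access sizes.
LDX_MEM_SIZE = {0x71: "u8", 0x69: "u16", 0x61: "u32", 0x79: "u64"}


def disasm_load(insn: bytes, alu32_names: bool = False) -> str:
    """Print one 8-byte load; the rN form unless alu32_names is set."""
    code, regs, off, _imm = struct.unpack("<BBhi", insn)
    if code not in LDX_MEM_SIZE:
        raise ValueError(f"not an LDX|MEM load: {code:#04x}")
    size = LDX_MEM_SIZE[code]
    dst, src = regs & 0x0F, regs >> 4
    # Proposed behavior: always "r". The flag reproduces the current wN style,
    # which never applies to u64 loads.
    prefix = "w" if alu32_names and size != "u64" else "r"
    sign = "-" if off < 0 else "+"
    return f"{prefix}{dst} = *({size} *)(r{src} {sign} {abs(off):#x})"
```

For the quoted bytes 71 21 2a 00 00 00 00 00 this prints "r1 = *(u8 *)(r2 + 0x2a)", and "w1 = *(u8 *)(r2 + 0x2a)" only when the alu32 naming is requested; the encoding itself is the same either way.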