Re: Register encoding in assembly for load/store instructions

Yonghong Song <yonghong.song@xxxxxxxxx> · Fri, 28 Jul 2023 16:25:45 -0700

On 7/28/23 9:58 AM, Eduard Zingerman wrote:
On Tue, 2023-07-25 at 21:16 -0700, Yonghong Song wrote:

On 7/25/23 5:39 PM, Eduard Zingerman wrote:
On Tue, 2023-07-25 at 17:31 -0700, Alexei Starovoitov wrote:
On Tue, Jul 25, 2023 at 3:28 PM Jose E. Marchesi
<jose.marchesi@xxxxxxxxxx> wrote:

On 7/25/23 1:09 PM, Jose E. Marchesi wrote:

On 7/25/23 11:56 AM, Jose E. Marchesi wrote:

On 7/25/23 10:29 AM, Jose E. Marchesi wrote:
Hello Yonghong.
We have noticed that the llvm disassembler uses different notations
for
registers in load and store instructions, depending somehow on the width
of the data being loaded or stored.
For example, this is an excerpt from the assembler-disassembler.s
test
file in llvm:
       // Note: For the group below w1 is used as a destination for
sizes u8, u16, u32.
       //       This is disassembler quirk, but is technically not wrong, as there are
       //       no different encodings for 'r1 = load' vs 'w1 = load'.
       //
       // CHECK: 71 21 2a 00 00 00 00 00   w1 = *(u8 *)(r2 + 0x2a)
       // CHECK: 69 21 2a 00 00 00 00 00   w1 = *(u16 *)(r2 + 0x2a)
       // CHECK: 61 21 2a 00 00 00 00 00   w1 = *(u32 *)(r2 + 0x2a)
       // CHECK: 79 21 2a 00 00 00 00 00   r1 = *(u64 *)(r2 + 0x2a)
       r1 = *(u8*)(r2 + 42)
       r1 = *(u16*)(r2 + 42)
       r1 = *(u32*)(r2 + 42)
       r1 = *(u64*)(r2 + 42)
The comment there clarifies that the usage of wN instead of rN in
the
u8, u16 and u32 cases is a "disassembler quirk".
Anyway, the problem is that it seems that `clang -S' actually emits
these forms with wN.
Is that intended?

Yes, this is intended since alu32 mode is enabled where
w* registers are used for 8/16/32 bit load.
So then why suppporting 'r1 = 8948 8*9r2 + 0x2a)'?  The mode is
still
alu32 mode.  Isn't the u{8,16,32} part enough to discriminate?

What does this 'r1 = 8948 8*9r2 + 0x2a)' mean?

For u8/u16/u32 loads, if objdump with option to indicate alu32 mode,
then w* register is used. If no alu32 mode for objdump, then r* register
is used. Basically the same insn, disasm is different depending on
alu32 mode or not. u8/u16/u32 is not enough to differentiate.
Ok, so the llvm objdump has a switch that tells when to use rN or wN
when printing these particular instructions.  Thats the "disassembler
quirk".  To what purpose?  Isnt the person passing the command line
switch the same person reading the disassembled program?  Is this "alu32
mode" more than a cosmetic thing?
But what concern us is the assembler, not the disassembler.
clang -S (which is not objdump) seems to generate these instructions
with wN (see https://godbolt.org/z/5G433Yvrb for a store instruction for
example) and we assume the output of clang -S is intended to be passed
to an assembler, much like with gcc -S.
So, should we support both syntaxes as _input_ syntax in the
assembler?

Considering -mcpu=v3 is recommended cpu flavor (at least in bpf mailing
list), and -mcpu=v3 has alu32 enabled by default. So I think
gcc can start to emit insn assuming alu32 mode is on by default.
So
     w1 = *(u8 *)(r2 + 42)
is preferred.

We have V4 by default now.  So we can emit

    w1 = *(u8 *)(r2 + 42)

when -mcpu is v3 or higher, or if -malu32 is specified, and

    r1 = *(u8 *)(r2 + 42)

when -mcpu is v2 or lower, or if -mnoalu32 is specified.

Sounds good?

However this implies that the assembler should indeed recognize both
forms of instructions.  But note that it will assembly them to the
exactly same encoded instruction.  This includes inline asm (remember
GCC does not have an integrated assembler.)

Good point.
I think we made a mistake in clang.
We shouldn't be printing
w1 = *(u8 *)(r2 + 42)
since such instruction doesn't exist in BPF ISA
and it's confusing.
There is only one instruction:
r1 = *(u8 *)(r2 + 42)
which is an 8-bit load that zero extends into 64-bit.
x86 JIT actually implements it as 8-bit load that stores
into a 32-bit subregister, so it kinda matches w1,
but that's an implementation detail of the JIT.

I think both gcc and clang should always print r1 = *(u8 *)(r2 + 42)
regardless of alu32 or not.
In gas and clang assembler we can support both w1= and r1=
flavors for backward compat.

I agree with Alexei (the ... disassembler quirk ... comment is left by me :).
Can dig into clang part of things if this is a consensus.

For disassembler, we have stx as well may use w* registers with alu32.
In llvm BPFDisassembler.cpp, we have

    if ((InstClass == BPF_LDX || InstClass == BPF_STX) &&
        getInstSize(Insn) != BPF_DW &&
        (InstMode == BPF_MEM || InstMode == BPF_ATOMIC) &&
        STI.hasFeature(BPF::ALU32))
      Result = decodeInstruction(DecoderTableBPFALU3264, Instr, Insn,
Address,
                                 this, STI);
    else
      Result = decodeInstruction(DecoderTableBPF64, Instr, Insn, Address,
this,
                                 STI);

Maybe we should just do

    Result = decodeInstruction(DecoderTableBPF64, Instr, Insn, Address,
this, STI);

So we already disassemble based on non-alu32 mode?

Yonghong, Alexei,

I have a prototype [1] that consolidates STW/STW32, LDW/LDW32 etc
instructions in LLVM BPF backend, thus removing the syntactic
difference. I think it simplifies BPFInstrInfo.td a bit but that's up
to debate.

Should I proceed with it?

[1] https://reviews.llvm.org/D156559

I made a comment to the diff w.r.t. backward compatibility issue.