Re: Register encoding in assembly for load/store instructions

Eduard Zingerman <eddyz87@xxxxxxxxx> · Wed, 26 Jul 2023 03:39:48 +0300

On Tue, 2023-07-25 at 17:31 -0700, Alexei Starovoitov wrote:
> On Tue, Jul 25, 2023 at 3:28 PM Jose E. Marchesi
> <jose.marchesi@xxxxxxxxxx> wrote:
> > 
> > 
> > > On 7/25/23 1:09 PM, Jose E. Marchesi wrote:
> > > > 
> > > > > On 7/25/23 11:56 AM, Jose E. Marchesi wrote:
> > > > > > 
> > > > > > > On 7/25/23 10:29 AM, Jose E. Marchesi wrote:
> > > > > > > > Hello Yonghong.
> > > > > > > > > We have noticed that the llvm disassembler uses different notations for
> > > > > > > > > registers in load and store instructions, depending somehow on the width
> > > > > > > > > of the data being loaded or stored.
> > > > > > > > > For example, this is an excerpt from the assembler-disassembler.s test
> > > > > > > > > file in llvm:
> > > > > > > > >      // Note: For the group below w1 is used as a destination for sizes u8, u16, u32.
> > > > > > > >      //       This is disassembler quirk, but is technically not wrong, as there are
> > > > > > > >      //       no different encodings for 'r1 = load' vs 'w1 = load'.
> > > > > > > >      //
> > > > > > > >      // CHECK: 71 21 2a 00 00 00 00 00   w1 = *(u8 *)(r2 + 0x2a)
> > > > > > > >      // CHECK: 69 21 2a 00 00 00 00 00   w1 = *(u16 *)(r2 + 0x2a)
> > > > > > > >      // CHECK: 61 21 2a 00 00 00 00 00   w1 = *(u32 *)(r2 + 0x2a)
> > > > > > > >      // CHECK: 79 21 2a 00 00 00 00 00   r1 = *(u64 *)(r2 + 0x2a)
> > > > > > > >      r1 = *(u8*)(r2 + 42)
> > > > > > > >      r1 = *(u16*)(r2 + 42)
> > > > > > > >      r1 = *(u32*)(r2 + 42)
> > > > > > > >      r1 = *(u64*)(r2 + 42)
> > > > > > > > > The comment there clarifies that the usage of wN instead of rN in the
> > > > > > > > > u8, u16 and u32 cases is a "disassembler quirk".
> > > > > > > > Anyway, the problem is that it seems that `clang -S' actually emits
> > > > > > > > these forms with wN.
> > > > > > > > Is that intended?
> > > > > > > 
> > > > > > > Yes, this is intended since alu32 mode is enabled where
> > > > > > > w* registers are used for 8/16/32 bit load.
> > > > > > > So then why support 'r1 = 8948 8*9r2 + 0x2a)'?  The mode is still
> > > > > > > alu32 mode.  Isn't the u{8,16,32} part enough to discriminate?
> > > > > 
> > > > > What does this 'r1 = 8948 8*9r2 + 0x2a)' mean?
> > > > > 
> > > > > > For u8/u16/u32 loads, if objdump is run with the option that indicates
> > > > > > alu32 mode, then a w* register is used. Without alu32 mode in objdump,
> > > > > > an r* register is used. Basically it is the same insn; the disasm differs
> > > > > > depending on alu32 mode or not. u8/u16/u32 is not enough to differentiate.
> > > > > Ok, so the llvm objdump has a switch that tells when to use rN or wN
> > > > > when printing these particular instructions.  That's the "disassembler
> > > > > quirk".  To what purpose?  Isn't the person passing the command line
> > > > > switch the same person reading the disassembled program?  Is this "alu32
> > > > > mode" more than a cosmetic thing?
> > > > > But what concerns us is the assembler, not the disassembler.
> > > > clang -S (which is not objdump) seems to generate these instructions
> > > > with wN (see https://godbolt.org/z/5G433Yvrb for a store instruction for
> > > > example) and we assume the output of clang -S is intended to be passed
> > > > to an assembler, much like with gcc -S.
> > > > > So, should we support both syntaxes as _input_ syntax in the assembler?
> > > 
> > > > Considering that -mcpu=v3 is the recommended cpu flavor (at least on the
> > > > bpf mailing list), and -mcpu=v3 has alu32 enabled by default, I think
> > > > gcc can start to emit insns assuming alu32 mode is on by default.
> > > So
> > >    w1 = *(u8 *)(r2 + 42)
> > > is preferred.
> > 
> > We have V4 by default now.  So we can emit
> > 
> >   w1 = *(u8 *)(r2 + 42)
> > 
> > when -mcpu is v3 or higher, or if -malu32 is specified, and
> > 
> >   r1 = *(u8 *)(r2 + 42)
> > 
> > when -mcpu is v2 or lower, or if -mnoalu32 is specified.
> > 
> > Sounds good?
> > 
> > > However this implies that the assembler should indeed recognize both
> > > forms of instructions.  But note that it will assemble them to exactly
> > > the same encoded instruction.  This includes inline asm (remember
> > > GCC does not have an integrated assembler.)
> 
> Good point.
> I think we made a mistake in clang.
> We shouldn't be printing
> w1 = *(u8 *)(r2 + 42)
> since such an instruction doesn't exist in the BPF ISA
> and it's confusing.
> There is only one instruction:
> r1 = *(u8 *)(r2 + 42)
> which is an 8-bit load that zero extends into 64-bit.
> x86 JIT actually implements it as 8-bit load that stores
> into a 32-bit subregister, so it kinda matches w1,
> but that's an implementation detail of the JIT.
> 
> I think both gcc and clang should always print r1 = *(u8 *)(r2 + 42)
> regardless of alu32 or not.
> In gas and clang assembler we can support both w1= and r1=
> flavors for backward compat.
> 

I agree with Alexei (the "disassembler quirk" comment was left by me :).
I can dig into the clang part of things if this is the consensus.
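
To illustrate why accepting both w1= and r1= as input is harmless, here is a
minimal Python sketch of a load encoder. It is not the llvm or gas code; the
names encode_ldx and assemble_load are made up for this example, and the
parser only understands the load syntax quoted above. The opcode constants
are the usual BPF_LDX, BPF_MEM and size bits. The point is that the w/r
prefix of the destination is parsed and then dropped, because the 8-byte
encoding has no field for it.

```python
import re
import struct

# Opcode bits for BPF load instructions (class, mode, size).
BPF_LDX = 0x01
BPF_MEM = 0x60
SIZE_BITS = {"u8": 0x10, "u16": 0x08, "u32": 0x00, "u64": 0x18}

# Hypothetical parser for 'rN = *(uXX *)(rM + off)' and 'wN = ...' forms.
LOAD_RE = re.compile(
    r"^\s*([rw])(\d+)\s*=\s*\*\(\s*(u8|u16|u32|u64)\s*\*\s*\)"
    r"\s*\(\s*r(\d+)\s*\+\s*(\w+)\s*\)\s*$"
)


def encode_ldx(dst, src, off, size):
    """Encode one load: opcode, src/dst nibbles, 16-bit offset, 32-bit imm."""
    opcode = BPF_LDX | BPF_MEM | SIZE_BITS[size]
    regs = (src << 4) | dst  # dst in the low nibble, src in the high nibble
    return struct.pack("<BBhi", opcode, regs, off, 0)


def assemble_load(text):
    """Assemble a load written with either an rN or a wN destination."""
    m = LOAD_RE.match(text)
    if m is None:
        raise ValueError("unsupported syntax: " + text)
    _prefix, dst, size, src, off = m.groups()
    # _prefix ('r' or 'w') is ignored: nothing in the encoding records it.
    return encode_ldx(int(dst), int(src), int(off, 0), size)


if __name__ == "__main__":
    for line in ("w1 = *(u8 *)(r2 + 42)", "r1 = *(u8 *)(r2 + 42)"):
        print(assemble_load(line).hex(" "), "  ", line)
```

Both lines come out as 71 21 2a 00 00 00 00 00, the same bytes as the first
CHECK line in the test file quoted above.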