On Tue, Jul 25, 2023 at 3:28 PM Jose E. Marchesi <jose.marchesi@xxxxxxxxxx> wrote: > > > > On 7/25/23 1:09 PM, Jose E. Marchesi wrote: > >> > >>> On 7/25/23 11:56 AM, Jose E. Marchesi wrote: > >>>> > >>>>> On 7/25/23 10:29 AM, Jose E. Marchesi wrote: > >>>>>> Hello Yonghong. > >>>>>> We have noticed that the llvm disassembler uses different notations > >>>>>> for > >>>>>> registers in load and store instructions, depending somehow on the width > >>>>>> of the data being loaded or stored. > >>>>>> For example, this is an excerpt from the assembler-disassembler.s > >>>>>> test > >>>>>> file in llvm: > >>>>>> // Note: For the group below w1 is used as a destination for > >>>>>> sizes u8, u16, u32. > >>>>>> // This is disassembler quirk, but is technically not wrong, as there are > >>>>>> // no different encodings for 'r1 = load' vs 'w1 = load'. > >>>>>> // > >>>>>> // CHECK: 71 21 2a 00 00 00 00 00 w1 = *(u8 *)(r2 + 0x2a) > >>>>>> // CHECK: 69 21 2a 00 00 00 00 00 w1 = *(u16 *)(r2 + 0x2a) > >>>>>> // CHECK: 61 21 2a 00 00 00 00 00 w1 = *(u32 *)(r2 + 0x2a) > >>>>>> // CHECK: 79 21 2a 00 00 00 00 00 r1 = *(u64 *)(r2 + 0x2a) > >>>>>> r1 = *(u8*)(r2 + 42) > >>>>>> r1 = *(u16*)(r2 + 42) > >>>>>> r1 = *(u32*)(r2 + 42) > >>>>>> r1 = *(u64*)(r2 + 42) > >>>>>> The comment there clarifies that the usage of wN instead of rN in > >>>>>> the > >>>>>> u8, u16 and u32 cases is a "disassembler quirk". > >>>>>> Anyway, the problem is that it seems that `clang -S' actually emits > >>>>>> these forms with wN. > >>>>>> Is that intended? > >>>>> > >>>>> Yes, this is intended since alu32 mode is enabled where > >>>>> w* registers are used for 8/16/32 bit load. > >>>> So then why suppporting 'r1 = 8948 8*9r2 + 0x2a)'? The mode is > >>>> still > >>>> alu32 mode. Isn't the u{8,16,32} part enough to discriminate? > >>> > >>> What does this 'r1 = 8948 8*9r2 + 0x2a)' mean? > >>> > >>> For u8/u16/u32 loads, if objdump with option to indicate alu32 mode, > >>> then w* register is used. If no alu32 mode for objdump, then r* register > >>> is used. Basically the same insn, disasm is different depending on > >>> alu32 mode or not. u8/u16/u32 is not enough to differentiate. > >> Ok, so the llvm objdump has a switch that tells when to use rN or wN > >> when printing these particular instructions. Thats the "disassembler > >> quirk". To what purpose? Isnt the person passing the command line > >> switch the same person reading the disassembled program? Is this "alu32 > >> mode" more than a cosmetic thing? > >> But what concern us is the assembler, not the disassembler. > >> clang -S (which is not objdump) seems to generate these instructions > >> with wN (see https://godbolt.org/z/5G433Yvrb for a store instruction for > >> example) and we assume the output of clang -S is intended to be passed > >> to an assembler, much like with gcc -S. > >> So, should we support both syntaxes as _input_ syntax in the > >> assembler? > > > > Considering -mcpu=v3 is recommended cpu flavor (at least in bpf mailing > > list), and -mcpu=v3 has alu32 enabled by default. So I think > > gcc can start to emit insn assuming alu32 mode is on by default. > > So > > w1 = *(u8 *)(r2 + 42) > > is preferred. > > We have V4 by default now. So we can emit > > w1 = *(u8 *)(r2 + 42) > > when -mcpu is v3 or higher, or if -malu32 is specified, and > > r1 = *(u8 *)(r2 + 42) > > when -mcpu is v2 or lower, or if -mnoalu32 is specified. > > Sounds good? > > However this implies that the assembler should indeed recognize both > forms of instructions. But note that it will assembly them to the > exactly same encoded instruction. This includes inline asm (remember > GCC does not have an integrated assembler.) Good point. I think we made a mistake in clang. We shouldn't be printing w1 = *(u8 *)(r2 + 42) since such instruction doesn't exist in BPF ISA and it's confusing. There is only one instruction: r1 = *(u8 *)(r2 + 42) which is an 8-bit load that zero extends into 64-bit. x86 JIT actually implements it as 8-bit load that stores into a 32-bit subregister, so it kinda matches w1, but that's an implementation detail of the JIT. I think both gcc and clang should always print r1 = *(u8 *)(r2 + 42) regardless of alu32 or not. In gas and clang assembler we can support both w1= and r1= flavors for backward compat.