On Tue, 2023-07-25 at 21:16 -0700, Yonghong Song wrote: > > On 7/25/23 5:39 PM, Eduard Zingerman wrote: > > On Tue, 2023-07-25 at 17:31 -0700, Alexei Starovoitov wrote: > > > On Tue, Jul 25, 2023 at 3:28 PM Jose E. Marchesi > > > <jose.marchesi@xxxxxxxxxx> wrote: > > > > > > > > > > > > > On 7/25/23 1:09 PM, Jose E. Marchesi wrote: > > > > > > > > > > > > > On 7/25/23 11:56 AM, Jose E. Marchesi wrote: > > > > > > > > > > > > > > > > > On 7/25/23 10:29 AM, Jose E. Marchesi wrote: > > > > > > > > > > Hello Yonghong. > > > > > > > > > > We have noticed that the llvm disassembler uses different notations > > > > > > > > > > for > > > > > > > > > > registers in load and store instructions, depending somehow on the width > > > > > > > > > > of the data being loaded or stored. > > > > > > > > > > For example, this is an excerpt from the assembler-disassembler.s > > > > > > > > > > test > > > > > > > > > > file in llvm: > > > > > > > > > > // Note: For the group below w1 is used as a destination for > > > > > > > > > > sizes u8, u16, u32. > > > > > > > > > > // This is disassembler quirk, but is technically not wrong, as there are > > > > > > > > > > // no different encodings for 'r1 = load' vs 'w1 = load'. > > > > > > > > > > // > > > > > > > > > > // CHECK: 71 21 2a 00 00 00 00 00 w1 = *(u8 *)(r2 + 0x2a) > > > > > > > > > > // CHECK: 69 21 2a 00 00 00 00 00 w1 = *(u16 *)(r2 + 0x2a) > > > > > > > > > > // CHECK: 61 21 2a 00 00 00 00 00 w1 = *(u32 *)(r2 + 0x2a) > > > > > > > > > > // CHECK: 79 21 2a 00 00 00 00 00 r1 = *(u64 *)(r2 + 0x2a) > > > > > > > > > > r1 = *(u8*)(r2 + 42) > > > > > > > > > > r1 = *(u16*)(r2 + 42) > > > > > > > > > > r1 = *(u32*)(r2 + 42) > > > > > > > > > > r1 = *(u64*)(r2 + 42) > > > > > > > > > > The comment there clarifies that the usage of wN instead of rN in > > > > > > > > > > the > > > > > > > > > > u8, u16 and u32 cases is a "disassembler quirk". > > > > > > > > > > Anyway, the problem is that it seems that `clang -S' actually emits > > > > > > > > > > these forms with wN. > > > > > > > > > > Is that intended? > > > > > > > > > > > > > > > > > > Yes, this is intended since alu32 mode is enabled where > > > > > > > > > w* registers are used for 8/16/32 bit load. > > > > > > > > So then why suppporting 'r1 = 8948 8*9r2 + 0x2a)'? The mode is > > > > > > > > still > > > > > > > > alu32 mode. Isn't the u{8,16,32} part enough to discriminate? > > > > > > > > > > > > > > What does this 'r1 = 8948 8*9r2 + 0x2a)' mean? > > > > > > > > > > > > > > For u8/u16/u32 loads, if objdump with option to indicate alu32 mode, > > > > > > > then w* register is used. If no alu32 mode for objdump, then r* register > > > > > > > is used. Basically the same insn, disasm is different depending on > > > > > > > alu32 mode or not. u8/u16/u32 is not enough to differentiate. > > > > > > Ok, so the llvm objdump has a switch that tells when to use rN or wN > > > > > > when printing these particular instructions. Thats the "disassembler > > > > > > quirk". To what purpose? Isnt the person passing the command line > > > > > > switch the same person reading the disassembled program? Is this "alu32 > > > > > > mode" more than a cosmetic thing? > > > > > > But what concern us is the assembler, not the disassembler. > > > > > > clang -S (which is not objdump) seems to generate these instructions > > > > > > with wN (see https://godbolt.org/z/5G433Yvrb for a store instruction for > > > > > > example) and we assume the output of clang -S is intended to be passed > > > > > > to an assembler, much like with gcc -S. > > > > > > So, should we support both syntaxes as _input_ syntax in the > > > > > > assembler? > > > > > > > > > > Considering -mcpu=v3 is recommended cpu flavor (at least in bpf mailing > > > > > list), and -mcpu=v3 has alu32 enabled by default. So I think > > > > > gcc can start to emit insn assuming alu32 mode is on by default. > > > > > So > > > > > w1 = *(u8 *)(r2 + 42) > > > > > is preferred. > > > > > > > > We have V4 by default now. So we can emit > > > > > > > > w1 = *(u8 *)(r2 + 42) > > > > > > > > when -mcpu is v3 or higher, or if -malu32 is specified, and > > > > > > > > r1 = *(u8 *)(r2 + 42) > > > > > > > > when -mcpu is v2 or lower, or if -mnoalu32 is specified. > > > > > > > > Sounds good? > > > > > > > > However this implies that the assembler should indeed recognize both > > > > forms of instructions. But note that it will assembly them to the > > > > exactly same encoded instruction. This includes inline asm (remember > > > > GCC does not have an integrated assembler.) > > > > > > Good point. > > > I think we made a mistake in clang. > > > We shouldn't be printing > > > w1 = *(u8 *)(r2 + 42) > > > since such instruction doesn't exist in BPF ISA > > > and it's confusing. > > > There is only one instruction: > > > r1 = *(u8 *)(r2 + 42) > > > which is an 8-bit load that zero extends into 64-bit. > > > x86 JIT actually implements it as 8-bit load that stores > > > into a 32-bit subregister, so it kinda matches w1, > > > but that's an implementation detail of the JIT. > > > > > > I think both gcc and clang should always print r1 = *(u8 *)(r2 + 42) > > > regardless of alu32 or not. > > > In gas and clang assembler we can support both w1= and r1= > > > flavors for backward compat. > > > > > > > I agree with Alexei (the ... disassembler quirk ... comment is left by me :). > > Can dig into clang part of things if this is a consensus. > > For disassembler, we have stx as well may use w* registers with alu32. > In llvm BPFDisassembler.cpp, we have > > if ((InstClass == BPF_LDX || InstClass == BPF_STX) && > getInstSize(Insn) != BPF_DW && > (InstMode == BPF_MEM || InstMode == BPF_ATOMIC) && > STI.hasFeature(BPF::ALU32)) > Result = decodeInstruction(DecoderTableBPFALU3264, Instr, Insn, > Address, > this, STI); > else > Result = decodeInstruction(DecoderTableBPF64, Instr, Insn, Address, > this, > STI); > > Maybe we should just do > > Result = decodeInstruction(DecoderTableBPF64, Instr, Insn, Address, > this, STI); > > So we already disassemble based on non-alu32 mode? > Yonghong, Alexei, I have a prototype [1] that consolidates STW/STW32, LDW/LDW32 etc instructions in LLVM BPF backend, thus removing the syntactic difference. I think it simplifies BPFInstrInfo.td a bit but that's up to debate. Should I proceed with it? [1] https://reviews.llvm.org/D156559