On Tue, Jan 16, 2024 at 3:14 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote: > > On 1/16/24 11:34 AM, Alexei Starovoitov wrote: > > > On Tue, Jan 16, 2024 at 11:07 AM Yonghong Song <yonghong.song@xxxxxxxxx> wrote: > >> > >> On 1/16/24 9:47 AM, Alexei Starovoitov wrote: > >>> On Mon, Jan 15, 2024 at 8:33 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote: > >>>> [0] Updated LLVM > >>>> https://github.com/eddyz87/llvm-project/tree/bpf-inline-asm-polymorphic-r > >>> 1. > >>> // Use sequence 'wX = wX' if 32-bits ops are available. > >>> let Predicates = [BPFHasALU32] in { > >>> > >>> This is unnecessary conservative. > >>> wX = wX instructions existed from day one. > >>> The very first commit of the interpreter and the verifier recognized it. > >>> No need to gate it by BPFHasALU32. > >> Actually this is not true from llvm perspective. > >> wX = wX is available in bpf ISA from day one, but > >> wX register is only introduced in llvm in 2017 > >> and at the same time alu32 is added to facilitate > >> its usage. > > Not quite. At that time we added general support in the verifier > > for the majority of alu32 insns. The insns worked in the interpreter > > earlier, but the verifier didn't handle them. > > While wX=wX was supported by the verifier from the start. > > So this particular single insn shouldn't be part of alu32 flag > > It didn't need to be back in 2017 and doesn't need to be now. > > Okay, IIUC, currently 32-bit subreg is enabled > only if alu32 is enabled. > if (STI.getHasAlu32()) > addRegisterClass(MVT::i32, &BPF::GPR32RegClass); > > We should unconditionally enable 32-bit subreg with. > addRegisterClass(MVT::i32, &BPF::GPR32RegClass); Makes sense to me. It should be fine with -mcpu=v1,v2. > We may need to add Alu32 control in some other > places which trying to access 32-bit subreg. > But for wX = wX thing, we do not need Alu32 control > and the following is okay: > def : Pat<(i64 (and (i64 GPR:$src), 0xffffFFFF)), > (INSERT_SUBREG > (i64 (IMPLICIT_DEF)), > (MOV_rr_32 (i32 (EXTRACT_SUBREG GPR:$src, sub_32))), > sub_32)>; > > I tried with the above change with unconditionally > doing addRegisterClass(MVT::i32, &BPF::GPR32RegClass). > > $ cat t1.c > unsigned long test1(unsigned long x) { > return (unsigned)x; > } > unsigned long test2(unsigned x) { > return x; > } > #if 0 > unsigned test3(unsigned x, unsigned y) { > return x + y; > } > #endif > $ clang --target=bpf -mcpu=v1 -O2 -c t1.c && llvm-objdump -d t1.o > > t1.o: file format elf64-bpf > > Disassembly of section .text: > > 0000000000000000 <test1>: > 0: bc 10 00 00 00 00 00 00 w0 = w1 > 1: 95 00 00 00 00 00 00 00 exit > > 0000000000000010 <test2>: > 2: bc 10 00 00 00 00 00 00 w0 = w1 > 3: 95 00 00 00 00 00 00 00 exit > $ > > More changes in BPFInstrInfo.td and possible other > places need to make the above test3() function work > at -mcpu=v1. All makes sense to me.