Re: asm register constraint. Was: [PATCH v2 bpf-next 2/5] bpf: Introduce "volatile compare" macro

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Wed, 17 Jan 2024 14:43:59 -0800

On Tue, Jan 16, 2024 at 3:14 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>
> On 1/16/24 11:34 AM, Alexei Starovoitov wrote:
>
> > On Tue, Jan 16, 2024 at 11:07 AM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
> >>
> >> On 1/16/24 9:47 AM, Alexei Starovoitov wrote:
> >>> On Mon, Jan 15, 2024 at 8:33 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
> >>>> [0] Updated LLVM
> >>>>       https://github.com/eddyz87/llvm-project/tree/bpf-inline-asm-polymorphic-r
> >>> 1.
> >>> // Use sequence 'wX = wX' if 32-bits ops are available.
> >>> let Predicates = [BPFHasALU32] in {
> >>>
> >>> This is unnecessary conservative.
> >>> wX = wX instructions existed from day one.
> >>> The very first commit of the interpreter and the verifier recognized it.
> >>> No need to gate it by BPFHasALU32.
> >> Actually this is not true from llvm perspective.
> >> wX = wX is available in bpf ISA from day one, but
> >> wX register is only introduced in llvm in 2017
> >> and at the same time alu32 is added to facilitate
> >> its usage.
> > Not quite. At that time we added general support in the verifier
> > for the majority of alu32 insns. The insns worked in the interpreter
> > earlier, but the verifier didn't handle them.
> > While wX=wX was supported by the verifier from the start.
> > So this particular single insn shouldn't be part of alu32 flag
> > It didn't need to be back in 2017 and doesn't need to be now.
>
> Okay, IIUC, currently 32-bit subreg is enabled
> only if alu32 is enabled.
>    if (STI.getHasAlu32())
>      addRegisterClass(MVT::i32, &BPF::GPR32RegClass);
>
> We should unconditionally enable 32-bit subreg with.
>    addRegisterClass(MVT::i32, &BPF::GPR32RegClass);

Makes sense to me.
It should be fine with -mcpu=v1,v2.

> We may need to add Alu32 control in some other
> places which trying to access 32-bit subreg.
> But for wX = wX thing, we do not need Alu32 control
> and the following is okay:
>   def : Pat<(i64 (and (i64 GPR:$src), 0xffffFFFF)),
>            (INSERT_SUBREG
>              (i64 (IMPLICIT_DEF)),
>              (MOV_rr_32 (i32 (EXTRACT_SUBREG GPR:$src, sub_32))),
>              sub_32)>;
>
> I tried with the above change with unconditionally
> doing addRegisterClass(MVT::i32, &BPF::GPR32RegClass).
>
> $ cat t1.c
> unsigned long test1(unsigned long x) {
>          return (unsigned)x;
> }
> unsigned long test2(unsigned x) {
>          return x;
> }
> #if 0
> unsigned test3(unsigned x, unsigned y) {
>          return x + y;
> }
> #endif
> $ clang --target=bpf -mcpu=v1 -O2 -c t1.c && llvm-objdump -d t1.o
>
> t1.o:   file format elf64-bpf
>
> Disassembly of section .text:
>
> 0000000000000000 <test1>:
>         0:       bc 10 00 00 00 00 00 00 w0 = w1
>         1:       95 00 00 00 00 00 00 00 exit
>
> 0000000000000010 <test2>:
>         2:       bc 10 00 00 00 00 00 00 w0 = w1
>         3:       95 00 00 00 00 00 00 00 exit
> $
>
> More changes in BPFInstrInfo.td and possible other
> places need to make the above test3() function work
> at -mcpu=v1.

All makes sense to me.