Re: asm register constraint. Was: [PATCH v2 bpf-next 2/5] bpf: Introduce "volatile compare" macro

Yonghong Song <yonghong.song@xxxxxxxxx> · Tue, 16 Jan 2024 15:14:09 -0800

On 1/16/24 11:34 AM, Alexei Starovoitov wrote:

On Tue, Jan 16, 2024 at 11:07 AM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:

On 1/16/24 9:47 AM, Alexei Starovoitov wrote:
On Mon, Jan 15, 2024 at 8:33 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
[0] Updated LLVM
      https://github.com/eddyz87/llvm-project/tree/bpf-inline-asm-polymorphic-r
1.
// Use sequence 'wX = wX' if 32-bits ops are available.
let Predicates = [BPFHasALU32] in {

This is unnecessary conservative.
wX = wX instructions existed from day one.
The very first commit of the interpreter and the verifier recognized it.
No need to gate it by BPFHasALU32.
Actually this is not true from llvm perspective.
wX = wX is available in bpf ISA from day one, but
wX register is only introduced in llvm in 2017
and at the same time alu32 is added to facilitate
its usage.
Not quite. At that time we added general support in the verifier
for the majority of alu32 insns. The insns worked in the interpreter
earlier, but the verifier didn't handle them.
While wX=wX was supported by the verifier from the start.
So this particular single insn shouldn't be part of alu32 flag
It didn't need to be back in 2017 and doesn't need to be now.

Okay, IIUC, currently 32-bit subreg is enabled
only if alu32 is enabled.
  if (STI.getHasAlu32())
    addRegisterClass(MVT::i32, &BPF::GPR32RegClass);

We should unconditionally enable 32-bit subreg with.
  addRegisterClass(MVT::i32, &BPF::GPR32RegClass);

We may need to add Alu32 control in some other
places which trying to access 32-bit subreg.
But for wX = wX thing, we do not need Alu32 control
and the following is okay:
 def : Pat<(i64 (and (i64 GPR:$src), 0xffffFFFF)),
          (INSERT_SUBREG
            (i64 (IMPLICIT_DEF)),
            (MOV_rr_32 (i32 (EXTRACT_SUBREG GPR:$src, sub_32))),
            sub_32)>;

I tried with the above change with unconditionally
doing addRegisterClass(MVT::i32, &BPF::GPR32RegClass).

$ cat t1.c
unsigned long test1(unsigned long x) {
        return (unsigned)x;
}
unsigned long test2(unsigned x) {
        return x;
}
#if 0
unsigned test3(unsigned x, unsigned y) {
        return x + y;
}
#endif
$ clang --target=bpf -mcpu=v1 -O2 -c t1.c && llvm-objdump -d t1.o

t1.o:   file format elf64-bpf

Disassembly of section .text:

0000000000000000 <test1>:
       0:       bc 10 00 00 00 00 00 00 w0 = w1
       1:       95 00 00 00 00 00 00 00 exit

0000000000000010 <test2>:
       2:       bc 10 00 00 00 00 00 00 w0 = w1
       3:       95 00 00 00 00 00 00 00 exit
$

More changes in BPFInstrInfo.td and possible other
places need to make the above test3() function work
at -mcpu=v1.