Re: asm register constraint. Was: [PATCH v2 bpf-next 2/5] bpf: Introduce "volatile compare" macro

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 1/16/24 11:34 AM, Alexei Starovoitov wrote:

On Tue, Jan 16, 2024 at 11:07 AM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:

On 1/16/24 9:47 AM, Alexei Starovoitov wrote:
On Mon, Jan 15, 2024 at 8:33 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
[0] Updated LLVM
      https://github.com/eddyz87/llvm-project/tree/bpf-inline-asm-polymorphic-r
1.
// Use sequence 'wX = wX' if 32-bits ops are available.
let Predicates = [BPFHasALU32] in {

This is unnecessary conservative.
wX = wX instructions existed from day one.
The very first commit of the interpreter and the verifier recognized it.
No need to gate it by BPFHasALU32.
Actually this is not true from llvm perspective.
wX = wX is available in bpf ISA from day one, but
wX register is only introduced in llvm in 2017
and at the same time alu32 is added to facilitate
its usage.
Not quite. At that time we added general support in the verifier
for the majority of alu32 insns. The insns worked in the interpreter
earlier, but the verifier didn't handle them.
While wX=wX was supported by the verifier from the start.
So this particular single insn shouldn't be part of alu32 flag
It didn't need to be back in 2017 and doesn't need to be now.

Okay, IIUC, currently 32-bit subreg is enabled
only if alu32 is enabled.
  if (STI.getHasAlu32())
    addRegisterClass(MVT::i32, &BPF::GPR32RegClass);

We should unconditionally enable 32-bit subreg with.
  addRegisterClass(MVT::i32, &BPF::GPR32RegClass);

We may need to add Alu32 control in some other
places which trying to access 32-bit subreg.
But for wX = wX thing, we do not need Alu32 control
and the following is okay:
 def : Pat<(i64 (and (i64 GPR:$src), 0xffffFFFF)),
          (INSERT_SUBREG
            (i64 (IMPLICIT_DEF)),
            (MOV_rr_32 (i32 (EXTRACT_SUBREG GPR:$src, sub_32))),
            sub_32)>;

I tried with the above change with unconditionally
doing addRegisterClass(MVT::i32, &BPF::GPR32RegClass).

$ cat t1.c
unsigned long test1(unsigned long x) {
        return (unsigned)x;
}
unsigned long test2(unsigned x) {
        return x;
}
#if 0
unsigned test3(unsigned x, unsigned y) {
        return x + y;
}
#endif
$ clang --target=bpf -mcpu=v1 -O2 -c t1.c && llvm-objdump -d t1.o

t1.o:   file format elf64-bpf

Disassembly of section .text:
0000000000000000 <test1>:
       0:       bc 10 00 00 00 00 00 00 w0 = w1
       1:       95 00 00 00 00 00 00 00 exit
0000000000000010 <test2>:
       2:       bc 10 00 00 00 00 00 00 w0 = w1
       3:       95 00 00 00 00 00 00 00 exit
$

More changes in BPFInstrInfo.td and possible other
places need to make the above test3() function work
at -mcpu=v1.





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux