Re: [RFC bpf-next 0/5] Support for BPF_ST instruction in LLVM C compiler

Alexei Starovoitov <ast@xxxxxxxx> · Wed, 25 Jan 2023 21:49:21 -0800

On 1/13/23 8:47 AM, Eduard Zingerman wrote:
        So the patch extends this logic to also handle BPF_ST.
The patch 3 is necessary to land before llvm starts generating 'st'
for ctx access.
That's clear, but I'm missing why patch 1 is necessary.
Sure, it's making the verifier understand scalar spills with 'st' and
makes 'st' equivalent to 'stx', but I'm missing why it's necessary.
What kind of programs fail to be verified when llvm starts generating 'st' ?
I should have added an example to the summary. There are a few
test_prog tests that fail w/o this patch, namely atomic_bounds,
dynptr, for_each, xdp_noinline, xdp_synproxy.

Here is how atomic_bounds looks:

   SEC("fentry/bpf_fentry_test1")
   int BPF_PROG(sub, int x)
   {
           int a = 0;
           int b = __sync_fetch_and_add(&a, 1);
           /* b is certainly 0 here. Can the verifier tell? */
           while (b)
                   continue;
           return 0;
   }

     Compiled w/o BPF_ST                  Compiled with BPF_ST

   <sub>:                               <sub>:
     w1 = 0x0
     *(u32 *)(r10 - 0x4) = r1             *(u32 *)(r10 - 0x4) = 0x0
     w1 = 0x1                             w1 = 0x1
     lock *(u32 *)(r10 - 0x4) += r1       lock *(u32 *)(r10 - 0x4) += r1
     if w1 == 0x0 goto +0x1 <LBB0_2>      if w1 == 0x0 goto +0x1 <LBB0_2>

   <LBB0_1>:                            <LBB0_1>:
     goto -0x1 <LBB0_1>                   goto -0x1 <LBB0_1>

   <LBB0_2>:                            <LBB0_2>:
     w0 = 0x0                             w0 = 0x0
     exit                                 exit

When compiled with BPF_ST and verified w/o the patch #1 verification log
looks as follows:

   0: R1=ctx(off=0,imm=0) R10=fp0
   0: (62) *(u32 *)(r10 -4) = 0          ; R10=fp0 fp-8=mmmm????
   1: (b4) w1 = 1                        ; R1_w=1
   2: (c3) r1 = atomic_fetch_add((u32 *)(r10 -4), r1)    ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0 fp-8=mmmm????
   3: (16) if w1 == 0x0 goto pc+1        ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
   4: (05) goto pc-1
   4: (05) goto pc-1
   4: (05) goto pc-1
   4: (05) goto pc-1
   infinite loop detected at insn 4

When compiled w/o BPF_ST and verified w/o the patch #1 verification log
looks as follows:

   func#0 @0
   reg type unsupported for arg#0 function sub#5
   0: R1=ctx(off=0,imm=0) R10=fp0
   0: (b4) w1 = 0                        ; R1_w=0
   1: (63) *(u32 *)(r10 -4) = r1
   last_idx 1 first_idx 0
   regs=2 stack=0 before 0: (b4) w1 = 0
   2: R1_w=0 R10=fp0 fp-8=0000????
   2: (b4) w1 = 1                        ; R1_w=1
   ; int b = __sync_fetch_and_add(&a, 1);
   3: (c3) r1 = atomic_fetch_add((u32 *)(r10 -4), r1)    ; R1_w=P0 R10=fp0 fp-8=mmmm????
   4: (16) if w1 == 0x0 goto pc+1
   6: R1_w=P0
   6: (b4) w0 = 0                        ; R0_w=0
   7: (95) exit
   processed 7 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

The difference comes from the way zero write to `r10-4` is processed,
with BPF_ST it is tracked as `fp-8=mmmm????` after write, without BPF_ST
it is tracked as `fp-8=0000???? after` write.

Which is caused by the way `check_stack_write_fixed_off()` is written.
For the register spills it either saves the complete register state or
STACK_ZERO if register is known to be zero. However, for the BPF_ST it
just saves STACK_MISC. Hence, the patch #1.
Thank you for explaining.
Though llvm changes are not ready, let's land the required
kernel changes now.
Please craft asm test cases for the issues you've seen with BPF_ST.