On Wed, Jan 3, 2024 at 3:26 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote: > > With patch set [1], precision backtracing supports register spill/fill > to/from the stack. The patch [2] allows initial imprecise register spill > with content 0. This is a common case for cpuv3 and lower for > initializing the stack variables with pattern > r1 = 0 > *(u64 *)(r10 - 8) = r1 > and the [2] has demonstrated good verification improvement. > > For cpuv4, the initialization could be > *(u64 *)(r10 - 8) = 0 > The current verifier marks the r10-8 contents with STACK_ZERO. > Similar to [2], let us permit the above insn to behave like > imprecise register spill which can reduce number of verified states. > The change is in function check_stack_write_fixed_off(). > > Before this patch, spilled zero will be marked as STACK_ZERO > which can provide precise values. In check_stack_write_var_off(), > STACK_ZERO will be maintained if writing a const zero > so later it can provide precise values if needed. > > The above handling of '*(u64 *)(r10 - 8) = 0' as a spill > will have issues in check_stack_write_var_off() as the spill > will be converted to STACK_MISC and the precise value 0 > is lost. To fix this issue, if the spill slots with const > zero and the BPF_ST write also with const zero, the spill slots > are preserved, which can later provide precise values > if needed. Without the change in check_stack_write_var_off(), > the test_verifier subtest 'BPF_ST_MEM stack imm zero, variable offset' > will fail. > > I checked cpuv3 and cpuv4 with and without this patch with veristat. > There is no state change for cpuv3 since '*(u64 *)(r10 - 8) = 0' > is only generated with cpuv4. > > For cpuv4: > $ ../veristat -C old.cpuv4.csv new.cpuv4.csv -e file,prog,insns,states -f 'insns_diff!=0' > File Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF) > ------------------------------------------ ------------------- --------- --------- --------------- ---------- ---------- ------------- > local_storage_bench.bpf.linked3.o get_local 228 168 -60 (-26.32%) 17 14 -3 (-17.65%) > pyperf600_bpf_loop.bpf.linked3.o on_event 6066 4889 -1177 (-19.40%) 403 321 -82 (-20.35%) > test_cls_redirect.bpf.linked3.o cls_redirect 35483 35387 -96 (-0.27%) 2179 2177 -2 (-0.09%) > test_l4lb_noinline.bpf.linked3.o balancer_ingress 4494 4522 +28 (+0.62%) 217 219 +2 (+0.92%) > test_l4lb_noinline_dynptr.bpf.linked3.o balancer_ingress 1432 1455 +23 (+1.61%) 92 94 +2 (+2.17%) > test_xdp_noinline.bpf.linked3.o balancer_ingress_v6 3462 3458 -4 (-0.12%) 216 216 +0 (+0.00%) > verifier_iterating_callbacks.bpf.linked3.o widening 52 41 -11 (-21.15%) 4 3 -1 (-25.00%) > xdp_synproxy_kern.bpf.linked3.o syncookie_tc 12412 11719 -693 (-5.58%) 345 330 -15 (-4.35%) > xdp_synproxy_kern.bpf.linked3.o syncookie_xdp 12478 11794 -684 (-5.48%) 346 331 -15 (-4.34%) > > test_l4lb_noinline and test_l4lb_noinline_dynptr has minor regression, but > pyperf600_bpf_loop and local_storage_bench gets pretty good improvement. > > [1] https://lore.kernel.org/all/20231205184248.1502704-1-andrii@xxxxxxxxxx/ > [2] https://lore.kernel.org/all/20231205184248.1502704-9-andrii@xxxxxxxxxx/ > > Cc: Kuniyuki Iwashima <kuniyu@xxxxxxxxxx> > Cc: Martin KaFai Lau <kafai@xxxxxx> > Signed-off-by: Yonghong Song <yonghong.song@xxxxxxxxx> > --- > kernel/bpf/verifier.c | 21 +++++++++++++++++-- > .../selftests/bpf/progs/verifier_spill_fill.c | 16 +++++++------- > 2 files changed, 27 insertions(+), 10 deletions(-) > > Changelogs: > v1 -> v2: > - Preserve with-const-zero spill if writing is also zero > in check_stack_write_var_off(). > - Add a test with not-8-byte-aligned BPF_ST store. > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index d4e31f61de0e..cfe7a68d90a5 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -4491,7 +4491,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env, > if (fls64(reg->umax_value) > BITS_PER_BYTE * size) > state->stack[spi].spilled_ptr.id = 0; > } else if (!reg && !(off % BPF_REG_SIZE) && is_bpf_st_mem(insn) && > - insn->imm != 0 && env->bpf_capable) { > + env->bpf_capable) { > struct bpf_reg_state fake_reg = {}; > > __mark_reg_known(&fake_reg, insn->imm); > @@ -4613,11 +4613,28 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env, > > /* Variable offset writes destroy any spilled pointers in range. */ > for (i = min_off; i < max_off; i++) { > + struct bpf_reg_state *spill_reg; > u8 new_type, *stype; > - int slot, spi; > + int slot, spi, j; > > slot = -i - 1; > spi = slot / BPF_REG_SIZE; > + > + /* If writing_zero and the the spi slot contains a spill of value 0, > + * maintain the spill type. > + */ > + if (writing_zero && !(i % BPF_REG_SIZE) && is_spilled_scalar_reg(&state->stack[spi])) { > + spill_reg = &state->stack[spi].spilled_ptr; > + if (tnum_is_const(spill_reg->var_off) && spill_reg->var_off.value == 0) { > + for (j = BPF_REG_SIZE; j > 0; j--) { > + if (state->stack[spi].slot_type[j - 1] != STACK_SPILL) > + break; > + } > + i += BPF_REG_SIZE - j - 1; > + continue; > + } > + } > + > stype = &state->stack[spi].slot_type[slot % BPF_REG_SIZE]; > mark_stack_slot_scratched(env, spi); > you are skipping this in some situations, which makes debugging harder because we won't see state of all affected slots, so please move it up and set it right after calculating spi > diff --git a/tools/testing/selftests/bpf/progs/verifier_spill_fill.c b/tools/testing/selftests/bpf/progs/verifier_spill_fill.c > index 39fe3372e0e0..d4b3188afe07 100644 > --- a/tools/testing/selftests/bpf/progs/verifier_spill_fill.c > +++ b/tools/testing/selftests/bpf/progs/verifier_spill_fill.c > @@ -495,14 +495,14 @@ char single_byte_buf[1] SEC(".data.single_byte_buf"); > SEC("raw_tp") > __log_level(2) > __success [...]