On Wed, 2023-05-31 at 19:05 -0700, Alexei Starovoitov wrote:
> [...]
> > Suppose that current verification path is 1-7:
> > - On a way down 1-6 r7 will not be marked as precise, because
> >   condition (r7 > X) is not predictable (see check_cond_jmp_op());
> > - When (7) is reached mark_chain_precision() will start moving up
> >   marking the following registers as precise:
> >
> >     4: if (r6 > r7) goto +1  ; r6, r7
> >     5: r7 = r6               ; r6
> >     6: if (r7 > X) goto ...  ; r6
> >     7: r9 += r6              ; r6
> >
> > - Thus, if a checkpoint is created for (6), r7 would be marked as read
> >   but will not be marked as precise.
> >
> > Next, suppose that the jump from 4 to 6 is verified and the checkpoint
> > for (6) is considered:
> > - r6 is not precise, so check_ids() is not called for it and it is not
> >   added to the idmap;
> > - r7 is precise, so check_ids() is called for it, but it is a sole
> >   register in the idmap;
>
> typos in above?
> r6 is precise and r7 is not precise.

Yes, it should be the other way around in the description: r6 precise,
r7 not precise. Sorry for the confusion.

> > - States are considered equal.
> >
> > Here is the log (I added a few prints for states cache comparison):
> >
> >   from 10 to 13: safe
> >   steq hit 10, cur:
> >     R0=scalar(id=2) R6=scalar(id=2) R7=scalar(id=1) R9=fp-8 R10=fp0 fp-8=00000000
> >   steq hit 10, old:
> >     R6_rD=Pscalar(id=2) R7_rwD=scalar(id=2) R9_rD=fp-8 R10=fp0 fp-8_rD=00000000
>
> the log is correct, though.
> r6_old = Pscalar which will go through check_ids() successfully and both are unbounded.
> r7_old is not precise. different id-s don't matter and different ranges don't matter.
>
> As another potential fix...
> can we mark_chain_precision() right at the time of R1 = R2 when we do
> src_reg->id = ++env->id_gen
> and copy_register_state();
> for both regs?

This won't help; e.g. for the original example the precise markings
would be:

    4: if (r6 > r7) goto +1  ; r6, r7
    5: r7 = r6               ; r6, r7
    6: if (r7 > X) goto ...  ; r6     <-- mark for r7 is still missing
    7: r9 += r6              ; r6

What might help is to call mark_chain_precision() from
find_equal_scalars(), but I expect this to be very expensive.

> I think
> if (rold->precise && !check_ids(rold->id, rcur->id, idmap))
> would be a good property to have.
> I don't like u32_hashset either.
> It's more or less saying that scalar id-s are incompatible with precision.
>
> I hope we don't need to do:
> + u32 reg_ids[MAX_CALL_FRAMES];
> for backtracking either.
> Hacking id-s into jmp history is equally bad.
>
> Let's figure out a minimal fix.

The solution discussed with Andrii yesterday seems to work.
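For reference, below is a rough sketch of the two pieces involved: the
check_ids() id-mapping helper and the SCALAR_VALUE branch of regsafe()
with the "rold->precise implies check_ids()" property added. This is
paraphrased from kernel/bpf/verifier.c and is only an illustration of
the idea being discussed, not the actual patch; exact code and
placement differ between kernel versions:

  /* Sketch of the id-map helper used during state comparison: the
   * idmap records a one-to-one mapping between ids in the old
   * (checkpointed) state and ids in the current state, so that two
   * old registers sharing an id are only considered safe if the
   * corresponding current registers also share an id.
   */
  static bool check_ids(u32 old_id, u32 cur_id, struct bpf_id_pair *idmap)
  {
          unsigned int i;

          for (i = 0; i < BPF_ID_MAP_SIZE; i++) {
                  if (!idmap[i].old) {
                          /* Reached an empty slot; record old_id -> cur_id */
                          idmap[i].old = old_id;
                          idmap[i].cur = cur_id;
                          return true;
                  }
                  if (idmap[i].old == old_id)
                          return idmap[i].cur == cur_id;
          }
          /* Ran out of slots; treat the states as distinct to stay safe */
          return false;
  }

  /* Sketch of regsafe()'s SCALAR_VALUE case with the discussed
   * property: a precise old register must also agree on the id
   * mapping, since ids link registers whose ranges change together
   * and ranges only matter for precise registers.
   */
  case SCALAR_VALUE:
          if (!rold->precise)
                  return true;
          return check_ids(rold->id, rcur->id, idmap) &&
                 range_within(rold, rcur) &&
                 tnum_in(rold->var_off, rcur->var_off);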
There is still a performance regression, but it is much less severe:

$ ./veristat -e file,prog,states -f "states_pct>5" -C master-baseline.log current.log
File                      Program                         States (A)  States (B)      States (DIFF)
------------------------  ------------------------------  ----------  ----------  -----------------
bpf_host.o                cil_to_host                            188         198       +10 (+5.32%)
bpf_host.o                tail_handle_ipv4_from_host             225         243       +18 (+8.00%)
bpf_host.o                tail_ipv6_host_policy_ingress           98         104        +6 (+6.12%)
bpf_xdp.o                 tail_handle_nat_fwd_ipv6               648         806     +158 (+24.38%)
bpf_xdp.o                 tail_lb_ipv4                          2491        2930     +439 (+17.62%)
bpf_xdp.o                 tail_nodeport_nat_egress_ipv4          749         868     +119 (+15.89%)
bpf_xdp.o                 tail_nodeport_nat_ingress_ipv4         375         477     +102 (+27.20%)
bpf_xdp.o                 tail_rev_nodeport_lb4                  398         486      +88 (+22.11%)
loop6.bpf.o               trace_virtqueue_add_sgs                226         251      +25 (+11.06%)
pyperf600.bpf.o           on_event                             22200       45095  +22895 (+103.13%)
pyperf600_nounroll.bpf.o  on_event                             34169       37235     +3066 (+8.97%)

I need to add a bunch of tests and take a look at pyperf600.bpf.o
before submitting the next patch-set version.