On Sat, Oct 21, 2023 at 6:08 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote: > > Convergence for open coded iterators is computed in is_state_visited() > by examining states with branches count > 1 and using states_equal(). > states_equal() computes sub-state relation using read and precision marks. > Read and precision marks are propagated from children states, > thus are not guaranteed to be complete inside a loop when branches > count > 1. This could be demonstrated using the following unsafe program: > > 1. r7 = -16 > 2. r6 = bpf_get_prandom_u32() > 3. while (bpf_iter_num_next(&fp[-8])) { > 4. if (r6 != 42) { > 5. r7 = -32 > 6. r6 = bpf_get_prandom_u32() > 7. continue > 8. } > 9. r0 = r10 > 10. r0 += r7 > 11. r8 = *(u64 *)(r0 + 0) > 12. r6 = bpf_get_prandom_u32() > 13. } > > Here verifier would first visit path 1-3, create a checkpoint at 3 > with r7=-16, continue to 4-7,3 with r7=-32. > > Because instructions at 9-12 had not been visitied yet existing > checkpoint at 3 does not have read or precision mark for r7. > Thus states_equal() would return true and verifier would discard > current state, thus unsafe memory access at 11 would not be caught. > > This commit fixes this loophole by introducing exact state comparisons > for iterator convergence logic: > - registers are compared using regs_exact() regardless of read or > precision marks; > - stack slots have to have identical type. > > Unfortunately, this is too strict even for simple programs like below: > > i = 0; > while(iter_next(&it)) > i++; > > At each iteration step i++ would produce a new distinct state and > eventually instruction processing limit would be reached. > > To avoid such behavior speculatively forget (widen) range for > imprecise scalar registers, if those registers were not precise at the > end of the previous iteration and do not match exactly. > > This a conservative heuristic that allows to verify wide range of > programs, however it precludes verification of programs that conjure > an imprecise value on the first loop iteration and use it as precise > on the second. > > Test case iter_task_vma_for_each() presents one of such cases: > > unsigned int seen = 0; > ... > bpf_for_each(task_vma, vma, task, 0) { > if (seen >= 1000) > break; > ... > seen++; > } > > Here clang generates the following code: > > <LBB0_4>: > 24: r8 = r6 ; stash current value of > ... body ... 'seen' > 29: r1 = r10 > 30: r1 += -0x8 > 31: call bpf_iter_task_vma_next > 32: r6 += 0x1 ; seen++; > 33: if r0 == 0x0 goto +0x2 <LBB0_6> ; exit on next() == NULL > 34: r7 += 0x10 > 35: if r8 < 0x3e7 goto -0xc <LBB0_4> ; loop on seen < 1000 > > <LBB0_6>: > ... exit ... > > Note that counter in r6 is copied to r8 and then incremented, > conditional jump is done using r8. Because of this precision mark for > r6 lags one state behind of precision mark on r8 and widening logic > kicks in. > > Adding barrier_var(seen) after conditional is sufficient to force > clang use the same register for both counting and conditional jump. > > This issue was discussed in the thread [1] which was started by > Andrew Werner <awerner32@xxxxxxxxx> demonstrating a similar bug > in callback functions handling. The callbacks would be addressed > in a followup patch. > > [1] https://lore.kernel.org/bpf/97a90da09404c65c8e810cf83c94ac703705dc0e.camel@xxxxxxxxx/ > > Co-developed-by: Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> > Co-developed-by: Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> > Signed-off-by: Eduard Zingerman <eddyz87@xxxxxxxxx> > --- > include/linux/bpf_verifier.h | 1 + > kernel/bpf/verifier.c | 212 +++++++++++++++--- > .../selftests/bpf/progs/iters_task_vma.c | 1 + > 3 files changed, 184 insertions(+), 30 deletions(-) > [...] > +static int widen_imprecise_scalars(struct bpf_verifier_env *env, > + struct bpf_verifier_state *old, > + struct bpf_verifier_state *cur) > +{ > + struct bpf_func_state *fold, *fcur; > + int i, fr; > + > + reset_idmap_scratch(env); > + for (fr = old->curframe; fr >= 0; fr--) { > + fold = old->frame[fr]; > + fcur = cur->frame[fr]; > + > + for (i = 0; i < MAX_BPF_REG; i++) > + maybe_widen_reg(env, > + &fold->regs[i], > + &fcur->regs[i], > + &env->idmap_scratch); > + > + for (i = 0; i < fold->allocated_stack / BPF_REG_SIZE; i++) { > + if (is_spilled_scalar_reg(&fold->stack[i])) should this be !is_spilled_scalar_reg()? but also your should make sure that fcur->stack[i] is also a spilled reg (it could be iter or dynptr, or just hard-coded ZERO or MISC stack slot, so accessing fcur->stack[i].spilled_ptr below might be invalid) > + continue; > + > + maybe_widen_reg(env, > + &fold->stack[i].spilled_ptr, > + &fcur->stack[i].spilled_ptr, > + &env->idmap_scratch); > + } > + } > + return 0; > +} > + [...]