Re: [PATCH bpf-next v2 3/7] bpf: exact states comparison for iterator convergence checks

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Sat, 21 Oct 2023 21:16:52 -0700

On Sat, Oct 21, 2023 at 6:08 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> Convergence for open coded iterators is computed in is_state_visited()
> by examining states with branches count > 1 and using states_equal().
> states_equal() computes sub-state relation using read and precision marks.
> Read and precision marks are propagated from children states,
> thus are not guaranteed to be complete inside a loop when branches
> count > 1. This could be demonstrated using the following unsafe program:
>
>      1. r7 = -16
>      2. r6 = bpf_get_prandom_u32()
>      3. while (bpf_iter_num_next(&fp[-8])) {
>      4.   if (r6 != 42) {
>      5.     r7 = -32
>      6.     r6 = bpf_get_prandom_u32()
>      7.     continue
>      8.   }
>      9.   r0 = r10
>     10.   r0 += r7
>     11.   r8 = *(u64 *)(r0 + 0)
>     12.   r6 = bpf_get_prandom_u32()
>     13. }
>
> Here verifier would first visit path 1-3, create a checkpoint at 3
> with r7=-16, continue to 4-7,3 with r7=-32.
>
> Because instructions at 9-12 had not been visitied yet existing
> checkpoint at 3 does not have read or precision mark for r7.
> Thus states_equal() would return true and verifier would discard
> current state, thus unsafe memory access at 11 would not be caught.
>
> This commit fixes this loophole by introducing exact state comparisons
> for iterator convergence logic:
> - registers are compared using regs_exact() regardless of read or
>   precision marks;
> - stack slots have to have identical type.
>
> Unfortunately, this is too strict even for simple programs like below:
>
>     i = 0;
>     while(iter_next(&it))
>       i++;
>
> At each iteration step i++ would produce a new distinct state and
> eventually instruction processing limit would be reached.
>
> To avoid such behavior speculatively forget (widen) range for
> imprecise scalar registers, if those registers were not precise at the
> end of the previous iteration and do not match exactly.
>
> This a conservative heuristic that allows to verify wide range of
> programs, however it precludes verification of programs that conjure
> an imprecise value on the first loop iteration and use it as precise
> on the second.
>
> Test case iter_task_vma_for_each() presents one of such cases:
>
>         unsigned int seen = 0;
>         ...
>         bpf_for_each(task_vma, vma, task, 0) {
>                 if (seen >= 1000)
>                         break;
>                 ...
>                 seen++;
>         }
>
> Here clang generates the following code:
>
> <LBB0_4>:
>       24:       r8 = r6                          ; stash current value of
>                 ... body ...                       'seen'
>       29:       r1 = r10
>       30:       r1 += -0x8
>       31:       call bpf_iter_task_vma_next
>       32:       r6 += 0x1                        ; seen++;
>       33:       if r0 == 0x0 goto +0x2 <LBB0_6>  ; exit on next() == NULL
>       34:       r7 += 0x10
>       35:       if r8 < 0x3e7 goto -0xc <LBB0_4> ; loop on seen < 1000
>
> <LBB0_6>:
>       ... exit ...
>
> Note that counter in r6 is copied to r8 and then incremented,
> conditional jump is done using r8. Because of this precision mark for
> r6 lags one state behind of precision mark on r8 and widening logic
> kicks in.
>
> Adding barrier_var(seen) after conditional is sufficient to force
> clang use the same register for both counting and conditional jump.
>
> This issue was discussed in the thread [1] which was started by
> Andrew Werner <awerner32@xxxxxxxxx> demonstrating a similar bug
> in callback functions handling. The callbacks would be addressed
> in a followup patch.
>
> [1] https://lore.kernel.org/bpf/97a90da09404c65c8e810cf83c94ac703705dc0e.camel@xxxxxxxxx/
>
> Co-developed-by: Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx>
> Co-developed-by: Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx>
> Signed-off-by: Eduard Zingerman <eddyz87@xxxxxxxxx>
> ---
>  include/linux/bpf_verifier.h                  |   1 +
>  kernel/bpf/verifier.c                         | 212 +++++++++++++++---
>  .../selftests/bpf/progs/iters_task_vma.c      |   1 +
>  3 files changed, 184 insertions(+), 30 deletions(-)
>

[...]

> +static int widen_imprecise_scalars(struct bpf_verifier_env *env,
> +                                  struct bpf_verifier_state *old,
> +                                  struct bpf_verifier_state *cur)
> +{
> +       struct bpf_func_state *fold, *fcur;
> +       int i, fr;
> +
> +       reset_idmap_scratch(env);
> +       for (fr = old->curframe; fr >= 0; fr--) {
> +               fold = old->frame[fr];
> +               fcur = cur->frame[fr];
> +
> +               for (i = 0; i < MAX_BPF_REG; i++)
> +                       maybe_widen_reg(env,
> +                                       &fold->regs[i],
> +                                       &fcur->regs[i],
> +                                       &env->idmap_scratch);
> +
> +               for (i = 0; i < fold->allocated_stack / BPF_REG_SIZE; i++) {
> +                       if (is_spilled_scalar_reg(&fold->stack[i]))

should this be !is_spilled_scalar_reg()?

but also your should make sure that fcur->stack[i] is also a spilled
reg (it could be iter or dynptr, or just hard-coded ZERO or MISC stack
slot, so accessing fcur->stack[i].spilled_ptr below might be invalid)

> +                               continue;

> +
> +                       maybe_widen_reg(env,
> +                                       &fold->stack[i].spilled_ptr,
> +                                       &fcur->stack[i].spilled_ptr,
> +                                       &env->idmap_scratch);
> +               }
> +       }
> +       return 0;
> +}
> +

[...]