Re: [PATCH bpf-next] bpf: Relax precision marking in open coded iters and may_goto loop.

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Wed, 22 May 2024 10:33:37 -0700

On Tue, May 21, 2024 at 7:47 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> From: Alexei Starovoitov <ast@xxxxxxxxxx>
>
> Motivation for the patch
> ------------------------
> Open coded iterators and may_goto is a great mechanism to implement loops,
> but counted loops are problematic. For example:
>   for (i = 0; i < 100 && can_loop; i++)
> is verified as a bounded loop, since i < 100 condition forces the verifier
> to mark 'i' as precise and loop states at different iterations are not equivalent.
> That removes the benefit of open coded iterators and may_goto.
> The workaround is to do:
>   int zero = 0; /* global or volatile variable */
>   for (i = zero; i < 100 && can_loop; i++)
> to hide from the verifier the value of 'i'.
> It's unnatural and so far users didn't learn such odd programming pattern.
>
> This patch aims to improve the verifier to support
>   for (i = 0; i < 100000 && can_loop; i++)
> as open coded iter loop (when 'i' doesn't need to be precise).
>
> Algorithm
> ---------
> First of all:
>    if (is_may_goto_insn_at(env, insn_idx)) {
> +          update_loop_entry(cur, &sl->state);
>            if (states_equal(env, &sl->state, cur, RANGE_WITHIN)) {
> -                  update_loop_entry(cur, &sl->state);
>
> This should be correct, since reaching the same insn should
> satisfy "if h1 in path" requirement of update_loop_entry() algorithm.
> It's too conservative to update loop_entry only on a state match.
>
> With that the get_loop_entry() can be used to gate is_branch_taken() logic.
> When 'if (i < 1000)' is done within open coded iterator or in a loop with may_goto
> don't invoke is_branch_taken() logic.
> When it's skipped don't do reg_bounds_sanity_check(), since it will surely
> see range violations.
>
> Now, consider progs/iters_task_vma.c that has the following logic:
>     bpf_for_each(...) {
>        if (i > 1000)

I'm wondering, maybe we should change rules around handling inequality
(>, >=, <, <=) comparisons for register(s) that have a constant value
(or maybe actually any value).

My reasoning is the following. When we have something like this `if (i
> 1000)` condition, that means that for fallthrough branch whether i
is 0, or 1, or 2, or whatever doesn't really matter, because the code
presumably works for any value in [0, 999] range, right? So maybe in
addition to marking it precise and keeping i's range estimate the
same, we should extend this range according to inequality condition?

That is, even if we know on first iteration that i is 0 (!precise),
when we simulate this conditional jump instruction, adjust i's range
to be [0, 999] (precise) in the fallthrough branch, and [1000,
U64_MAX] in the true branch?

I.e., make conditional jumps into "range widening" instructions?

Have you thought about this approach? Do you think it will work in
practice? I'm sure it can't be as simple, but still, worth
considering. Curious also to hear Eduard's opinion as well, he's dealt
with this a lot in the past.

>           break;
>
>        arr[i] = ..;
>     }
>
> Skipping precision mark at if (i > 1000) keeps 'i' imprecise,
> but arr[i] will mark 'i' as precise anyway, because 'arr' is a map.
> On the next iteration of the loop the patch does copy_precision()
> that copies precision markings for top of the loop into next state
> of the loop. So on the next iteration 'i' will be seen as precise.
>
> Hence the key part of the patch:
> -       pred = is_branch_taken(dst_reg, src_reg, opcode, is_jmp32);
> +       if (!get_loop_entry(this_branch) || src_reg->precise || dst_reg->precise ||
> +           (BPF_SRC(insn->code) == BPF_K && insn->imm == 0))
> +               pred = is_branch_taken(dst_reg, src_reg, opcode, is_jmp32);
>
> !get_loop_entry(this_branch) -> if not inside open coded iter keep
>   existing is_branch_taken() logic, since bounded loop relies on it.
>
> src_reg->precise || dst_reg->precise -> if later inside the loop the 'i' was
>   actually marked as precise then we have to do is_branch_taken() and above
>   bpf_for_each() will be verified as a bounded loop checking all 1000
>   iterations. Otherwise we will keep incrementing 'i' and it will eventually
>   get out of bounds in arr[i] and the verifier will reject such memory access.
>
> BPF_SRC(insn->code) == BPF_K && insn->imm == 0 -> if it's a check for
>   an exit condition from open coded iterator then do is_branch_taken() as well.
>   Otherwise all open coded iterators won't work.
>
> Now consider the same example:
>     bpf_for_each(...) {
>        if (i > 1000)
>           break;
>
>        arr[i] = ..;
>     }
> but 'arr' is an arena pointer. In this case 'i > 1000' will keep 'i' as
> imprecise and arr[i] will keep it as imprecise as well.
> And the whole loop will be verified with open coded iterator logic.
>
> Now the following works:
> -       for (i = zero; i < 1000; i++)
> +       for (i = 0; i < 100000 && can_loop; i++) {
>                 htab_update_elem(htab, i, i);
> +               arr[i] = i; // either arr1 or arr2
> +       }
> +char __arena arr1[100000]; /* works */
> +char arr2[100000]; /* runs into 1M limit */
>
> So the users can now use 'for (i = 0;...' pattern everywhere and
> the verifier will fall back to bounded loop logic and precise 'i'
> when 'i' is used in map-style memory access.
> For arena based algorithms 'i' will stay imprecise.
>
> -       for (i = zero; i < ARR_SZ && can_loop; i++)
> +       /* i = 0 is ok here, since i is not used in memory access */
> +       for (i = 0; i < ARR_SZ && can_loop; i++)
>                 sum += i;
> +
> +       /* have to use i = zero due to arr[i] where arr is not an arena */
>         for (i = zero; i < ARR_SZ; i++) {
>                 barrier_var(i);
>                 sum += i + arr[i];
>
> and i = zero workaround in iter_obfuscate_counter() can be removed.
>
> copy_precision() is a hack, of course, to demonstrate an idea.
>
> Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
> ---

There is a lot to think about here, I'll try to get to this
today/tomorrow. But for now veristat is concerned about this change
([0]):

|File                              |Program
|Verdict                |States Diff (%)|
|----------------------------------|---------------------------------|-----------------------|---------------|
|arena_htab_asm.bpf.o              |arena_htab_asm
|success                |-80.91 %       |
|core_kern.bpf.o                   |balancer_ingress
|success -> failure (!!)|+0.00 %        |
|dynptr_success.bpf.o              |test_read_write
|success -> failure (!!)|+0.00 %        |
|iters.bpf.o                       |checkpoint_states_deletion
|success -> failure (!!)|+0.00 %        |
|iters.bpf.o                       |iter_multiple_sequential_loops
|success                |-11.43 %       |
|iters.bpf.o                       |iter_obfuscate_counter
|success                |+30.00 %       |
|iters.bpf.o                       |iter_pragma_unroll_loop
|success                |-23.08 %       |
|iters.bpf.o                       |iter_subprog_iters
|success                |+1.14 %        |
|iters.bpf.o                       |loop_state_deps1
|failure                |+7.14 %        |
|iters.bpf.o                       |loop_state_deps2
|failure                |-2.17 %        |
|iters_task_vma.bpf.o              |iter_task_vma_for_each
|success -> failure (!!)|+99.20 %       |
|linked_list.bpf.o                 |global_list_push_pop_multiple
|success -> failure (!!)|+0.00 %        |
|linked_list.bpf.o                 |inner_map_list_push_pop_multiple
|success -> failure (!!)|+0.00 %        |
|linked_list.bpf.o                 |map_list_push_pop_multiple
|success -> failure (!!)|+0.00 %        |
|test_seg6_loop.bpf.o              |__add_egr_x
|success -> failure (!!)|+0.00 %        |
|test_sysctl_loop1.bpf.o           |sysctl_tcp_mem
|success -> failure (!!)|+0.00 %        |
|test_sysctl_loop2.bpf.o           |sysctl_tcp_mem
|success -> failure (!!)|+0.00 %        |
|test_verif_scale2.bpf.o           |balancer_ingress
|success -> failure (!!)|+0.00 %        |
|verifier_bounds.bpf.o             |bound_greater_than_u32_max
|success -> failure (!!)|+0.00 %        |
|verifier_bounds.bpf.o
|crossing_32_bit_signed_boundary_2|success -> failure (!!)|+0.00 %
   |
|verifier_bounds.bpf.o
|crossing_64_bit_signed_boundary_2|success -> failure (!!)|+0.00 %
   |
|verifier_iterating_callbacks.bpf.o|cond_break2
|success                |+75.00 %       |
|verifier_iterating_callbacks.bpf.o|cond_break3
|success                |+66.67 %       |
|verifier_iterating_callbacks.bpf.o|cond_break4
|success                |+300.00 %      |
|verifier_iterating_callbacks.bpf.o|cond_break5
|success                |+266.67 %      |

  [0] https://github.com/kernel-patches/bpf/actions/runs/9184700207/job/25257587541

>  kernel/bpf/verifier.c                         | 94 +++++++++++++++++--
>  .../testing/selftests/bpf/progs/arena_htab.c  | 11 ++-
>  tools/testing/selftests/bpf/progs/iters.c     | 18 +---
>  .../bpf/progs/verifier_iterating_callbacks.c  | 17 ++--
>  4 files changed, 112 insertions(+), 28 deletions(-)
>

[...]