Re: [PATCH bpf-next] bpf: Relax precision marking in open coded iters and may_goto loop.

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Wed, 22 May 2024 14:09:37 -0700

On Wed, May 22, 2024 at 10:33 AM Andrii Nakryiko
<andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Tue, May 21, 2024 at 7:47 PM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > From: Alexei Starovoitov <ast@xxxxxxxxxx>
> >
> > Motivation for the patch
> > ------------------------
> > Open coded iterators and may_goto are a great mechanism to implement loops,
> > but counted loops are problematic. For example:
> >   for (i = 0; i < 100 && can_loop; i++)
> > is verified as a bounded loop, since i < 100 condition forces the verifier
> > to mark 'i' as precise and loop states at different iterations are not equivalent.
> > That removes the benefit of open coded iterators and may_goto.
> > The workaround is to do:
> >   int zero = 0; /* global or volatile variable */
> >   for (i = zero; i < 100 && can_loop; i++)
> > to hide from the verifier the value of 'i'.
> > It's unnatural, and so far users haven't picked up such an odd programming pattern.
> >
> > This patch aims to improve the verifier to support
> >   for (i = 0; i < 100000 && can_loop; i++)
> > as open coded iter loop (when 'i' doesn't need to be precise).
> >
> > Algorithm
> > ---------
> > First of all:
> >    if (is_may_goto_insn_at(env, insn_idx)) {
> > +          update_loop_entry(cur, &sl->state);
> >            if (states_equal(env, &sl->state, cur, RANGE_WITHIN)) {
> > -                  update_loop_entry(cur, &sl->state);
> >
> > This should be correct, since reaching the same insn should
> > satisfy "if h1 in path" requirement of update_loop_entry() algorithm.
> > It's too conservative to update loop_entry only on a state match.
> >
> > With that the get_loop_entry() can be used to gate is_branch_taken() logic.
> > When 'if (i < 1000)' is done within open coded iterator or in a loop with may_goto
> > don't invoke is_branch_taken() logic.
> > When it's skipped don't do reg_bounds_sanity_check(), since it will surely
> > see range violations.
> >
> > Now, consider progs/iters_task_vma.c that has the following logic:
> >     bpf_for_each(...) {
> >        if (i > 1000)
>
> I'm wondering, maybe we should change rules around handling inequality
> (>, >=, <, <=) comparisons for register(s) that have a constant value
> (or maybe actually any value).
>
> My reasoning is the following. When we have something like this
> `if (i > 1000)` condition, that means that for fallthrough branch whether i
> is 0, or 1, or 2, or whatever doesn't really matter, because the code
> presumably works for any value in [0, 999] range, right? So maybe in
> addition to marking it precise and keeping i's range estimate the
> same, we should extend this range according to inequality condition?
>
> That is, even if we know on first iteration that i is 0 (!precise),
> when we simulate this conditional jump instruction, adjust i's range
> to be [0, 999] (precise) in the fallthrough branch, and [1000,
> U64_MAX] in the true branch?
>
> I.e., make conditional jumps into "range widening" instructions?
>
> Have you thought about this approach? Do you think it will work in
> practice? I'm sure it can't be as simple, but still, worth
> considering. Curious also to hear Eduard's opinion as well, he's dealt
> with this a lot in the past.

I looked into doing exactly that: [0, 999] and [1000, max].
Then on the next iteration the i += 1 insn adjusts it to
[1, 1000], but the next i < 1000 widens it back to
[0, 999] and the state equivalence check is happy.
But my excitement was short lived, since both gcc and llvm
optimize the loop exit condition to !=
and they do it in the middle end.
Backends cannot influence this optimization.
I don't think it's practical to undo it in the backend.
So most of the loops written as:
for (i = 0; i < 1000; i++)
are compiled as
for (i = 0; i != 1000; i++)
for x86, arm, bpf, etc.
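
To make the widening idea concrete, here is a toy model in plain C.
It is not verifier code: struct range, widen_lt(), range_add(),
range_within() and loop_head_converges() are names made up for this
sketch, and overflow is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the unsigned bounds the verifier tracks for a scalar. */
struct range {
	uint64_t min;
	uint64_t max;
};

/* Hypothetical widening rule for the fallthrough of 'i < limit':
 * forget what was known about i and keep only what the comparison
 * guarantees. The branch must be feasible, hence the assert.
 */
static struct range widen_lt(struct range r, uint64_t limit)
{
	assert(r.min < limit);
	return (struct range){ .min = 0, .max = limit - 1 };
}

/* Effect of 'i += k' on the bounds (overflow ignored in this toy model). */
static struct range range_add(struct range r, uint64_t k)
{
	return (struct range){ .min = r.min + k, .max = r.max + k };
}

/* True if every value allowed by 'cur' is also allowed by 'old'. */
static bool range_within(struct range old, struct range cur)
{
	return cur.min >= old.min && cur.max <= old.max;
}

/* One trip around 'for (i = 0; i < limit; i++)': returns true when the
 * state after the second compare fits inside the state after the first
 * one, so the loop head could be pruned. Needs limit >= 2 so that the
 * second compare can still fall through.
 */
static bool loop_head_converges(uint64_t limit)
{
	struct range start = { .min = 0, .max = 0 };
	struct range first, second;

	assert(limit >= 2);
	first = widen_lt(start, limit);
	second = widen_lt(range_add(first, 1), limit);
	return range_within(first, second);
}
```

With limit = 1000 the first compare yields [0, 999], the increment
yields [1, 1000], and the second compare yields [0, 999] again, so the
states match after a single trip. Without widening, [0, 0] becomes
[1, 1] and never fits inside the previous state.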

so if there is arr[i] inside the loop the verifier
has to rely on bounded loop logic and check i=0, 1, 2, ... 999
one by one, since nothing else inside the loop
makes the array index bounded.

Another small obstacle is that we don't have [!=const] range,
so i != 100 cannot be widened into [100] and [!=100].
We can add that without too much trouble.
But it won't help this arr[i] case anyway.
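
A toy interval model shows why != doesn't help arr[i]. This is plain C
for illustration only; struct range, refine_lt(), refine_ne() and
index_in_bounds() are hypothetical names, not verifier functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the unsigned bounds of the loop counter. */
struct range {
	uint64_t min;
	uint64_t max;
};

/* Fallthrough of 'i < limit': the upper bound gets clamped. */
static struct range refine_lt(struct range r, uint64_t limit)
{
	assert(r.min < limit);	/* the branch must be feasible */
	if (r.max > limit - 1)
		r.max = limit - 1;
	return r;
}

/* Fallthrough of 'i != val': an interval can only drop val when it
 * sits exactly on one of the two endpoints.
 */
static struct range refine_ne(struct range r, uint64_t val)
{
	assert(!(r.min == val && r.max == val));	/* the branch must be feasible */
	if (r.min == val)
		r.min++;
	else if (r.max == val)
		r.max--;
	return r;
}

/* Would arr[i] be provably in bounds for an array of nelems elements? */
static bool index_in_bounds(struct range r, uint64_t nelems)
{
	return r.max < nelems;
}
```

Starting from an unknown i in [0, U64_MAX], refine_lt(i, 1000) gives
[0, 999] and arr[i] is fine for a 1000 element array. refine_ne(i, 1000)
leaves the range untouched, so the access can only be proven when i is
a single known value, which is the one-by-one bounded loop walk.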

We could make i != 100 produce [unknown] and [unknown].
It's bad for arr[i] too, but fine when arr is an arena pointer.
Unfortunately at the time of the 'if' we don't know what comes later.
If the verifier knew that it's only dealing with arena pointers
it could disable precision altogether.

So I went with conditionally disabling is_branch_taken() + mark_precise,
and surprisingly it didn't break any tests.
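
As a standalone sketch of that gating condition (the one quoted further
down in the patch description), with heavily simplified stand-ins for
the kernel types: struct reg_model, struct state_model,
struct insn_model and should_predict_branch() are made up here, and
get_loop_entry() is reduced to a plain pointer field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same values as the uapi definitions, repeated to keep this standalone. */
#define BPF_K		0x00
#define BPF_X		0x08
#define BPF_SRC(code)	((code) & 0x08)

/* Simplified stand-ins; the real structs carry far more state. */
struct reg_model {
	bool precise;
};

struct state_model {
	/* Non-NULL when the state is inside an open coded iter or may_goto loop. */
	const struct state_model *loop_entry;
};

struct insn_model {
	uint8_t code;
	int32_t imm;
};

/* Returns true when the is_branch_taken() prediction should still run. */
static bool should_predict_branch(const struct state_model *this_branch,
				  const struct reg_model *dst_reg,
				  const struct reg_model *src_reg,
				  const struct insn_model *insn)
{
	assert(this_branch && dst_reg && src_reg && insn);

	/* Outside such loops keep the existing logic; bounded loops rely on it. */
	if (!this_branch->loop_entry)
		return true;
	/* A register already marked precise must be tracked exactly. */
	if (src_reg->precise || dst_reg->precise)
		return true;
	/* A compare against constant 0 is treated as the iterator exit check. */
	if (BPF_SRC(insn->code) == BPF_K && insn->imm == 0)
		return true;
	return false;
}
```

Only the last case (inside the loop, both registers imprecise, compare
against something other than constant 0) skips the prediction and with
it the precision marking.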

>
> >           break;
> >
> >        arr[i] = ..;
> >     }
> >
> > Skipping precision mark at if (i > 1000) keeps 'i' imprecise,
> > but arr[i] will mark 'i' as precise anyway, because 'arr' is a map.
> > On the next iteration of the loop the patch does copy_precision()
> > that copies precision markings for top of the loop into next state
> > of the loop. So on the next iteration 'i' will be seen as precise.
> >
> > Hence the key part of the patch:
> > -       pred = is_branch_taken(dst_reg, src_reg, opcode, is_jmp32);
> > +       if (!get_loop_entry(this_branch) || src_reg->precise || dst_reg->precise ||
> > +           (BPF_SRC(insn->code) == BPF_K && insn->imm == 0))
> > +               pred = is_branch_taken(dst_reg, src_reg, opcode, is_jmp32);
> >
> > !get_loop_entry(this_branch) -> if not inside open coded iter keep
> >   existing is_branch_taken() logic, since bounded loop relies on it.
> >
> > src_reg->precise || dst_reg->precise -> if later inside the loop the 'i' was
> >   actually marked as precise then we have to do is_branch_taken() and above
> >   bpf_for_each() will be verified as a bounded loop checking all 1000
> >   iterations. Otherwise we will keep incrementing 'i' and it will eventually
> >   get out of bounds in arr[i] and the verifier will reject such memory access.
> >
> > BPF_SRC(insn->code) == BPF_K && insn->imm == 0 -> if it's a check for
> >   an exit condition from open coded iterator then do is_branch_taken() as well.
> >   Otherwise all open coded iterators won't work.
> >
> > Now consider the same example:
> >     bpf_for_each(...) {
> >        if (i > 1000)
> >           break;
> >
> >        arr[i] = ..;
> >     }
> > but 'arr' is an arena pointer. In this case 'i > 1000' will keep 'i' as
> > imprecise and arr[i] will keep it as imprecise as well.
> > And the whole loop will be verified with open coded iterator logic.
> >
> > Now the following works:
> > -       for (i = zero; i < 1000; i++)
> > +       for (i = 0; i < 100000 && can_loop; i++) {
> >                 htab_update_elem(htab, i, i);
> > +               arr[i] = i; // either arr1 or arr2
> > +       }
> > +char __arena arr1[100000]; /* works */
> > +char arr2[100000]; /* runs into 1M limit */
> >
> > So the users can now use 'for (i = 0;...' pattern everywhere and
> > the verifier will fall back to bounded loop logic and precise 'i'
> > when 'i' is used in map-style memory access.
> > For arena based algorithms 'i' will stay imprecise.
> >
> > -       for (i = zero; i < ARR_SZ && can_loop; i++)
> > +       /* i = 0 is ok here, since i is not used in memory access */
> > +       for (i = 0; i < ARR_SZ && can_loop; i++)
> >                 sum += i;
> > +
> > +       /* have to use i = zero due to arr[i] where arr is not an arena */
> >         for (i = zero; i < ARR_SZ; i++) {
> >                 barrier_var(i);
> >                 sum += i + arr[i];
> >
> > and i = zero workaround in iter_obfuscate_counter() can be removed.
> >

...

> > copy_precision() is a hack, of course, to demonstrate an idea.

btw I think I know of a better way of doing copy_precision().
So don't pay much attention to it.

> >
> > Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
> > ---
>
> There is a lot to think about here, I'll try to get to this
> today/tomorrow. But for now veristat is concerned about this change
> ([0]):
>
> |File                              |Program                          |Verdict                |States Diff (%)|
> |----------------------------------|---------------------------------|-----------------------|---------------|
> |arena_htab_asm.bpf.o              |arena_htab_asm                   |success                |-80.91 %       |
> |core_kern.bpf.o                   |balancer_ingress                 |success -> failure (!!)|+0.00 %        |
> |dynptr_success.bpf.o              |test_read_write                  |success -> failure (!!)|+0.00 %        |
> |iters.bpf.o                       |checkpoint_states_deletion       |success -> failure (!!)|+0.00 %        |
> |iters.bpf.o                       |iter_multiple_sequential_loops   |success                |-11.43 %       |
> |iters.bpf.o                       |iter_obfuscate_counter           |success                |+30.00 %       |
> |iters.bpf.o                       |iter_pragma_unroll_loop          |success                |-23.08 %       |
> |iters.bpf.o                       |iter_subprog_iters               |success                |+1.14 %        |
> |iters.bpf.o                       |loop_state_deps1                 |failure                |+7.14 %        |
> |iters.bpf.o                       |loop_state_deps2                 |failure                |-2.17 %        |
> |iters_task_vma.bpf.o              |iter_task_vma_for_each           |success -> failure (!!)|+99.20 %       |
> |linked_list.bpf.o                 |global_list_push_pop_multiple    |success -> failure (!!)|+0.00 %        |
> |linked_list.bpf.o                 |inner_map_list_push_pop_multiple |success -> failure (!!)|+0.00 %        |
> |linked_list.bpf.o                 |map_list_push_pop_multiple       |success -> failure (!!)|+0.00 %        |
> |test_seg6_loop.bpf.o              |__add_egr_x                      |success -> failure (!!)|+0.00 %        |
> |test_sysctl_loop1.bpf.o           |sysctl_tcp_mem                   |success -> failure (!!)|+0.00 %        |
> |test_sysctl_loop2.bpf.o           |sysctl_tcp_mem                   |success -> failure (!!)|+0.00 %        |
> |test_verif_scale2.bpf.o           |balancer_ingress                 |success -> failure (!!)|+0.00 %        |
> |verifier_bounds.bpf.o             |bound_greater_than_u32_max       |success -> failure (!!)|+0.00 %        |
> |verifier_bounds.bpf.o

That was due to veristat being picky ;)
The extra verbose() calls in the verifier, not gated by log_level,
produced more output than fits in veristat's 64k log buffer, and the
resulting ENOSPC was reported as a failure.
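
The failure mode is easy to reproduce in miniature. Below is a toy
fixed-size log in plain C; struct log_model, log_append() and
log_finalize() are hypothetical names for this sketch and are not the
kernel or veristat API.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Toy fixed-size log; 'used' always stays below 'size'. */
struct log_model {
	char *buf;
	size_t size;
	size_t used;
	int truncated;
};

/* Append a formatted message, remembering if it did not fit. */
static void log_append(struct log_model *log, const char *fmt, ...)
{
	va_list ap;
	size_t room;
	int n;

	assert(log && log->buf && log->size > 0 && log->used < log->size);
	room = log->size - log->used;	/* includes space for the trailing NUL */

	va_start(ap, fmt);
	n = vsnprintf(log->buf + log->used, room, fmt, ap);
	va_end(ap);
	if (n < 0)
		return;

	if ((size_t)n >= room) {
		/* Output was cut short: the buffer is full from now on. */
		log->used = log->size - 1;
		log->truncated = 1;
	} else {
		log->used += (size_t)n;
	}
}

/* A truncated log turns an otherwise fine run into -ENOSPC. */
static int log_finalize(const struct log_model *log)
{
	return log->truncated ? -ENOSPC : 0;
}
```

A single ungated message is harmless; the same message emitted for
every visited state eventually overflows the buffer and the caller
only sees the error code.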

> |crossing_32_bit_signed_boundary_2|success -> failure (!!)|+0.00 %        |
> |verifier_bounds.bpf.o             |crossing_64_bit_signed_boundary_2|success -> failure (!!)|+0.00 %        |
> |verifier_iterating_callbacks.bpf.o|cond_break2                      |success                |+75.00 %       |
> |verifier_iterating_callbacks.bpf.o|cond_break3                      |success                |+66.67 %       |
> |verifier_iterating_callbacks.bpf.o|cond_break4                      |success                |+300.00 %      |
> |verifier_iterating_callbacks.bpf.o|cond_break5                      |success                |+266.67 %      |

This is expected, since the tests changed.
In this case 300% regression is from 1 state to 3 states,
and from 10 to 21 states, ...
We should probably print absolute state values in veristat CI.
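
Something along these lines would do, as a rough sketch only. The
helper names and the output format are invented here and are not what
veristat prints today; the 4 -> 7 numbers in the note below are made up
as well.

```c
#include <assert.h>
#include <stdio.h>

/* Relative change in the number of verifier states, in percent. */
static double states_diff_pct(unsigned int old_states, unsigned int new_states)
{
	assert(old_states > 0);
	return ((double)new_states - (double)old_states) * 100.0 /
	       (double)old_states;
}

/* Format "old -> new (+pct %)" so that tiny absolute changes stand out.
 * Returns what snprintf() returns.
 */
static int format_states_diff(char *out, size_t out_sz,
			      unsigned int old_states, unsigned int new_states)
{
	assert(out && out_sz > 0);
	return snprintf(out, out_sz, "%u -> %u (%+.2f %%)",
			old_states, new_states,
			states_diff_pct(old_states, new_states));
}
```

For a hypothetical program going from 4 to 7 states this gives
"4 -> 7 (+75.00 %)", which reads very differently from a bare +75 %.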