Re: [PATCH bpf-next v1 1/2] bpf: force checkpoint when jmp history is too long

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Mon, 21 Oct 2024 20:19:32 -0700

On Mon, Oct 21, 2024 at 7:03 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Mon, Oct 21, 2024 at 1:23 PM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > On Thu, Oct 17, 2024 at 7:03 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
> > >
> > > A specifically crafted program might trick verifier into growing very
> > > long jump history within a single bpf_verifier_state instance.
> > > Very long jump history makes mark_chain_precision() unreasonably slow,
> > > especially in case if verifier processes a loop.
> > >
> > > Mitigate this by forcing new state in is_state_visited() in case if
> > > current state's jump history is too long.
> > >
> > > Use same constant as in `skip_inf_loop_check`, but multiply it by
> > > arbitrarily chosen value 2 to account for jump history containing not
> > > only information about jumps, but also information about stack access.
> > >
> > > For an example of problematic program consider the code below,
> > > w/o this patch the example is processed by verifier for ~15 minutes,
> > > before failing to allocate big-enough chunk for jmp_history.
> > >
> > >     0: r7 = *(u16 *)(r1 +0);"
> > >     1: r7 += 0x1ab064b9;"
> > >     2: if r7 & 0x702000 goto 1b;
> > >     3: r7 &= 0x1ee60e;"
> > >     4: r7 += r1;"
> > >     5: if r7 s> 0x37d2 goto +0;"
> > >     6: r0 = 0;"
> > >     7: exit;"
> > >
> > > Perf profiling shows that most of the time is spent in
> > > mark_chain_precision() ~95%.
> > >
> > > The easiest way to explain why this program causes problems is to
> > > apply the following patch:
> > >
> > >     diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > >     index 0c216e71cec7..4b4823961abe 100644
> > >     \--- a/include/linux/bpf.h
> > >     \+++ b/include/linux/bpf.h
> > >     \@@ -1926,7 +1926,7 @@ struct bpf_array {
> > >             };
> > >      };
> > >
> > >     -#define BPF_COMPLEXITY_LIMIT_INSNS      1000000 /* yes. 1M insns */
> > >     +#define BPF_COMPLEXITY_LIMIT_INSNS      256 /* yes. 1M insns */
> > >      #define MAX_TAIL_CALL_CNT 33
> > >
> > >      /* Maximum number of loops for bpf_loop and bpf_iter_num.
> > >     diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > >     index f514247ba8ba..75e88be3bb3e 100644
> > >     \--- a/kernel/bpf/verifier.c
> > >     \+++ b/kernel/bpf/verifier.c
> > >     \@@ -18024,8 +18024,13 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
> > >      skip_inf_loop_check:
> > >                             if (!force_new_state &&
> > >                                 env->jmps_processed - env->prev_jmps_processed < 20 &&
> > >     -                           env->insn_processed - env->prev_insn_processed < 100)
> > >     +                           env->insn_processed - env->prev_insn_processed < 100) {
> > >     +                               verbose(env, "is_state_visited: suppressing checkpoint at %d, %d jmps processed, cur->jmp_history_cnt is %d\n",
> > >     +                                       env->insn_idx,
> > >     +                                       env->jmps_processed - env->prev_jmps_processed,
> > >     +                                       cur->jmp_history_cnt);
> > >                                     add_new_state = false;
> > >     +                       }
> > >                             goto miss;
> > >                     }
> > >                     /* If sl->state is a part of a loop and this loop's entry is a part of
> > >     \@@ -18142,6 +18147,9 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
> > >             if (!add_new_state)
> > >                     return 0;
> > >
> > >     +       verbose(env, "is_state_visited: new checkpoint at %d, resetting env->jmps_processed\n",
> > >     +               env->insn_idx);
> > >     +
> > >             /* There were no equivalent states, remember the current one.
> > >              * Technically the current state is not proven to be safe yet,
> > >              * but it will either reach outer most bpf_exit (which means it's safe)
> > >
> > > And observe verification log:
> > >
> > >     ...
> > >     is_state_visited: new checkpoint at 5, resetting env->jmps_processed
> > >     5: R1=ctx() R7=ctx(...)
> > >     5: (65) if r7 s> 0x37d2 goto pc+0     ; R7=ctx(...)
> > >     6: (b7) r0 = 0                        ; R0_w=0
> > >     7: (95) exit
> > >
> > >     from 5 to 6: R1=ctx() R7=ctx(...) R10=fp0
> > >     6: R1=ctx() R7=ctx(...) R10=fp0
> > >     6: (b7) r0 = 0                        ; R0_w=0
> > >     7: (95) exit
> > >     is_state_visited: suppressing checkpoint at 1, 3 jmps processed, cur->jmp_history_cnt is 74
> > >
> > >     from 2 to 1: R1=ctx() R7_w=scalar(...) R10=fp0
> > >     1: R1=ctx() R7_w=scalar(...) R10=fp0
> > >     1: (07) r7 += 447767737
> > >     is_state_visited: suppressing checkpoint at 2, 3 jmps processed, cur->jmp_history_cnt is 75
> > >     2: R7_w=scalar(...)
> > >     2: (45) if r7 & 0x702000 goto pc-2
> > >     ... mark_precise 152 steps for r7 ...
> > >     2: R7_w=scalar(...)
> > >     is_state_visited: suppressing checkpoint at 1, 4 jmps processed, cur->jmp_history_cnt is 75
> > >     1: (07) r7 += 447767737
> > >     is_state_visited: suppressing checkpoint at 2, 4 jmps processed, cur->jmp_history_cnt is 76
> > >     2: R7_w=scalar(...)
> > >     2: (45) if r7 & 0x702000 goto pc-2
> > >     ...
> > >     BPF program is too large. Processed 257 insn
> > >
> > > The log output shows that checkpoint at label (1) is never created,
> > > because it is suppressed by `skip_inf_loop_check` logic:
> > > a. When 'if' at (2) is processed it pushes a state with insn_idx (1)
> > >    onto stack and proceeds to (3);
> > > b. At (5) checkpoint is created, and this resets
> > >    env->{jmps,insns}_processed.
> > > c. Verification proceeds and reaches `exit`;
> > > d. State saved at step (a) is popped from stack and is_state_visited()
> > >    considers if checkpoint needs to be added, but because
> > >    env->{jmps,insns}_processed had been just reset at step (b)
> > >    the `skip_inf_loop_check` logic forces `add_new_state` to false.
> > > e. Verifier proceeds with current state, which slowly accumulates
> > >    more and more entries in the jump history.
> > >
> > > The accumulation of entries in the jump history is a problem because
> > > of two factors:
> > > - it eventually exhausts memory available for kmalloc() allocation;
> > > - mark_chain_precision() traverses the jump history of a state,
> > >   meaning that if `r7` is marked precise, verifier would iterate
> > >   ever growing jump history until parent state boundary is reached.
> > >
> > > (note: the log also shows a REG INVARIANTS VIOLATION warning
> > >        upon jset processing, but that's another bug to fix).
> > >
> > > With this patch applied, the example above is rejected by verifier
> > > under 1s of time, reaching 1M instructions limit.
> > >
> > > The program is a simplified reproducer from syzbot report [1].
> > > Previous discussion could be found at [2].
> > > The patch does not cause any changes in verification performance,
> > > when tested on selftests from veristat.cfg and cilium programs taken
> > > from [3].
> > >
> > > [1] https://lore.kernel.org/bpf/670429f6.050a0220.49194.0517.GAE@xxxxxxxxxx/
> > > [2] https://lore.kernel.org/bpf/20241009021254.2805446-1-eddyz87@xxxxxxxxx/
> > > [3] https://github.com/anakryiko/cilium
> > >
> > > Signed-off-by: Eduard Zingerman <eddyz87@xxxxxxxxx>
> > > ---
> > >  kernel/bpf/verifier.c | 14 ++++++++++++--
> > >  1 file changed, 12 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index f514247ba8ba..f64c831a9278 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -17873,13 +17873,23 @@ static bool iter_active_depths_differ(struct bpf_verifier_state *old, struct bpf
> > >         return false;
> > >  }
> > >
> > > +#define MAX_JMPS_PER_STATE 20
> > > +
> > >  static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
> > >  {
> > >         struct bpf_verifier_state_list *new_sl;
> > >         struct bpf_verifier_state_list *sl, **pprev;
> > >         struct bpf_verifier_state *cur = env->cur_state, *new, *loop_entry;
> > >         int i, j, n, err, states_cnt = 0;
> > > -       bool force_new_state = env->test_state_freq || is_force_checkpoint(env, insn_idx);
> > > +       bool force_new_state = env->test_state_freq || is_force_checkpoint(env, insn_idx) ||
> > > +                              /* - Long jmp history hinders mark_chain_precision performance,
> > > +                               *   so force new state if jmp history of current state exceeds
> > > +                               *   a threshold.
> > > +                               * - Jmp history records not only jumps, but also stack access,
> > > +                               *   so keep this constant 2x times the limit imposed on
> > > +                               *   env->jmps_processed for loop cases (see skip_inf_loop_check).
> > > +                               */
> > > +                              cur->jmp_history_cnt > MAX_JMPS_PER_STATE * 2;
> >
> > this feels like a wrong place to add this heuristic. Just few lines
> > below there is:
> >
> >
> > if (env->jmps_processed - env->prev_jmps_processed >= 2 &&
> >     env->insn_processed - env->prev_insn_processed >= 8)
> >         add_new_state = true;
> >
> > Please add jmp_history_cnt check here, as it conceptually fits with
> > jmps_processed and insn_processed check. It also has a huge comment
> > with justification already, so might as well just extend that for
> > jmp_history_cnt.
>
> I think adding if (cur->jmp_history_cnt > 20) add_new_state = true;
> won't help. It will get back to false.
> But I agree that tweaking force_new_state also is not quite clean.
>
> btw the "huge comment" may need to be revised :)
> bpf progs today probably look different than they were in 2019.
>
> >
> > pw-bot: cr
> >
> > >         bool add_new_state = force_new_state;
> > >         bool force_exact;
> > >
> > > @@ -18023,7 +18033,7 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
> > >                          */
> > >  skip_inf_loop_check:
> > >                         if (!force_new_state &&
> > > -                           env->jmps_processed - env->prev_jmps_processed < 20 &&
> > > +                           env->jmps_processed - env->prev_jmps_processed < MAX_JMPS_PER_STATE &&
> > >                             env->insn_processed - env->prev_insn_processed < 100)
> > >                                 add_new_state = false;
> >
> > and then this one is logically matching add_new_state = true; case
> > above that I mentioned.
> >
> >
> > With these changes, I'd drop * 2 factor for one of the checks. If
> > necessary, just bump it to 30 or so, if you are afraid of stack
> > accesses. But let's keep it simple with one threshold, if possible?
>
> +1
>
> 2/8 heuristic is working together with 20/100.
>
> Instead of tweaking force_new_state earlier, it's better to do:
> if (!force_new_state && cur->jmp_history_cnt < N &&
>     env->jmps_processed - env->prev_jmps_processed < 20 && ..)
>     add_new_state = false;

Yep, I actually had *exactly* this in mind, ack.

Basically all these heuristics are expressing the idea of doing just
the right amount of work between checkpoints, not too little, not too
much. Jump history size (accumulated in the current state) will be a
third  measure of "enough useful work", basically.

>
> You're essentially proposing N == 40.
> Just add your existing comment next to the check.
> # define MAX_JMPS_PER_STATE is imo overkill.