On Mon, Oct 21, 2024 at 1:23 PM Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Thu, Oct 17, 2024 at 7:03 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
> >
> > A specifically crafted program might trick the verifier into growing a
> > very long jump history within a single bpf_verifier_state instance.
> > A very long jump history makes mark_chain_precision() unreasonably slow,
> > especially when the verifier processes a loop.
> >
> > Mitigate this by forcing a new state in is_state_visited() when the
> > current state's jump history grows too long.
> >
> > Use the same constant as in `skip_inf_loop_check`, but multiply it by an
> > arbitrarily chosen factor of 2 to account for the jump history containing
> > not only information about jumps, but also information about stack access.
> >
> > For an example of a problematic program consider the code below;
> > w/o this patch the example is processed by the verifier for ~15 minutes
> > before failing to allocate a big-enough chunk for jmp_history.
> >
> >   0: r7 = *(u16 *)(r1 +0);
> >   1: r7 += 0x1ab064b9;
> >   2: if r7 & 0x702000 goto 1b;
> >   3: r7 &= 0x1ee60e;
> >   4: r7 += r1;
> >   5: if r7 s> 0x37d2 goto +0;
> >   6: r0 = 0;
> >   7: exit;
> >
> > Perf profiling shows that most of the time (~95%) is spent in
> > mark_chain_precision().
> >
> > The easiest way to explain why this program causes problems is to
> > apply the following patch:
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 0c216e71cec7..4b4823961abe 100644
> > \--- a/include/linux/bpf.h
> > \+++ b/include/linux/bpf.h
> > \@@ -1926,7 +1926,7 @@ struct bpf_array {
> >      };
> >  };
> >
> > -#define BPF_COMPLEXITY_LIMIT_INSNS    1000000 /* yes. 1M insns */
> > +#define BPF_COMPLEXITY_LIMIT_INSNS    256 /* yes. 1M insns */
> >  #define MAX_TAIL_CALL_CNT    33
> >
> >  /* Maximum number of loops for bpf_loop and bpf_iter_num.
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index f514247ba8ba..75e88be3bb3e 100644
> > \--- a/kernel/bpf/verifier.c
> > \+++ b/kernel/bpf/verifier.c
> > \@@ -18024,8 +18024,13 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
> >  skip_inf_loop_check:
> >          if (!force_new_state &&
> >              env->jmps_processed - env->prev_jmps_processed < 20 &&
> > -            env->insn_processed - env->prev_insn_processed < 100)
> > +            env->insn_processed - env->prev_insn_processed < 100) {
> > +                verbose(env, "is_state_visited: suppressing checkpoint at %d, %d jmps processed, cur->jmp_history_cnt is %d\n",
> > +                        env->insn_idx,
> > +                        env->jmps_processed - env->prev_jmps_processed,
> > +                        cur->jmp_history_cnt);
> >                  add_new_state = false;
> > +            }
> >          goto miss;
> >      }
> >      /* If sl->state is a part of a loop and this loop's entry is a part of
> > \@@ -18142,6 +18147,9 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
> >      if (!add_new_state)
> >          return 0;
> >
> > +    verbose(env, "is_state_visited: new checkpoint at %d, resetting env->jmps_processed\n",
> > +            env->insn_idx);
> > +
> >      /* There were no equivalent states, remember the current one.
> >       * Technically the current state is not proven to be safe yet,
> >       * but it will either reach outer most bpf_exit (which means it's safe)
> >
> > And observe verification log:
> >
> > ...
> > is_state_visited: new checkpoint at 5, resetting env->jmps_processed
> > 5: R1=ctx() R7=ctx(...)
> > 5: (65) if r7 s> 0x37d2 goto pc+0    ; R7=ctx(...)
> > 6: (b7) r0 = 0                       ; R0_w=0
> > 7: (95) exit
> >
> > from 5 to 6: R1=ctx() R7=ctx(...) R10=fp0
> > 6: R1=ctx() R7=ctx(...) R10=fp0
> > 6: (b7) r0 = 0                       ; R0_w=0
> > 7: (95) exit
> > is_state_visited: suppressing checkpoint at 1, 3 jmps processed, cur->jmp_history_cnt is 74
> >
> > from 2 to 1: R1=ctx() R7_w=scalar(...) R10=fp0
> > 1: R1=ctx() R7_w=scalar(...) R10=fp0
> > 1: (07) r7 += 447767737
> > is_state_visited: suppressing checkpoint at 2, 3 jmps processed, cur->jmp_history_cnt is 75
> > 2: R7_w=scalar(...)
> > 2: (45) if r7 & 0x702000 goto pc-2
> > ... mark_precise 152 steps for r7 ...
> > 2: R7_w=scalar(...)
> > is_state_visited: suppressing checkpoint at 1, 4 jmps processed, cur->jmp_history_cnt is 75
> > 1: (07) r7 += 447767737
> > is_state_visited: suppressing checkpoint at 2, 4 jmps processed, cur->jmp_history_cnt is 76
> > 2: R7_w=scalar(...)
> > 2: (45) if r7 & 0x702000 goto pc-2
> > ...
> > BPF program is too large. Processed 257 insn
> >
> > The log output shows that a checkpoint at label (1) is never created,
> > because it is suppressed by the `skip_inf_loop_check` logic:
> > a. When the 'if' at (2) is processed it pushes a state with insn_idx (1)
> >    onto the stack and proceeds to (3);
> > b. At (5) a checkpoint is created, and this resets
> >    env->{jmps,insns}_processed;
> > c. Verification proceeds and reaches `exit`;
> > d. The state saved at step (a) is popped from the stack and
> >    is_state_visited() considers whether a checkpoint needs to be added,
> >    but because env->{jmps,insns}_processed had just been reset at step (b)
> >    the `skip_inf_loop_check` logic forces `add_new_state` to false;
> > e. The verifier proceeds with the current state, which slowly accumulates
> >    more and more entries in the jump history.
> >
> > The accumulation of entries in the jump history is a problem because
> > of two factors:
> > - it eventually exhausts the memory available for kmalloc() allocation;
> > - mark_chain_precision() traverses the jump history of a state, meaning
> >   that if `r7` is marked precise, the verifier has to iterate over the
> >   ever-growing jump history until the parent state boundary is reached.
> >
> > (Note: the log also shows a REG INVARIANTS VIOLATION warning upon
> >  jset processing, but that is a separate bug to fix.)
> >
> > With this patch applied, the example above is rejected by the verifier
> > in under 1s, after reaching the 1M instructions limit.
> >
> > The program is a simplified reproducer from syzbot report [1].
> > The previous discussion can be found at [2].
> > The patch does not cause any changes in verification performance when
> > tested on selftests from veristat.cfg and on cilium programs taken
> > from [3].
> >
> > [1] https://lore.kernel.org/bpf/670429f6.050a0220.49194.0517.GAE@xxxxxxxxxx/
> > [2] https://lore.kernel.org/bpf/20241009021254.2805446-1-eddyz87@xxxxxxxxx/
> > [3] https://github.com/anakryiko/cilium
> >
> > Signed-off-by: Eduard Zingerman <eddyz87@xxxxxxxxx>
> > ---
> >  kernel/bpf/verifier.c | 14 ++++++++++++--
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index f514247ba8ba..f64c831a9278 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -17873,13 +17873,23 @@ static bool iter_active_depths_differ(struct bpf_verifier_state *old, struct bpf
> >      return false;
> >  }
> >
> > +#define MAX_JMPS_PER_STATE 20
> > +
> >  static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
> >  {
> >      struct bpf_verifier_state_list *new_sl;
> >      struct bpf_verifier_state_list *sl, **pprev;
> >      struct bpf_verifier_state *cur = env->cur_state, *new, *loop_entry;
> >      int i, j, n, err, states_cnt = 0;
> > -    bool force_new_state = env->test_state_freq || is_force_checkpoint(env, insn_idx);
> > +    bool force_new_state = env->test_state_freq || is_force_checkpoint(env, insn_idx) ||
> > +                           /* - Long jmp history hinders mark_chain_precision performance,
> > +                            *   so force new state if jmp history of current state exceeds
> > +                            *   a threshold.
> > +                            * - Jmp history records not only jumps, but also stack access,
> > +                            *   so keep this constant 2x times the limit imposed on
> > +                            *   env->jmps_processed for loop cases (see skip_inf_loop_check).
> > +                            */
> > +                           cur->jmp_history_cnt > MAX_JMPS_PER_STATE * 2;
>
> this feels like the wrong place to add this heuristic. Just a few lines
> below there is:
>
>   if (env->jmps_processed - env->prev_jmps_processed >= 2 &&
>       env->insn_processed - env->prev_insn_processed >= 8)
>           add_new_state = true;
>
> Please add the jmp_history_cnt check here, as it conceptually fits with
> the jmps_processed and insn_processed checks. It also has a huge comment
> with justification already, so we might as well just extend that for
> jmp_history_cnt.

I think adding

  if (cur->jmp_history_cnt > 20)
          add_new_state = true;

won't help. It will get set back to false.
But I agree that tweaking force_new_state is also not quite clean.

btw the "huge comment" may need to be revised :)
bpf progs today probably look different than they did in 2019.

>
> pw-bot: cr
>
> >      bool add_new_state = force_new_state;
> >      bool force_exact;
> >
> > @@ -18023,7 +18033,7 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
> >           */
> >  skip_inf_loop_check:
> >          if (!force_new_state &&
> > -            env->jmps_processed - env->prev_jmps_processed < 20 &&
> > +            env->jmps_processed - env->prev_jmps_processed < MAX_JMPS_PER_STATE &&
> >              env->insn_processed - env->prev_insn_processed < 100)
> >                  add_new_state = false;
>
> and then this one logically matches the add_new_state = true; case
> above that I mentioned.
>
> With these changes, I'd drop the * 2 factor for one of the checks. If
> necessary, just bump it to 30 or so, if you are afraid of stack
> accesses. But let's keep it simple with one threshold, if possible?

+1

The 2/8 heuristic works together with the 20/100 one.
Instead of tweaking force_new_state earlier, it's better to do:

  if (!force_new_state &&
      cur->jmp_history_cnt < N &&
      env->jmps_processed - env->prev_jmps_processed < 20 && ..)
          add_new_state = false;

You're essentially proposing N == 40.
Just add your existing comment next to the check.
A #define MAX_JMPS_PER_STATE is imo overkill.
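In other words, something along these lines (just an untested sketch: the
literal 40 stands in for N as discussed above, the ".." is filled in with the
insn_processed check from the hunk quoted earlier, and the comment wording is
only a placeholder):

  skip_inf_loop_check:
          /* Long jmp history hinders mark_chain_precision(): it walks the
           * history of the current state, and an ever-growing history also
           * eventually exhausts the jmp_history allocation, so do not
           * suppress a new checkpoint once the history gets long.
           */
          if (!force_new_state &&
              cur->jmp_history_cnt < 40 &&
              env->jmps_processed - env->prev_jmps_processed < 20 &&
              env->insn_processed - env->prev_insn_processed < 100)
                  add_new_state = false;
          goto miss;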