Re: [PATCH bpf-next v1 1/2] bpf: force checkpoint when jmp history is too long

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Mon, 21 Oct 2024 19:18:07 -0700

On Thu, Oct 17, 2024 at 7:03 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> A specifically crafted program might trick verifier into growing very
> long jump history within a single bpf_verifier_state instance.
> Very long jump history makes mark_chain_precision() unreasonably slow,
> especially in case if verifier processes a loop.
>
> Mitigate this by forcing new state in is_state_visited() in case if
> current state's jump history is too long.
>
> Use same constant as in `skip_inf_loop_check`, but multiply it by
> arbitrarily chosen value 2 to account for jump history containing not
> only information about jumps, but also information about stack access.
>
> For an example of problematic program consider the code below,
> w/o this patch the example is processed by verifier for ~15 minutes,
> before failing to allocate big-enough chunk for jmp_history.
>
>     0: r7 = *(u16 *)(r1 +0);"
>     1: r7 += 0x1ab064b9;"
>     2: if r7 & 0x702000 goto 1b;
>     3: r7 &= 0x1ee60e;"
>     4: r7 += r1;"
>     5: if r7 s> 0x37d2 goto +0;"
>     6: r0 = 0;"
>     7: exit;"
>
> Perf profiling shows that most of the time is spent in
> mark_chain_precision() ~95%.
>
> The easiest way to explain why this program causes problems is to
> apply the following patch:
>
>     diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>     index 0c216e71cec7..4b4823961abe 100644
>     \--- a/include/linux/bpf.h
>     \+++ b/include/linux/bpf.h
>     \@@ -1926,7 +1926,7 @@ struct bpf_array {
>             };
>      };
>
>     -#define BPF_COMPLEXITY_LIMIT_INSNS      1000000 /* yes. 1M insns */
>     +#define BPF_COMPLEXITY_LIMIT_INSNS      256 /* yes. 1M insns */
>      #define MAX_TAIL_CALL_CNT 33
>
>      /* Maximum number of loops for bpf_loop and bpf_iter_num.
>     diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>     index f514247ba8ba..75e88be3bb3e 100644
>     \--- a/kernel/bpf/verifier.c
>     \+++ b/kernel/bpf/verifier.c
>     \@@ -18024,8 +18024,13 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
>      skip_inf_loop_check:
>                             if (!force_new_state &&
>                                 env->jmps_processed - env->prev_jmps_processed < 20 &&
>     -                           env->insn_processed - env->prev_insn_processed < 100)
>     +                           env->insn_processed - env->prev_insn_processed < 100) {
>     +                               verbose(env, "is_state_visited: suppressing checkpoint at %d, %d jmps processed, cur->jmp_history_cnt is %d\n",
>     +                                       env->insn_idx,
>     +                                       env->jmps_processed - env->prev_jmps_processed,
>     +                                       cur->jmp_history_cnt);
>                                     add_new_state = false;
>     +                       }
>                             goto miss;
>                     }
>                     /* If sl->state is a part of a loop and this loop's entry is a part of
>     \@@ -18142,6 +18147,9 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
>             if (!add_new_state)
>                     return 0;
>
>     +       verbose(env, "is_state_visited: new checkpoint at %d, resetting env->jmps_processed\n",
>     +               env->insn_idx);
>     +
>             /* There were no equivalent states, remember the current one.
>              * Technically the current state is not proven to be safe yet,
>              * but it will either reach outer most bpf_exit (which means it's safe)
>
> And observe verification log:
>
>     ...
>     is_state_visited: new checkpoint at 5, resetting env->jmps_processed
>     5: R1=ctx() R7=ctx(...)
>     5: (65) if r7 s> 0x37d2 goto pc+0     ; R7=ctx(...)
>     6: (b7) r0 = 0                        ; R0_w=0
>     7: (95) exit
>
>     from 5 to 6: R1=ctx() R7=ctx(...) R10=fp0
>     6: R1=ctx() R7=ctx(...) R10=fp0
>     6: (b7) r0 = 0                        ; R0_w=0
>     7: (95) exit
>     is_state_visited: suppressing checkpoint at 1, 3 jmps processed, cur->jmp_history_cnt is 74
>
>     from 2 to 1: R1=ctx() R7_w=scalar(...) R10=fp0
>     1: R1=ctx() R7_w=scalar(...) R10=fp0
>     1: (07) r7 += 447767737
>     is_state_visited: suppressing checkpoint at 2, 3 jmps processed, cur->jmp_history_cnt is 75
>     2: R7_w=scalar(...)
>     2: (45) if r7 & 0x702000 goto pc-2
>     ... mark_precise 152 steps for r7 ...
>     2: R7_w=scalar(...)
>     is_state_visited: suppressing checkpoint at 1, 4 jmps processed, cur->jmp_history_cnt is 75
>     1: (07) r7 += 447767737
>     is_state_visited: suppressing checkpoint at 2, 4 jmps processed, cur->jmp_history_cnt is 76
>     2: R7_w=scalar(...)
>     2: (45) if r7 & 0x702000 goto pc-2
>     ...
>     BPF program is too large. Processed 257 insn
>
> The log output shows that checkpoint at label (1) is never created,
> because it is suppressed by `skip_inf_loop_check` logic:
> a. When 'if' at (2) is processed it pushes a state with insn_idx (1)
>    onto stack and proceeds to (3);
> b. At (5) checkpoint is created, and this resets
>    env->{jmps,insns}_processed.
> c. Verification proceeds and reaches `exit`;
> d. State saved at step (a) is popped from stack and is_state_visited()
>    considers if checkpoint needs to be added, but because
>    env->{jmps,insns}_processed had been just reset at step (b)
>    the `skip_inf_loop_check` logic forces `add_new_state` to false.
> e. Verifier proceeds with current state, which slowly accumulates
>    more and more entries in the jump history.

I'm still not sure why it grew to thousands of entries in jmp_history.
Looking at the above trace jmps_processed grows 1 to 1 with jmp_history_cnt.
Also cur->jmp_history_cnt is reset to zero at the same time as
jmps processed.
So in the above test 75 vs 4 difference came from jmp_history
entries that were there before the loop ?