On Thu, Oct 17, 2024 at 7:03 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote: > > A specifically crafted program might trick verifier into growing very > long jump history within a single bpf_verifier_state instance. > Very long jump history makes mark_chain_precision() unreasonably slow, > especially in case if verifier processes a loop. > > Mitigate this by forcing new state in is_state_visited() in case if > current state's jump history is too long. > > Use same constant as in `skip_inf_loop_check`, but multiply it by > arbitrarily chosen value 2 to account for jump history containing not > only information about jumps, but also information about stack access. > > For an example of problematic program consider the code below, > w/o this patch the example is processed by verifier for ~15 minutes, > before failing to allocate big-enough chunk for jmp_history. > > 0: r7 = *(u16 *)(r1 +0);" > 1: r7 += 0x1ab064b9;" > 2: if r7 & 0x702000 goto 1b; > 3: r7 &= 0x1ee60e;" > 4: r7 += r1;" > 5: if r7 s> 0x37d2 goto +0;" > 6: r0 = 0;" > 7: exit;" > > Perf profiling shows that most of the time is spent in > mark_chain_precision() ~95%. > > The easiest way to explain why this program causes problems is to > apply the following patch: > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > index 0c216e71cec7..4b4823961abe 100644 > \--- a/include/linux/bpf.h > \+++ b/include/linux/bpf.h > \@@ -1926,7 +1926,7 @@ struct bpf_array { > }; > }; > > -#define BPF_COMPLEXITY_LIMIT_INSNS 1000000 /* yes. 1M insns */ > +#define BPF_COMPLEXITY_LIMIT_INSNS 256 /* yes. 1M insns */ > #define MAX_TAIL_CALL_CNT 33 > > /* Maximum number of loops for bpf_loop and bpf_iter_num. > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index f514247ba8ba..75e88be3bb3e 100644 > \--- a/kernel/bpf/verifier.c > \+++ b/kernel/bpf/verifier.c > \@@ -18024,8 +18024,13 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx) > skip_inf_loop_check: > if (!force_new_state && > env->jmps_processed - env->prev_jmps_processed < 20 && > - env->insn_processed - env->prev_insn_processed < 100) > + env->insn_processed - env->prev_insn_processed < 100) { > + verbose(env, "is_state_visited: suppressing checkpoint at %d, %d jmps processed, cur->jmp_history_cnt is %d\n", > + env->insn_idx, > + env->jmps_processed - env->prev_jmps_processed, > + cur->jmp_history_cnt); > add_new_state = false; > + } > goto miss; > } > /* If sl->state is a part of a loop and this loop's entry is a part of > \@@ -18142,6 +18147,9 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx) > if (!add_new_state) > return 0; > > + verbose(env, "is_state_visited: new checkpoint at %d, resetting env->jmps_processed\n", > + env->insn_idx); > + > /* There were no equivalent states, remember the current one. > * Technically the current state is not proven to be safe yet, > * but it will either reach outer most bpf_exit (which means it's safe) > > And observe verification log: > > ... > is_state_visited: new checkpoint at 5, resetting env->jmps_processed > 5: R1=ctx() R7=ctx(...) > 5: (65) if r7 s> 0x37d2 goto pc+0 ; R7=ctx(...) > 6: (b7) r0 = 0 ; R0_w=0 > 7: (95) exit > > from 5 to 6: R1=ctx() R7=ctx(...) R10=fp0 > 6: R1=ctx() R7=ctx(...) R10=fp0 > 6: (b7) r0 = 0 ; R0_w=0 > 7: (95) exit > is_state_visited: suppressing checkpoint at 1, 3 jmps processed, cur->jmp_history_cnt is 74 > > from 2 to 1: R1=ctx() R7_w=scalar(...) R10=fp0 > 1: R1=ctx() R7_w=scalar(...) R10=fp0 > 1: (07) r7 += 447767737 > is_state_visited: suppressing checkpoint at 2, 3 jmps processed, cur->jmp_history_cnt is 75 > 2: R7_w=scalar(...) > 2: (45) if r7 & 0x702000 goto pc-2 > ... mark_precise 152 steps for r7 ... > 2: R7_w=scalar(...) > is_state_visited: suppressing checkpoint at 1, 4 jmps processed, cur->jmp_history_cnt is 75 > 1: (07) r7 += 447767737 > is_state_visited: suppressing checkpoint at 2, 4 jmps processed, cur->jmp_history_cnt is 76 > 2: R7_w=scalar(...) > 2: (45) if r7 & 0x702000 goto pc-2 > ... > BPF program is too large. Processed 257 insn > > The log output shows that checkpoint at label (1) is never created, > because it is suppressed by `skip_inf_loop_check` logic: > a. When 'if' at (2) is processed it pushes a state with insn_idx (1) > onto stack and proceeds to (3); > b. At (5) checkpoint is created, and this resets > env->{jmps,insns}_processed. > c. Verification proceeds and reaches `exit`; > d. State saved at step (a) is popped from stack and is_state_visited() > considers if checkpoint needs to be added, but because > env->{jmps,insns}_processed had been just reset at step (b) > the `skip_inf_loop_check` logic forces `add_new_state` to false. > e. Verifier proceeds with current state, which slowly accumulates > more and more entries in the jump history. I'm still not sure why it grew to thousands of entries in jmp_history. Looking at the above trace jmps_processed grows 1 to 1 with jmp_history_cnt. Also cur->jmp_history_cnt is reset to zero at the same time as jmps processed. So in the above test 75 vs 4 difference came from jmp_history entries that were there before the loop ?