On Mon, Oct 21, 2024 at 7:27 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote: > > On Mon, 2024-10-21 at 19:18 -0700, Alexei Starovoitov wrote: > > [...] > > > > 0: r7 = *(u16 *)(r1 +0);" > > > 1: r7 += 0x1ab064b9;" > > > 2: if r7 & 0x702000 goto 1b; > > > 3: r7 &= 0x1ee60e;" > > > 4: r7 += r1;" > > > 5: if r7 s> 0x37d2 goto +0;" > > > 6: r0 = 0;" > > > 7: exit;" > > [...] > > > > And observe verification log: > > > > > > ... > > > is_state_visited: new checkpoint at 5, resetting env->jmps_processed > > > 5: R1=ctx() R7=ctx(...) > > > 5: (65) if r7 s> 0x37d2 goto pc+0 ; R7=ctx(...) > > > 6: (b7) r0 = 0 ; R0_w=0 > > > 7: (95) exit > > > > > > from 5 to 6: R1=ctx() R7=ctx(...) R10=fp0 > > > 6: R1=ctx() R7=ctx(...) R10=fp0 > > > 6: (b7) r0 = 0 ; R0_w=0 > > > 7: (95) exit > > > is_state_visited: suppressing checkpoint at 1, 3 jmps processed, cur->jmp_history_cnt is 74 > > > > > > from 2 to 1: R1=ctx() R7_w=scalar(...) R10=fp0 > > > 1: R1=ctx() R7_w=scalar(...) R10=fp0 > > > 1: (07) r7 += 447767737 > > > is_state_visited: suppressing checkpoint at 2, 3 jmps processed, cur->jmp_history_cnt is 75 > > > 2: R7_w=scalar(...) > > > 2: (45) if r7 & 0x702000 goto pc-2 > > > ... mark_precise 152 steps for r7 ... > > > 2: R7_w=scalar(...) > > > is_state_visited: suppressing checkpoint at 1, 4 jmps processed, cur->jmp_history_cnt is 75 > > > 1: (07) r7 += 447767737 > > > is_state_visited: suppressing checkpoint at 2, 4 jmps processed, cur->jmp_history_cnt is 76 > > > 2: R7_w=scalar(...) > > > 2: (45) if r7 & 0x702000 goto pc-2 > > > ... > > > BPF program is too large. Processed 257 insn > > > > > > The log output shows that checkpoint at label (1) is never created, > > > because it is suppressed by `skip_inf_loop_check` logic: > > > a. When 'if' at (2) is processed it pushes a state with insn_idx (1) > > > onto stack and proceeds to (3); > > > b. At (5) checkpoint is created, and this resets > > > env->{jmps,insns}_processed. > > > c. Verification proceeds and reaches `exit`; > > > d. State saved at step (a) is popped from stack and is_state_visited() > > > considers if checkpoint needs to be added, but because > > > env->{jmps,insns}_processed had been just reset at step (b) > > > the `skip_inf_loop_check` logic forces `add_new_state` to false. > > > e. Verifier proceeds with current state, which slowly accumulates > > > more and more entries in the jump history. > > > > I'm still not sure why it grew to thousands of entries in jmp_history. > > Looking at the above trace jmps_processed grows 1 to 1 with jmp_history_cnt. > > Also cur->jmp_history_cnt is reset to zero at the same time as > > jmps processed. > > So in the above test 75 vs 4 difference came from jmp_history > > entries that were there before the loop ? > > 0: r7 = *(u16 *)(r1 +0);" > 1: r7 += 0x1ab064b9;" > 2: if r7 & 0x702000 goto 1b; > 3: r7 &= 0x1ee60e;" > 4: r7 += r1;" > 5: if r7 s> 0x37d2 goto +0;" > 6: r0 = 0;" > 7: exit;" > > - When 'if' at (2) is processed current state is copied (let's call > this copy C), copy is put to the stack for later processing, > it's jump history is not cleared. > - Then current state proceeds verifying 3-5-6-7. At (5) checkpoint is > created and env->{jmps,insns}_processed are reset. > - Then state C is popped from the stack, it goes back to (1) and then (2), > at (2) a copy C1 is created but no checkpoint, as env->{jmps,insns}_processed > do not meet thresholds. C1's jmp_history is one entry longer then C's. > - Whole thing repeats until ENOMEM. I see. Thanks for explaining. So the bug is actually in reset logic of jmps/insns_processed coupled with push/pop stack. I think let's add cur->jmp_history_cnt < 40 check for now and target bpf tree (assuming no veristat regressions), but for bpf-next we probably need to follow up. We can probably remove jmps_processed counter and replace it with jmp_history_cnt. Based on the above explanation insns_processed counter is also bogus.