On Mon, 2024-10-21 at 22:38 -0700, Eduard Zingerman wrote: [...] > This takes ~10 minutes to verify on master. > Surprisingly current patch does not seem to help, > I'll investigate this tomorrow. > Full example is in the end of the email. I messed up the example a little bit. The example shared previously takes so long to process because of "goto +0;". opt_remove_nops() deletes such jumps and we know that bpf_remove_insns() is not efficient. Corrected example uses conditional jump in place of "goto +0;" and slightly adjusted counters. Full program is here: https://gist.github.com/eddyz87/cb813387323b78bcd6a7e264fc44c817 Here is it's verification log to get the idea: 0: (79) r2 = *(u64 *)(r1 +0) 1: (b7) r0 = 0 2: (35) if r2 >= 0x1 goto pc+5 push_stack: at 2, jmp_history_cnt 0 3: (35) if r0 >= 0x0 goto pc+0 4: (35) if r0 >= 0x0 goto pc+0 5: (35) if r0 >= 0x0 goto pc+0 6: (35) if r0 >= 0x0 goto pc+0 is_state_visited: new checkpoint at 7, resetting env->jmps_processed 7: (95) exit 8: (35) if r2 >= 0x2 goto pc+7 push_stack: at 8, jmp_history_cnt 1 9: (35) if r0 >= 0x0 goto pc+0 10: (35) if r0 >= 0x0 goto pc+0 11: (35) if r0 >= 0x0 goto pc+0 12: (35) if r0 >= 0x0 goto pc+0 13: (35) if r0 >= 0x0 goto pc+0 14: (35) if r0 >= 0x0 goto pc+0 is_state_visited: new checkpoint at 15, resetting env->jmps_processed 15: (95) exit ... 320: (35) if r2 >= 0x29 goto pc+7 push_stack: at 320, jmp_history_cnt 40 320: R2_w=40 321: (35) if r0 >= 0x0 goto pc+0 322: (35) if r0 >= 0x0 goto pc+0 323: (35) if r0 >= 0x0 goto pc+0 324: (35) if r0 >= 0x0 goto pc+0 325: (35) if r0 >= 0x0 goto pc+0 326: (35) if r0 >= 0x0 goto pc+0 is_state_visited: new checkpoint at 327, resetting env->jmps_processed 327: (95) exit ... A bpf program w/o loops, at each 'if r2 >= ...' push_stack() saves a state with ever increasing jump history. - right amount of 'if r0 >= ...' instructions is maintained before 'exit' to force a new checkpoint; - exit is processed; - state is popped from the stack and first insn it processes is 'if r2 >= ...', thus a new state is saved by push_stack() with jump history longer by 1. On master this fails with ENOMEM and the following error in the log: [ 418.083600] test_progs: page allocation failure: order:7, mode:0x140cc0(GFP_USER|__GFP_COMP), \ nodemask=(null),cpuset=/,mems_allowed=0 ... [ 418.084158] Call Trace: ... [ 418.084649] krealloc_noprof+0x53/0xd0 [ 418.084688] copy_verifier_state+0x78/0x390 ... Same happens if jmp_history_cnt check is moved to 'skip_inf_loop_check': --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -18022,7 +18022,7 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx) * at the end of the loop are likely to be useful in pruning. */ skip_inf_loop_check: - if (!force_new_state && + if (!force_new_state && cur->jmp_history_cnt < 40 && env->jmps_processed - env->prev_jmps_processed < 20 && env->insn_processed - env->prev_insn_processed < 100) Or if it is in the else branch. Simply because 'skip_inf_loop_check' is for instructions that have been already seen on the current verification path. However, with change suggested in this patch-set such ENOMEM situation is not possible. Hence I insist that large enough jmp_history_cnt should force a new state, and point where I put the check covers more cases then alternatives.