On Wed, Oct 9, 2024 at 12:41 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote: > > On Tue, 2024-10-08 at 19:12 -0700, Eduard Zingerman wrote: > > In [1] syzbot reported an interesting BPF program. > > Verification for this program takes a very long time. > > > > [1] https://lore.kernel.org/bpf/670429f6.050a0220.49194.0517.GAE@xxxxxxxxxx/ > > > > The program could be simplified to the following snippet: > > > > /* Program type is kprobe */ > > r7 = *(u16 *)(r1 +0); > > 1: r7 += 0x1ab064b9; > > if r7 & 0x702000 goto 1b; > > r7 &= 0x1ee60e; > > r7 += r1; > > if r7 s> 0x37d2 goto +0; > > r0 = 0; > > exit; > > Answering a few questions from off-list discussion with Alexei. > The test is not specific for jset instruction, e.g. the following > program exhibits similar behaviour: > > SEC("kprobe") > __failure __log_level(4) > __msg("BPF program is too large.") > __naked void short_loop1(void) > { > asm volatile ( > " r7 = *(u16 *)(r1 +0);" > " r8 = *(u64 *)(r1 +16);" > "1: r7 += 0x1ab064b9;" > "if r7 < r8 goto 1b;" > " r7 &= 0x1ee60e;" > " r7 += r1;" > " if r7 s> 0x37d2 goto +0;" > " r0 = 0;" > " exit;" > ::: __clobber_all); > } > > > The snippet exhibits the following behaviour depending on > > BPF_COMPLEXITY_LIMIT_INSNS: > > - at 1,000,000 verification does not finish in 15 minutes; > > - at 100,000 verification finishes in 15 seconds; > > - at 100 it is possible to get some verifier log. > > Still investigating why running time change is non-linear. > > [...] > > > This patch forcibly enables checkpoints for each loop back-edge. > > This helps with the programs in question, as verification of both > > syzbot program and reduced snippet finishes in ~2.5 sec. > > There is the following code in is_state_visited(): > > ... > /* if the verifier is processing a loop, avoid adding new state > * too often, since different loop iterations have distinct > * states and may not help future pruning. > * This threshold shouldn't be too low to make sure that > * a loop with large bound will be rejected quickly. > * The most abusive loop will be: > * r1 += 1 > * if r1 < 1000000 goto pc-2 > * 1M insn_procssed limit / 100 == 10k peak states. > * This threshold shouldn't be too high either, since states > * at the end of the loop are likely to be useful in pruning. > */ > skip_inf_loop_check: > if (!env->test_state_freq && > env->jmps_processed - env->prev_jmps_processed < 20 && > env->insn_processed - env->prev_insn_processed < 100) > add_new_state = false; > goto miss; > ... > > Which runs into a direct contradiction with what I do in this patch, > so either I need to change the patch or this fragment needs adjustment. Yeah. It's not saving all states exactly to avoid: loop3.bpf.o while_true +350359 (+106.88%) +0 (+0.00%) +101448 (+1049.86%) 100k states is a lot of memory. It might cause OOMs. pyperf600 improvement is certainly nice though. It feels it's better to adjust above heuristic 20 && 100 instead of forcing save state everywhere. Something should be done about: 71.25% [k] __mark_chain_precision 24.81% [k] bt_sync_linked_regs as well. The algorithm there needs some tweaks. pw-bot: cr