This RFC consists of a bug fix and two improvements for iterator/callback based loops handling: - Patch #1 fixes a bug in copy_verifier_state(), where field 'loop_entry' was not copied. The fix has negative impact on verification performance. - Patch #3 mitigates negative impact from patch #1 by avoiding clean_live_states() for states that have loop_entry that is still being verified. This reduces amount of processed instructions for sched_ext by 28-92% for some programs. - Patches #5,6 introduce a simple live registers DFA analysis. Results of this analysis are used in func_states_equal() and further reduce amount of processed instructions for sched_ext by 17-30% for some programs. Below are veristat results comparing master to this RFC. selftests: File Program Insns (DIFF) States (DIFF) --------------------- -------------------------- ----------------- ---------------- arena_htab.bpf.o arena_htab_llvm -294 (-41.00%) -20 (-35.09%) arena_htab_asm.bpf.o arena_htab_asm -152 (-25.46%) -10 (-21.28%) arena_list.bpf.o arena_list_add +329 (+22.04%) +7 (+23.33%) arena_list.bpf.o arena_list_del -161 (-52.10%) -12 (-52.17%) iters.bpf.o checkpoint_states_deletion -16914 (-93.32%) -778 (-95.11%) iters.bpf.o iter_nested_deeply_iters -293 (-49.41%) -30 (-44.78%) iters.bpf.o iter_nested_iters -181 (-22.26%) -17 (-21.52%) iters.bpf.o iter_subprog_iters -430 (-39.31%) -29 (-32.95%) iters.bpf.o loop_state_deps2 -158 (-32.99%) -14 (-30.43%) pyperf600_iter.bpf.o on_event -8633 (-69.97%) -189 (-42.86%) sched_ext: Program Insns (DIFF) States (DIFF) ---------------------- ----------------- ---------------- lavd_dispatch -34018 (-22.00%) -1885 (-21.06%) layered_dispatch -5378 (-46.81%) -313 (-36.91%) layered_dump -3689 (-49.71%) -455 (-66.81%) layered_enqueue -4038 (-30.90%) -351 (-29.80%) layered_init -995483 (-99.55%) -84233 (-99.48%) layered_runnable -1587 (-49.32%) -172 (-58.31%) refresh_layer_cpumasks -15597 (-94.60%) -1684 (-95.14%) rustland_init -85 (-17.07%) -7 (-17.07%) rustland_init -85 (-17.07%) -7 (-17.07%) rusty_select_cpu -706 (-33.65%) -55 (-30.39%) rusty_set_cpumask -15934 (-78.67%) -1359 (-81.62%) tp_cgroup_attach_task -57 (-27.67%) -4 (-22.22%) central_dispatch -87 (-13.68%) -9 (-14.29%) central_init -476 (-52.19%) -12 (-25.00%) pair_dispatch -998423 (-99.84%) -58264 (-99.78%) qmap_dispatch -581 (-24.28%) -48 (-24.49%) userland_dispatch -36 (-22.78%) -4 (-23.53%) ('layered_init' and 'pair_dispatch' hit 1M on master, but are verified ok with this patch-set). sched_ext used for testing: commit 2b7f3bba928f ("Merge pull request #1197 from devnexen/code_simpl4") Eduard Zingerman (7): bpf: copy_verifier_state() should copy 'loop_entry' field selftests/bpf: test correct loop_entry update in copy_verifier_state bpf: don't do clean_live_states when state->loop_entry->branches > 0 selftests/bpf: check states pruning for deeply nested iterator bpf: DFA-based liveness analysis for program registers bpf: use register liveness information for func_states_equal selftests/bpf: test cases for compute_live_registers() include/linux/bpf_verifier.h | 6 + kernel/bpf/verifier.c | 372 ++++++++++++++++-- .../testing/selftests/bpf/prog_tests/align.c | 11 +- .../bpf/prog_tests/compute_live_registers.c | 9 + tools/testing/selftests/bpf/progs/bpf_misc.h | 12 + .../bpf/progs/compute_live_registers.c | 363 +++++++++++++++++ tools/testing/selftests/bpf/progs/iters.c | 139 +++++++ .../selftests/bpf/progs/verifier_gotol.c | 6 +- .../bpf/progs/verifier_iterating_callbacks.c | 6 +- 9 files changed, 886 insertions(+), 38 deletions(-) create mode 100644 tools/testing/selftests/bpf/prog_tests/compute_live_registers.c create mode 100644 tools/testing/selftests/bpf/progs/compute_live_registers.c -- 2.47.1