On Thu, 2024-10-10 at 16:23 -0700, Alexei Starovoitov wrote:
> On Thu, Oct 10, 2024 at 3:52 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
> >
> > On Thu, 2024-10-10 at 15:40 -0700, Eduard Zingerman wrote:
> >
> > > I think this would bring significant speedup.
> > > Not sure it would completely fix the issue at hand,
> > > as mark_chain_precision() walks like 100 instructions back on each
> > > iteration of the loop, but it might be a step in the right direction.
> >
> > In theory, we can do mark_chain_precision() lazily:
> > - mark that certain registers need precision for an instruction;
> > - wait till next checkpoint is created;
> > - do the walk for marked registers.
> >
> > This should be simpler to implement on top of what Andrii suggests.
>
> Indeed it can wait until the next new_state, since
> when the next iteration of the loop starts the sl->state
> are guaranteed to be some older states. Though the verifier may have done
> multiple loop iteration traversals, it's still in the 'cur' state,
> so states_equal(env, &sl->state, cur...) should still be correct.

After spending an absolutely unreasonable amount of time on this I
finally identified the missing piece of the puzzle.

The example program takes so much time to verify because it accumulates
a huge jmp_history array (262145 entries) within a single state and
constantly executes mark_chain_precision() against it.

The reason for such a huge jmp_history is an interplay between two
conditions:

    if (env->jmps_processed - env->prev_jmps_processed >= 2 &&
        env->insn_processed - env->prev_insn_processed >= 8)
            add_new_state = true;

and

    if (!force_new_state &&
        env->jmps_processed - env->prev_jmps_processed < 20 &&
        env->insn_processed - env->prev_insn_processed < 100)
            add_new_state = false;

I try to explain it in the commit message for [1].
If one waits long enough, the verifier eventually finishes with
-ENOMEM, as it fails to allocate a big enough jmp_history array (one
has to wait 15 minutes on master, and about a minute with my patch
[2]; the patch has not changed since our discussion on Wednesday, and
the error message is the same as we discussed then).

The discussed change to lazy precision marks propagation turned out to
be a dud:
- it does not solve the issue at hand, the verifier still exits with
  -ENOMEM, just faster;
- the overall verification speedup for regular programs is not big:

$ ./veristat -e file,prog,duration -f 'duration>100000' -C master-baseline.log current.log
File                      Program           Duration (us) (A)  Duration (us) (B)  Duration (us) (DIFF)
------------------------  ----------------  -----------------  -----------------  --------------------
loop1.bpf.o               nested_loops                 124334             117382        -6952 (-5.59%)
loop3.bpf.o               while_true                   318165             253264      -64901 (-20.40%)
pyperf100.bpf.o           on_event                     704852             692300       -12552 (-1.78%)
pyperf180.bpf.o           on_event                    2304115            2251502       -52613 (-2.28%)
pyperf50.bpf.o            on_event                     183971             184124         +153 (+0.08%)
pyperf600.bpf.o           on_event                    2076073            2051733       -24340 (-1.17%)
pyperf600_nounroll.bpf.o  on_event                     394093             402552        +8459 (+2.15%)
strobemeta.bpf.o          on_event                     169218             169835         +617 (+0.36%)
test_verif_scale1.bpf.o   balancer_ingress             144779             146486        +1707 (+1.18%)
test_verif_scale2.bpf.o   balancer_ingress             147078             151902        +4824 (+3.28%)
test_verif_scale3.bpf.o   balancer_ingress             184959             188723        +3764 (+2.04%)

Here only loop3 demonstrates a significant speedup (I re-ran loop3
several times), but in my opinion that does not warrant the code
change.

[1] https://lore.kernel.org/bpf/20241018020307.1766906-1-eddyz87@xxxxxxxxx/
[2] https://github.com/eddyz87/bpf/tree/lazy-mark-chain-precision