Re: [PATCH bpf-next v2 0/7] exact states comparison for iterator convergence checks

Eduard Zingerman <eddyz87@xxxxxxxxx> · Tue, 24 Oct 2023 00:40:48 +0300

On Mon, 2023-10-23 at 20:17 +0300, Eduard Zingerman wrote:
> On Sun, 2023-10-22 at 04:08 +0300, Eduard Zingerman wrote:
> [...]
> > Changelog:
> > V1 -> V2 [2], applied changes suggested by Alexei offlist:
> > - __explored_state() function removed;
> > - same_callsites() function is now used in clean_live_states();
> > - patches #1,2 are added as preparatory code movement;
> > - in process_iter_next_call() a safeguard is added to verify that
> >   cur_st->parent exists and has expected insn index / call sites.
> 
> I have V3 ready and passing CI.
> 
> However I checked on Alexei's concerns regarding performance on
> explored states cache miss and verifier does not behave well with this
> patch-set. For example, the program at the end of the email causes
> verifier to "hang" (loop inside is_state_visited() to no end).
> 
> There are several options to fix this:
> (a) limit total iteration depth, as in [1], the limit would have to be
>     at-least 1000 to make iters/task_vma pass;
> (b) limit maximal number of checkpoint states associated with
>     instruction and evict those with lowest dfs_depth;
> (c) choose bigger constants in `sl->miss_cnt > sl->hit_cnt * 3 + 3` for
>     checkpoint states.

I played a bit with constants in 'eviction on miss' formula using [1] (option c).
There are three relevant tests:
- iters/max_iter_depth: should report load failure within a reasonable time;
- iters/checkpoint_states_deletion: should pass;
- verif_scale_pyperf600_iter: should pass.

I think iters/checkpoint_states_deletion represents the worst case scenario,
because depending on number of variables N, it produces 2**N distinct states.
The formula for eviction that does not loose relevant states is:

    sl->miss_cnt > sl->hit_cnt * 2**N + 2**N

(because states start to repeat after 2**N iterations).

W/o eviction for checkpoint states maximal number of variables
verifier could handle in this test is 9, with reported 958,883 insns processed.
Which corresponds to formula (sl->miss_cnt > sl->hit_cnt * 512 + 512).

Using these values I get the following execution times:

| test                       | time ./test_progs -a <test> |
|----------------------------+-----------------------------|
| verif_scale_pyperf600_iter |                        0.2s |
| checkpoint_states_deletion |                        5.8s |
| max_iter_depth             |                       23.9s |

Going one step lower to 8 variables (and 256 as a constant),
checkpoint_states_deletion takes 248,133 insns to complete
and timings table looks as follows:

| test                       | time ./test_progs -a <test> |
|----------------------------+-----------------------------|
| verif_scale_pyperf600_iter |                        0.2s |
| checkpoint_states_deletion |                        1.0s |
| max_iter_depth             |                       15.2s |

So, it is possible to get speedup for worst case scenario by leaving
some instruction budget on the table.

IMO using formula (sl->miss_cnt > sl->hit_cnt * 512 + 512) to evict
checkpoints kind-off sort-off makes sense but is very hacky.
(Or 256).

I think that better solution would be to go for option (b) from a
previous email:
- evict old checkpoints basing on dfs_depth;
- also use a secondary hash-table for checkpoints and hash not only
  insn_idx but also some fingerprint of register states, thus avoiding
  long state list walks.

But that would require some additional coding and I know that Alexei
wants to land this thing sooner than later.

[1] https://github.com/eddyz87/bpf/tree/iter-exact-eviction-formula-experiments