On Fri, Mar 29, 2019 at 08:12:39PM -0700, Jakub Kicinski wrote: > On Fri, 29 Mar 2019 17:16:07 -0700, Alexei Starovoitov wrote: > > Branch instructions, branch targets and calls in a bpf program are > > the places where the verifier remembers states that led to successful > > verification of the program. > > These states are used to prune brute force program analysis. > > For unprivileged programs there is a limit of 64 states per such > > 'branching' instructions (maximum length is tracked by max_states_per_insn > > counter introduced in the previous patch). > > Simply reducing this threshold to 32 or lower increases insn_processed > > metric to the point that small valid programs get rejected. > > For root programs there is no limit and cilium programs can have > > max_states_per_insn to be 100 or higher. > > Walking 100+ states multiplied by number of 'branching' insns during > > verification consumes significant amount of cpu time. > > Turned out simple LRU-like mechanism can be used to remove states > > that unlikely will be helpful in future search pruning. > > This patch introduces hit_cnt and miss_cnt counters: > > hit_cnt - this many times this state successfully pruned the search > > miss_cnt - this many times this state was not equivalent to other states > > (and that other states were added to state list) > > > > The heuristic introduced in this patch is: > > if (sl->miss_cnt > sl->hit_cnt * 3 + 3) > > /* drop this state from future considerations */ > > > > Higher numbers increase max_states_per_insn (allow more states to be > > considered for pruning) and slow verification speed, but do not meaningfully > > reduce insn_processed metric. > > Lower numbers drop too many states and insn_processed increases too much. > > Many different formulas were considered. > > This one is simple and works well enough in practice. > > (the analysis was done on selftests/progs/* and on cilium programs) > > > > The end result is this heuristic improves verification speed by 10 times. > > Large synthetic programs that used to take a second more now take > > 1/10 of a second. > > In cases where max_states_per_insn used to be 100 or more, now it's ~10. > > > > There is a slight increase in insn_processed for cilium progs: > > before after > > bpf_lb-DLB_L3.o 1831 1838 > > bpf_lb-DLB_L4.o 3029 3218 > > bpf_lb-DUNKNOWN.o 1064 1064 > > bpf_lxc-DDROP_ALL.o 26309 26935 > > bpf_lxc-DUNKNOWN.o 33517 34439 > > bpf_netdev.o 9713 9721 > > bpf_overlay.o 6184 6184 > > bpf_lcx_jit.o 37335 39389 > > And 2-3 times improvement in the verification speed. > > > > Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx> > > Reviewed-by: Jakub Kicinski <jakub.kicinski@xxxxxxxxxxxxx> > > > @@ -6182,8 +6185,35 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx) > > return err; > > return 1; > > } > > - sl = sl->next; > > states_cnt++; > > + sl->miss_cnt++; > > + /* heuristic to determine whether this state is beneficial > > + * to keep checking from state equivalence point of view. > > + * Higher numbers increase max_states_per_insn and verification time, > > + * but do not meaningfully decrease insn_processed. > > + */ > > + if (sl->miss_cnt > sl->hit_cnt * 3 + 3) { > > + /* the state is unlikely to be useful. Remove it to > > + * speed up verification > > + */ > > + *pprev = sl->next; > > + if (sl->state.frame[0]->regs[0].live & REG_LIVE_DONE) { > > + free_verifier_state(&sl->state, false); > > + kfree(sl); > > + env->peak_states--; > > nit: is peak_states always equal to number of states when verifier > exits? yes. I was thinking as a follow up to add peak_states-- in bpf_verifier_env free-ing logic to check that there are no memory leaks. Few other follow ups on my todo list: . write a ton more tests for scalability . remove few leftover global variables and drop mutex for root . account all verifier memory in memcg . may be introduce a limit for peak_states for unpriv . continue exploring state merging ideas . introduce explicit rcu_unlock + preempt_enable in the programs