On Wed, Jun 19, 2019 at 05:24:32PM -0700, John Fastabend wrote: > Alexei Starovoitov wrote: > > Compilers often spill induction variables into the stack, > > hence it is necessary for the verifier to track scalar values > > of the registers through stack slots. > > > > Also few bpf programs were incorrectly rejected in the past, > > since the verifier was not able to track such constants while > > they were used to compute offsets into packet headers. > > > > Tracking constants through the stack significantly decreases > > the chances of state pruning, since two different constants > > are considered to be different by state equivalency. > > End result that cilium tests suffer serious degradation in the number > > of states processed and corresponding verification time increase. > > > > before after > > bpf_lb-DLB_L3.o 1838 6441 > > bpf_lb-DLB_L4.o 3218 5908 > > bpf_lb-DUNKNOWN.o 1064 1064 > > bpf_lxc-DDROP_ALL.o 26935 93790 > > bpf_lxc-DUNKNOWN.o 34439 123886 > > bpf_netdev.o 9721 31413 > > bpf_overlay.o 6184 18561 > > bpf_lxc_jit.o 39389 359445 > > > > After further debugging turned out that cillium progs are > > getting hurt by clang due to the same constant tracking issue. > > Newer clang generates better code by spilling less to the stack. > > Instead it keeps more constants in the registers which > > hurts state pruning since the verifier already tracks constants > > in the registers: > > old clang new clang > > (no spill/fill tracking introduced by this patch) > > bpf_lb-DLB_L3.o 1838 1923 > > bpf_lb-DLB_L4.o 3218 3077 > > bpf_lb-DUNKNOWN.o 1064 1062 > > bpf_lxc-DDROP_ALL.o 26935 166729 > > bpf_lxc-DUNKNOWN.o 34439 174607 > ^^^^^^^^^^^^^^ > Any idea what happened here? Going from 34439 -> 174607 on the new clang? As I was alluding in commit log newer clang is smarter and generates less spill/fill of constants. In particular older clang loads two constants into r8 and r9 and immediately spills them into stack. Then fills later, does a bunch of unrelated code and calls into helper that has ARG_ANYTHING for that position. Then doing a bit more math on filled constants, spills them again and so on. Before this patch (that tracks spill/fill of constants into stack) pruning points were equivalent, but with the patch it sees the difference in registers and declares states not equivalent, though any constant is fine from safety standpoint. With new clang only r9 has this pattern of spill/fill. New clang manages to keep constant in r8 to be around without spill/fill. Existing verifier tracks constants so even without this patch the same pathalogical behavior is observed. The verifier need to walk a lot more instructions only because r8 has different constants. > > bpf_netdev.o 9721 8407 > > bpf_overlay.o 6184 5420 > > bpf_lcx_jit.o 39389 39389 > > > > The final table is depressing: > > old clang old clang new clang new clang > > const spill/fill const spill/fill > > bpf_lb-DLB_L3.o 1838 6441 1923 8128 > > bpf_lb-DLB_L4.o 3218 5908 3077 6707 > > bpf_lb-DUNKNOWN.o 1064 1064 1062 1062 > > bpf_lxc-DDROP_ALL.o 26935 93790 166729 380712 > > bpf_lxc-DUNKNOWN.o 34439 123886 174607 440652 > > bpf_netdev.o 9721 31413 8407 31904 > > bpf_overlay.o 6184 18561 5420 23569 > > bpf_lxc_jit.o 39389 359445 39389 359445 > > > > Tracking constants in the registers hurts state pruning already. > > Adding tracking of constants through stack hurts pruning even more. > > The later patch address this general constant tracking issue > > with coarse/precise logic. > > > > Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx> > > Acked-by: Andrii Nakryiko <andriin@xxxxxx> > > --- > > kernel/bpf/verifier.c | 90 +++++++++++++++++++++++++++++++------------ > > 1 file changed, 65 insertions(+), 25 deletions(-) > > I know these are already in bpf-next sorry it took me awhile to get > time to review, but looks good to me. Thanks! We had something similar > in the earlier loop test branch from last year. It's not in bpf-next yet :) Code reviews are appreciated at any time. Looks like we were just lucky with older clang. I haven't tracked which clang version became smarter. If you haven't seen this issue and haven't changed cilium C source to workaround that then there is chance you'll hit it as well. By "new clang" I meant version 9.0 "old clang" is unknown. I just had cilium elf .o around that I kept using for testing without recompiling them. Just by chance I recompiled them to see annotated verifier line info messages with BTF and hit this interesting issue. See patch 9 backtracking logic that resolves this 'precision of scalar' issue for progs compiled with both new and old clangs.