Re: Verification speed w/ KASAN builds

Lorenz Bauer <lmb@xxxxxxxxxxxxxx> · Tue, 4 Aug 2020 12:13:44 +0100

On Mon, 20 Jul 2020 at 21:37, Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Fri, Jul 17, 2020 at 11:46:37AM +0100, Lorenz Bauer wrote:
> > Hi list,
> >
> > I'm not sure whether this is a bug report or just the way of life.
> > The problem: we have a couple of machines that run KASAN
> > kernels to weed out bugs. On those machines, loading our
> > cls-redirect TC classifier takes so long that our user space
> > program aborts.
> >
> > I've reproduced this in a VM: loading cls-redirect on a VM
> > with a 5.4 kernel without KASAN takes around 4 seconds.
> > Doing the same on recent bpf-next with KASAN and other
> > shenanigans enabled it takes more like a minute.
>
> a minute to load single program? that sounds high.
> While processing patches I build the kernel with kasan and lockdep.
> None of test_progs and test_verifier programs have such drastic
> slowdowns. I'm not sure what is the reason.

Sorry for the long delay, I was on holiday.

> do you use kasan inline or outline ?

CONFIG_KASAN_SHADOW_OFFSET=0xdffffc0000000000
CONFIG_HAVE_ARCH_KASAN=y
CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
# CONFIG_KASAN_OUTLINE is not set
CONFIG_KASAN_INLINE=y
CONFIG_KASAN_STACK=1
# CONFIG_KASAN_VMALLOC is not set
# CONFIG_TEST_KASAN is not set

Looks like the inline version.

>
> > Is it expected that the overhead of KASAN is this large?
>
> sounds like 20x overhead ? I think 10x is normal.
>
> > I went and collected a perf profile of loading the program
> > in the VM:
> >
> > -   96.31%     1.00%  redirect.test  [kernel.kallsyms]  [k] do_check_common
> >    - 95.32% do_check_common
> >       - 69.24% states_equal.isra.0
> >          + 49.81% kmem_cache_alloc_trace
> >          + 16.77% kfree
> >          + 1.22% regsafe.part.0
> >       - 12.75% push_stack
> >          - 10.65% copy_verifier_state
> >             - 4.50% realloc_stack_state
> >                + 4.48% __kmalloc
> >             + 4.16% kmem_cache_alloc_trace
> >             + 1.82% __kmalloc
> >          + 2.07% kmem_cache_alloc_trace
> >       + 5.25% pop_stack
> >       + 2.84% push_jmp_history.isra.0
> >       + 2.46% copy_verifier_state
> >       + 1.00% free_verifier_state
> >         0.53% kmem_cache_alloc_trace
> >    + 1.00% runtime.goexit
>
> the perf profile makes sense.
> how many insn it processed?

On a fresh bpf-next build in the VM:

    TestLoadProgram: redirect.go:76: cls_redirect load time 1m11.886066762s
    TestLoadProgram: redirect.go:77: 25735 insns (limit 1000000)
max_states_per_insn 28 total_states 26728 peak_states 1362 mark_read
26

On my ubuntu 5.4.0-40-generic #44-Ubuntu (not VM!):

    TestLoadProgram: redirect.go:76: cls_redirect load time 3.025380047s
    TestLoadProgram: redirect.go:77: 25735 insns (limit 1000000)
max_states_per_insn 28 total_states 26728 peak_states 1362 mark_read
26

> what are test_progs -s stats ?

#7/1 loop3.o:OK
verification time 2215681 usec
stack depth 8+0
processed 554754 insns (limit 1000000) max_states_per_insn 18
total_states 8636 peak_states 2141 mark_read 3
#7/2 test_verif_scale1.o:OK
verification time 4037383 usec
stack depth 8
processed 773445 insns (limit 1000000) max_states_per_insn 13
total_states 3048 peak_states 788 mark_read 1
#7/3 test_verif_scale2.o:OK
verification time 2132131 usec
stack depth 8+0
processed 845499 insns (limit 1000000) max_states_per_insn 18
total_states 8636 peak_states 2141 mark_read 3
#7/4 test_verif_scale3.o:OK
verification time 95848 usec
stack depth 0+352
processed 6102 insns (limit 1000000) max_states_per_insn 1
total_states 422 peak_states 422 mark_read 111
#7/5 pyperf_global.o:OK
verification time 1308885 usec
stack depth 352
processed 46378 insns (limit 1000000) max_states_per_insn 5
total_states 3263 peak_states 3241 mark_read 113
#7/6 pyperf50.o:OK
verification time 5858365 usec
stack depth 352
processed 99548 insns (limit 1000000) max_states_per_insn 5
total_states 6909 peak_states 6883 mark_read 214
#7/7 pyperf100.o:OK
verification time 20829297 usec
stack depth 344
processed 163461 insns (limit 1000000) max_states_per_insn 5
total_states 11566 peak_states 11539 mark_read 375
#7/8 pyperf180.o:OK
verification time 20471329 usec
stack depth 368
processed 628090 insns (limit 1000000) max_states_per_insn 7
total_states 30369 peak_states 30283 mark_read 751
#7/9 pyperf600.o:OK
verification time 7413522 usec
stack depth 320
processed 580752 insns (limit 1000000) max_states_per_insn 13
total_states 37099 peak_states 2118 mark_read 1292
#7/10 pyperf600_nounroll.o:OK
verification time 1046239 usec
stack depth 0
processed 361349 insns (limit 1000000) max_states_per_insn 4
total_states 5504 peak_states 5504 mark_read 31
#7/11 loop1.o:OK
verification time 32021 usec
stack depth 0
processed 1783 insns (limit 1000000) max_states_per_insn 8
total_states 57 peak_states 30 mark_read 2
#7/12 loop2.o:OK
verification time 8095 usec
stack depth 0
processed 524 insns (limit 1000000) max_states_per_insn 12
total_states 18 peak_states 17 mark_read 2
#7/13 loop4.o:OK
verification time 1116 usec
stack depth 0
processed 84 insns (limit 1000000) max_states_per_insn 2 total_states
9 peak_states 9 mark_read 2
#7/14 loop5.o:OK
verification time 8415325 usec
stack depth 496
processed 728100 insns (limit 1000000) max_states_per_insn 19
total_states 15783 peak_states 12404 mark_read 2674
#7/15 strobemeta.o:OK
verification time 1105531 usec
stack depth 488
processed 73803 insns (limit 1000000) max_states_per_insn 22
total_states 1997 peak_states 395 mark_read 137
#7/16 strobemeta_nounroll1.o:OK
verification time 13689512 usec
stack depth 488
processed 591026 insns (limit 1000000) max_states_per_insn 67
total_states 18823 peak_states 1901 mark_read 262
#7/17 strobemeta_nounroll2.o:OK
verification time 13242 usec
stack depth 488
processed 1464 insns (limit 1000000) max_states_per_insn 4
total_states 27 peak_states 27 mark_read 22
#7/18 test_sysctl_loop1.o:OK
verification time 13769 usec
stack depth 300+144
processed 1564 insns (limit 1000000) max_states_per_insn 4
total_states 28 peak_states 28 mark_read 22
#7/19 test_sysctl_loop2.o:OK
verification time 6436 usec
stack depth 36
processed 416 insns (limit 1000000) max_states_per_insn 1 total_states
19 peak_states 19 mark_read 6
#7/20 test_xdp_loop.o:OK
verification time 601603 usec
stack depth 88
processed 33670 insns (limit 1000000) max_states_per_insn 14
total_states 2214 peak_states 89 mark_read 7
#7/21 test_seg6_loop.o:OK
#7 bpf_verif_scale:OK
--
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com