On Mon, 20 Jul 2020 at 21:37, Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > On Fri, Jul 17, 2020 at 11:46:37AM +0100, Lorenz Bauer wrote: > > Hi list, > > > > I'm not sure whether this is a bug report or just the way of life. > > The problem: we have a couple of machines that run KASAN > > kernels to weed out bugs. On those machines, loading our > > cls-redirect TC classifier takes so long that our user space > > program aborts. > > > > I've reproduced this in a VM: loading cls-redirect on a VM > > with a 5.4 kernel without KASAN takes around 4 seconds. > > Doing the same on recent bpf-next with KASAN and other > > shenanigans enabled it takes more like a minute. > > a minute to load single program? that sounds high. > While processing patches I build the kernel with kasan and lockdep. > None of test_progs and test_verifier programs have such drastic > slowdowns. I'm not sure what is the reason. Sorry for the long delay, I was on holiday. > do you use kasan inline or outline ? CONFIG_KASAN_SHADOW_OFFSET=0xdffffc0000000000 CONFIG_HAVE_ARCH_KASAN=y CONFIG_HAVE_ARCH_KASAN_VMALLOC=y CONFIG_CC_HAS_KASAN_GENERIC=y CONFIG_KASAN=y CONFIG_KASAN_GENERIC=y # CONFIG_KASAN_OUTLINE is not set CONFIG_KASAN_INLINE=y CONFIG_KASAN_STACK=1 # CONFIG_KASAN_VMALLOC is not set # CONFIG_TEST_KASAN is not set Looks like the inline version. > > > Is it expected that the overhead of KASAN is this large? > > sounds like 20x overhead ? I think 10x is normal. > > > I went and collected a perf profile of loading the program > > in the VM: > > > > - 96.31% 1.00% redirect.test [kernel.kallsyms] [k] do_check_common > > - 95.32% do_check_common > > - 69.24% states_equal.isra.0 > > + 49.81% kmem_cache_alloc_trace > > + 16.77% kfree > > + 1.22% regsafe.part.0 > > - 12.75% push_stack > > - 10.65% copy_verifier_state > > - 4.50% realloc_stack_state > > + 4.48% __kmalloc > > + 4.16% kmem_cache_alloc_trace > > + 1.82% __kmalloc > > + 2.07% kmem_cache_alloc_trace > > + 5.25% pop_stack > > + 2.84% push_jmp_history.isra.0 > > + 2.46% copy_verifier_state > > + 1.00% free_verifier_state > > 0.53% kmem_cache_alloc_trace > > + 1.00% runtime.goexit > > the perf profile makes sense. > how many insn it processed? On a fresh bpf-next build in the VM: TestLoadProgram: redirect.go:76: cls_redirect load time 1m11.886066762s TestLoadProgram: redirect.go:77: 25735 insns (limit 1000000) max_states_per_insn 28 total_states 26728 peak_states 1362 mark_read 26 On my ubuntu 5.4.0-40-generic #44-Ubuntu (not VM!): TestLoadProgram: redirect.go:76: cls_redirect load time 3.025380047s TestLoadProgram: redirect.go:77: 25735 insns (limit 1000000) max_states_per_insn 28 total_states 26728 peak_states 1362 mark_read 26 > what are test_progs -s stats ? #7/1 loop3.o:OK verification time 2215681 usec stack depth 8+0 processed 554754 insns (limit 1000000) max_states_per_insn 18 total_states 8636 peak_states 2141 mark_read 3 #7/2 test_verif_scale1.o:OK verification time 4037383 usec stack depth 8 processed 773445 insns (limit 1000000) max_states_per_insn 13 total_states 3048 peak_states 788 mark_read 1 #7/3 test_verif_scale2.o:OK verification time 2132131 usec stack depth 8+0 processed 845499 insns (limit 1000000) max_states_per_insn 18 total_states 8636 peak_states 2141 mark_read 3 #7/4 test_verif_scale3.o:OK verification time 95848 usec stack depth 0+352 processed 6102 insns (limit 1000000) max_states_per_insn 1 total_states 422 peak_states 422 mark_read 111 #7/5 pyperf_global.o:OK verification time 1308885 usec stack depth 352 processed 46378 insns (limit 1000000) max_states_per_insn 5 total_states 3263 peak_states 3241 mark_read 113 #7/6 pyperf50.o:OK verification time 5858365 usec stack depth 352 processed 99548 insns (limit 1000000) max_states_per_insn 5 total_states 6909 peak_states 6883 mark_read 214 #7/7 pyperf100.o:OK verification time 20829297 usec stack depth 344 processed 163461 insns (limit 1000000) max_states_per_insn 5 total_states 11566 peak_states 11539 mark_read 375 #7/8 pyperf180.o:OK verification time 20471329 usec stack depth 368 processed 628090 insns (limit 1000000) max_states_per_insn 7 total_states 30369 peak_states 30283 mark_read 751 #7/9 pyperf600.o:OK verification time 7413522 usec stack depth 320 processed 580752 insns (limit 1000000) max_states_per_insn 13 total_states 37099 peak_states 2118 mark_read 1292 #7/10 pyperf600_nounroll.o:OK verification time 1046239 usec stack depth 0 processed 361349 insns (limit 1000000) max_states_per_insn 4 total_states 5504 peak_states 5504 mark_read 31 #7/11 loop1.o:OK verification time 32021 usec stack depth 0 processed 1783 insns (limit 1000000) max_states_per_insn 8 total_states 57 peak_states 30 mark_read 2 #7/12 loop2.o:OK verification time 8095 usec stack depth 0 processed 524 insns (limit 1000000) max_states_per_insn 12 total_states 18 peak_states 17 mark_read 2 #7/13 loop4.o:OK verification time 1116 usec stack depth 0 processed 84 insns (limit 1000000) max_states_per_insn 2 total_states 9 peak_states 9 mark_read 2 #7/14 loop5.o:OK verification time 8415325 usec stack depth 496 processed 728100 insns (limit 1000000) max_states_per_insn 19 total_states 15783 peak_states 12404 mark_read 2674 #7/15 strobemeta.o:OK verification time 1105531 usec stack depth 488 processed 73803 insns (limit 1000000) max_states_per_insn 22 total_states 1997 peak_states 395 mark_read 137 #7/16 strobemeta_nounroll1.o:OK verification time 13689512 usec stack depth 488 processed 591026 insns (limit 1000000) max_states_per_insn 67 total_states 18823 peak_states 1901 mark_read 262 #7/17 strobemeta_nounroll2.o:OK verification time 13242 usec stack depth 488 processed 1464 insns (limit 1000000) max_states_per_insn 4 total_states 27 peak_states 27 mark_read 22 #7/18 test_sysctl_loop1.o:OK verification time 13769 usec stack depth 300+144 processed 1564 insns (limit 1000000) max_states_per_insn 4 total_states 28 peak_states 28 mark_read 22 #7/19 test_sysctl_loop2.o:OK verification time 6436 usec stack depth 36 processed 416 insns (limit 1000000) max_states_per_insn 1 total_states 19 peak_states 19 mark_read 6 #7/20 test_xdp_loop.o:OK verification time 601603 usec stack depth 88 processed 33670 insns (limit 1000000) max_states_per_insn 14 total_states 2214 peak_states 89 mark_read 7 #7/21 test_seg6_loop.o:OK #7 bpf_verif_scale:OK -- Lorenz Bauer | Systems Engineer 6th Floor, County Hall/The Riverside Building, SE1 7PB, UK www.cloudflare.com