On Sun, May 31, 2020 at 10:19:15AM -0700, Alexei Starovoitov wrote:
> Thank you for crafting a benchmark.
> The only thing that it's not doing a fair comparison.
> The problem with that patch [1] that is using:
>
> static noinline u32 __seccomp_benchmark(struct bpf_prog *prog,
>                                         const struct seccomp_data *sd)
> {
>         return SECCOMP_RET_ALLOW;
> }
>
> as a benchmarking function.
> The 'noinline' keyword tells the compiler to keep the body of the function, but
> the compiler is still doing full control and data flow analysis though this
> function and it is smart enough to optimize its usage in seccomp_run_filters()
> and in __seccomp_filter() because all functions are in a single .c file.
> Lots of code gets optimized away when 'f->benchmark' is on.
>
> To make it into fair comparison I've added the following patch
> on top of your [1].
>
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 2fdbf5ad8372..86204422e096 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -244,7 +244,7 @@ static int seccomp_check_filter(struct sock_filter *filter, unsigned int flen)
>         return 0;
>  }
>
> -static noinline u32 __seccomp_benchmark(struct bpf_prog *prog,
> +__weak noinline u32 __seccomp_benchmark(struct bpf_prog *prog,
>                                         const struct seccomp_data *sd)
>
> Please take a look at 'make kernel/seccomp.s' before and after to see the difference
> __weak keyword makes.

Ah yeah, thanks. That does bring it up to the same overhead. Nice!

> And here is what seccomp_benchmark now reports:
>
> Benchmarking 33554432 samples...
> 22.618269641 - 15.030812794 = 7587456847
> getpid native: 226 ns
> 30.792042986 - 22.619048831 = 8172994155
> getpid RET_ALLOW 1 filter: 243 ns
> 39.451435038 - 30.792836778 = 8658598260
> getpid RET_ALLOW 2 filters: 258 ns
> 47.616011529 - 39.452190830 = 8163820699
> getpid BPF-less allow: 243 ns
> Estimated total seccomp overhead for 1 filter: 17 ns
> Estimated total seccomp overhead for 2 filters: 32 ns
> Estimated seccomp per-filter overhead: 15 ns
> Estimated seccomp entry overhead: 2 ns
> Estimated BPF overhead per filter: 0 ns
>
> [...]
>
> > So, with the layered nature of seccomp filters there's a reasonable gain
> > to be seen for a O(1) bitmap lookup to skip running even a single filter,
> > even for the fastest BPF mode.
>
> This is not true.
> The O(1) bitmap implemented as kernel C code will have exactly the same speed
> as O(1) bitmap implemented as eBPF program.

Yes, that'd be true if it was the first (and only) filter. What I'm
trying to provide is a mechanism to speed up the syscalls for all
attached filters (i.e. create a seccomp fast-path). The reality of
seccomp usage is that it's very layered: systemd sets some (or many!),
then container runtime sets some, then the process itself might set
some.

-- 
Kees Cook
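
As a rough illustration of the layering point above, here is a minimal
userspace sketch of a per-layer allow bitmap that lets a syscall skip
every attached filter. It is not kernel code and not the proposed
patch: every sketch_* name, the bitmap layout, and the two return
constants are assumptions made up for this example, and the kernel's
real ordering of return actions is more involved than the plain
"lowest value wins" comparison used here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants for this sketch only. */
#define SKETCH_NR_SYSCALLS 512u
#define SKETCH_RET_ALLOW   0x7fff0000u
#define SKETCH_RET_KILL    0x00000000u

/* Stand-in for running one attached filter program on a syscall number. */
typedef uint32_t (*sketch_filter_fn)(uint32_t nr);

/* One attached filter layer (e.g. set by systemd, a runtime, the process). */
struct sketch_filter {
	uint64_t allow_bitmap[SKETCH_NR_SYSCALLS / 64]; /* bit set: always allowed */
	sketch_filter_fn run;                           /* slow path */
	unsigned long runs;                             /* slow-path executions */
};

/* Record that this layer is known to always allow syscall 'nr'. */
static void sketch_mark_allowed(struct sketch_filter *f, uint32_t nr)
{
	assert(nr < SKETCH_NR_SYSCALLS);
	f->allow_bitmap[nr / 64] |= UINT64_C(1) << (nr % 64);
}

/* O(1) lookup: does this layer unconditionally allow 'nr'? */
static bool sketch_always_allowed(const struct sketch_filter *f, uint32_t nr)
{
	return nr < SKETCH_NR_SYSCALLS &&
	       ((f->allow_bitmap[nr / 64] >> (nr % 64)) & 1u);
}

/*
 * Walk all layers. A layer whose bitmap covers 'nr' is skipped without
 * running its filter; otherwise the filter runs and the most restrictive
 * (numerically lowest, in this sketch) result is kept.
 */
static uint32_t sketch_run_filters(struct sketch_filter *layers, size_t n,
				   uint32_t nr)
{
	uint32_t ret = SKETCH_RET_ALLOW;

	for (size_t i = 0; i < n; i++) {
		uint32_t cur;

		if (sketch_always_allowed(&layers[i], nr))
			continue; /* fast path: no filter run for this layer */

		layers[i].runs++;
		cur = layers[i].run(nr);
		if (cur < ret)
			ret = cur;
	}
	return ret;
}

/* Example filters: one allows everything, one kills on syscall 59. */
static uint32_t sketch_allow_all(uint32_t nr)
{
	(void)nr;
	return SKETCH_RET_ALLOW;
}

static uint32_t sketch_deny_59(uint32_t nr)
{
	return nr == 59 ? SKETCH_RET_KILL : SKETCH_RET_ALLOW;
}
```

With two layers that both mark syscall 39 in their bitmaps, a call to
sketch_run_filters(layers, 2, 39) returns SKETCH_RET_ALLOW without
running either filter, while syscall 59 still goes through both.
The saving scales with the number of stacked layers, which is the case
the fast-path argument is about.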