On 20-Aug-2020 06:47:53 PM, Brendan Jackman wrote:
> From: Paul Renauld <renauld@xxxxxxxxxx>
>
> LSMs have high overhead due to indirect function calls through
> retpolines. This RFC proposes to replace these with static calls [1]
> instead.
>
> This overhead is especially significant for the "bpf" LSM which supports
> the implementation of LSM hooks with eBPF programs (security/bpf)[2]. In
> order to facilitate this, the "bpf" LSM provides a default nop callback for
> all LSM hooks. When enabled, the "bpf" LSM incurs an unnecessary /
> avoidable indirect call to this nop callback.
>
> The performance impact on a simple syscall eventfd_write (which triggers
> the file_permission hook) was measured with and without "bpf" LSM
> enabled. Activating the LSM resulted in an overhead of 4% [3].
>
> This overhead prevents the adoption of bpf LSM on performance critical
> systems, and also, in general, slows down all LSMs.
>
> Currently, the LSM hook callbacks are stored in a linked list and
> dispatched as indirect calls. Using static calls can remove this overhead
> by replacing all indirect calls with direct calls.
>
> During the discussion of the "bpf" LSM patch-set it was proposed to special
> case BPF LSM to avoid the overhead by using static keys. This was however
> not accepted and it was decided to [4]:
>
> - Not special-case the "bpf" LSM.
> - Implement a general solution benefitting the whole LSM framework.
>
> This is based on the static call branch [5].

Hi!

So I reviewed this quickly, and hopefully my understanding is correct.
AFAIU, your approach is limited to scenarios where the callbacks are
known at compile-time. It also appears to add the overhead of a
switch/case for every function call on the fast-path.

I am the original author of the tracepoint infrastructure in the Linux
kernel, which also needs to iterate on an array of callbacks.
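For reference, here is a simplified user-space C model of the baseline both LSM hooks and tracepoints share today: callbacks kept in a NULL-terminated array (roughly how tracepoints store them; LSM uses a linked list instead) and invoked one by one through function pointers. Each such indirect call becomes a retpoline when Spectre v2 mitigations are enabled, which is the overhead under discussion. The names hook_fn, hooks and call_hooks are hypothetical, and the early return on a non-zero result only loosely mirrors how int-returning LSM hooks stop at the first error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook type, loosely modelled on an int-returning LSM hook. */
typedef int (*hook_fn)(int arg);

/* Records which callbacks ran, so the dispatch order can be checked. */
static char trace[8];
static size_t ntrace;

static void record(char c)
{
	assert(ntrace < sizeof(trace)); /* the demo never exceeds the buffer */
	trace[ntrace++] = c;
}

static int hook_a(int arg) { (void)arg; record('a'); return 0; }
static int hook_b(int arg) { record('b'); return arg < 0 ? -1 : 0; }
static int hook_c(int arg) { (void)arg; record('c'); return 0; }

/* NULL-terminated callback array; the kernel fills its equivalent at
 * boot (LSM) or at run time (tracepoints). */
static hook_fn hooks[] = { hook_a, hook_b, hook_c, NULL };

/* Baseline dispatch: every callback is reached through a function
 * pointer, i.e. an indirect call (a retpoline under Spectre v2
 * mitigations). */
static int call_hooks(int arg)
{
	ntrace = 0;
	for (hook_fn *it = hooks; *it != NULL; it++) {
		int ret = (*it)(arg);
		if (ret != 0)
			return ret; /* stop at the first non-zero result */
	}
	return 0;
}
```

With a non-negative argument all three hooks run in order; with a negative one, hook_b fails and hook_c is never reached.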
Recently, Steven Rostedt pushed a change which accelerates the
single-callback case using static calls to reduce retpoline mitigation
overhead, but I would prefer if we could accelerate the multiple-callback
case as well. Note that for tracepoints, the callbacks are not known at
compile-time.

This is where I think we could come up with a generic solution that
would fit both LSM and tracepoint use-cases.

Here is what I have in mind. Let's say we generate code to accelerate
up to N calls, and after that we have a fallback using indirect calls.

Then we should be able to generate the following using static keys as
a jump table and N static calls:

      jump <static key label target>
    label_N:
      stack setup
      call
    label_N-1:
      stack setup
      call
    label_N-2:
      stack setup
      call
    ...
    label_0:
      jump end
    label_fallback:
      <iteration and indirect calls>
    end:

So the static keys would be used to jump to the appropriate label
(using a static branch, which has pretty much 0 overhead). Static calls
would be used to implement each of the calls.

Thoughts?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
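As a rough illustration of the jump-table idea above, here is a user-space C model of its control flow, under stated assumptions. It is not kernel code: the names NR_FAST, slot, entry, update_dispatch, register_cb and dispatch are hypothetical; each static call site is modelled as a plain function pointer (so the calls remain indirect here); and the static-key jump is modelled with a fall-through switch, which a real implementation would replace with a patched branch, since avoiding a run-time switch is the whole point. What it does show is that entering at label_k runs exactly k calls in order, and that more than N callbacks fall back to iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes: NR_FAST plays the role of N in the proposal. */
#define NR_FAST 3
#define MAX_CB 8

typedef void (*cb_fn)(int arg);

/* Registered callbacks, in the order they must run. */
static cb_fn callbacks[MAX_CB];
static int nr_callbacks;

/* slot[j - 1] stands in for the static call at label_j. In the kernel it
 * would be patched into a direct call; here it is a plain pointer. */
static cb_fn slot[NR_FAST];

/* Stands in for the static-key jump target: k means "enter at label_k",
 * -1 means "enter at label_fallback". */
static int entry;

/* Re-point every call slot and the entry label after a registration. */
static void update_dispatch(void)
{
	if (nr_callbacks > NR_FAST) {
		entry = -1;
		return;
	}
	/* label_j calls callbacks[k - j], so entering at label_k runs
	 * callbacks[0] .. callbacks[k - 1] in registration order. */
	for (int j = 1; j <= nr_callbacks; j++)
		slot[j - 1] = callbacks[nr_callbacks - j];
	entry = nr_callbacks;
}

static void register_cb(cb_fn fn)
{
	assert(nr_callbacks < MAX_CB);
	callbacks[nr_callbacks++] = fn;
	update_dispatch();
}

/* The generated call site. The switch only models the jump; NR_FAST is 3
 * here, so there are three fall-through call labels. */
static void dispatch(int arg)
{
	switch (entry) {
	case 3:                 /* label_3: stack setup + call */
		slot[2](arg);
		/* fall through */
	case 2:                 /* label_2 */
		slot[1](arg);
		/* fall through */
	case 1:                 /* label_1 */
		slot[0](arg);
		/* fall through */
	case 0:                 /* label_0: jump end */
		return;
	default:                /* label_fallback: iterate with indirect calls */
		for (int i = 0; i < nr_callbacks; i++)
			callbacks[i](arg);
	}
}

/* Demo callbacks that record their invocation order. */
static char trace[16];
static size_t ntrace;

static void record(char c)
{
	assert(ntrace < sizeof(trace));
	trace[ntrace++] = c;
}

static void cb_a(int arg) { (void)arg; record('a'); }
static void cb_b(int arg) { (void)arg; record('b'); }
static void cb_c(int arg) { (void)arg; record('c'); }
static void cb_d(int arg) { (void)arg; record('d'); }
```

The part worth noting is update_dispatch(): because the entry label depends on the callback count, every call site's target has to be re-pointed whenever a callback is added or removed. In the kernel those updates (static_call_update() on each site plus flipping the key) would presumably also need careful ordering so concurrent callers never observe a half-updated table; this single-threaded sketch ignores that.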