Re: [RFC] security: replace indirect calls with static calls

Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> · Fri, 5 Feb 2021 10:09:26 -0500

On 20-Aug-2020 06:47:53 PM, Brendan Jackman wrote:
> From: Paul Renauld <renauld@xxxxxxxxxx>
> 
> LSMs have high overhead due to indirect function calls through
> retpolines. This RPC proposes to replace these with static calls [1]
> instead.
> 
> This overhead is especially significant for the "bpf" LSM which supports
> the implementation of LSM hooks with eBPF programs (security/bpf)[2]. In
> order to facilitate this, the "bpf" LSM provides a default nop callback for
> all LSM hooks. When enabled, the "bpf", LSM incurs an unnecessary /
> avoidable indirect call to this nop callback.
> 
> The performance impact on a simple syscall eventfd_write (which triggers
> the file_permission hook) was measured with and without "bpf" LSM
> enabled. Activating the LSM resulted in an overhead of 4% [3].
> 
> This overhead prevents the adoption of bpf LSM on performance critical
> systems, and also, in general, slows down all LSMs.
> 
> Currently, the LSM hook callbacks are stored in a linked list and
> dispatched as indirect calls. Using static calls can remove this overhead
> by replacing all indirect calls with direct calls.
> 
> During the discussion of the "bpf" LSM patch-set it was proposed to special
> case BPF LSM to avoid the overhead by using static keys. This was however
> not accepted and it was decided to [4]:
> 
> - Not special-case the "bpf" LSM.
> - Implement a general solution benefitting the whole LSM framework.
> 
> This is based on the static call branch [5].

Hi!

So I reviewed this quickly, and hopefully my understanding is correct.
AFAIU, your approach is limited to scenarios where the callbacks are
known at compile-time. It also appears to add the overhead of a
switch/case for every function call on the fast-path.

I am the original author of the tracepoint infrastructure in the Linux
kernel, which also needs to iterate on an array of callbacks. Recently,
Steven Rostedt pushed a change which accelerates the single-callback
case using static calls to reduce retpoline mitigation overhead, but I
would prefer if we could accelerate the multiple-callback case as well.
Note that for tracepoints, the callbacks are not known at compile-time.

This is where I think we could come up with a generic solution that
would fit both LSM and tracepoint use-cases.

Here is what I have in mind. Let's say we generate code to accelerate up
to N calls, and after that we have a fallback using indirect calls.

Then we should be able to generate the following using static keys as a
jump table and N static calls:

  jump <static key label target>
label_N:
  stack setup
  call
label_N-1:
  stack setup
  call
label_N-2:
  stack setup
  call
  ...
label_0:
  jump end
label_fallback:
  <iteration and indirect calls>
end:

So the static keys would be used to jump to the appropriate label (using
a static branch, which has pretty much 0 overhead). Static calls would
be used to implement each of the calls.

Thoughts ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com