Re: [PATCH v5 0/5] Reduce overhead of LSMs with static calls

Paolo Abeni <pabeni@xxxxxxxxxx> · Mon, 02 Oct 2023 13:06:15 +0200

On Thu, 2023-09-28 at 22:24 +0200, KP Singh wrote:
> # Background
> 
> LSM hooks (callbacks) are currently invoked as indirect function calls. These
> callbacks are registered into a linked list at boot time because the order of
> the LSMs can be configured on the kernel command line with the "lsm="
> parameter.
> 
> Indirect function calls have a high overhead due to retpoline mitigation for
> various speculative execution attacks.
> 
> Retpolines remain relevant even on newer-generation CPUs: recently discovered
> speculative attacks such as Spectre BHB need retpolines to mitigate branch
> history injection, and retpolines still have to be used in combination with
> newer mitigation features like eIBRS.
> 
> This overhead is especially significant for the "bpf" LSM, which allows the
> user to implement LSM functionality with eBPF programs. To facilitate this,
> the "bpf" LSM provides a default callback for all LSM hooks. When enabled,
> the "bpf" LSM incurs an unnecessary / avoidable indirect call. This is
> especially bad in OS hot paths (e.g. in the networking stack).
> This overhead prevents the adoption of bpf LSM on performance critical
> systems, and also, in general, slows down all LSMs.
> 
> Since we know the address of the enabled LSM callbacks at compile time and only
> the order is determined at boot time, the LSM framework can allocate static
> calls for each of the possible LSM callbacks and these calls can be updated once
> the order is determined at boot.
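
To make the idea in the paragraph above concrete, here is a minimal userspace
sketch of a fixed slot table that is filled once the order is known. It is
not code from the series: the hook signature, the two callbacks, and the
names known_lsms, setup_slots and call_file_open are all hypothetical. It
also still calls through function pointers, so it only models the "slots
known at compile time, order applied at boot" layout. The actual series
relies on the kernel's static call machinery to turn each slot into a direct
call, which a userspace program cannot reproduce.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hook signature: 0 allows the operation, non-zero denies it. */
typedef int (*file_open_hook)(int fd);

/* Hypothetical callbacks standing in for two LSMs built into the kernel. */
static int lsm_a_file_open(int fd) { (void)fd; return 0; }
static int lsm_b_file_open(int fd) { return fd < 0 ? -1 : 0; }

struct lsm_desc {
	const char *name;
	file_open_hook file_open;
};

/* Every possible callback is known at compile time. */
static const struct lsm_desc known_lsms[] = {
	{ "a", lsm_a_file_open },
	{ "b", lsm_b_file_open },
};

#define MAX_SLOTS (sizeof(known_lsms) / sizeof(known_lsms[0]))

/* One slot per possible LSM, allocated statically. */
static file_open_hook slots[MAX_SLOTS];
static size_t nr_slots;

/* "Boot time": fill the slots in the order given by an lsm=-style list. */
static void setup_slots(const char *order)
{
	nr_slots = 0;
	while (*order) {
		size_t len = strcspn(order, ",");

		for (size_t i = 0; i < MAX_SLOTS; i++) {
			if (strlen(known_lsms[i].name) == len &&
			    strncmp(known_lsms[i].name, order, len) == 0) {
				assert(nr_slots < MAX_SLOTS);
				slots[nr_slots++] = known_lsms[i].file_open;
			}
		}
		order += len;
		if (*order == ',')
			order++;
	}
}

/* Hook invocation: walk the filled slots, stop at the first denial. */
static int call_file_open(int fd)
{
	for (size_t i = 0; i < nr_slots; i++) {
		int rc = slots[i](fd);

		if (rc)
			return rc;
	}
	return 0;
}
```

Calling setup_slots("b,a") once and then call_file_open(fd) on each hook
invocation mirrors the split described above: the set of callbacks is fixed
at build time, and only the order they are placed in is decided later.
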
> 
> This series is a respin of the RFC proposed by Paul Renauld (renauld@xxxxxxxxxx)
> and Brendan Jackman (jackmanb@xxxxxxxxxx) [1]
> 
> # Performance improvement
> 
> With this patch-set, some syscalls with lots of LSM hooks in their path
> improved by ~3% on average, with I/O and pipe-based system calls benefiting
> the most.
> 
> Here are the results of the relevant Unixbench system benchmarks, run with and
> without these patches, with BPF LSM and SELinux enabled with their default
> policies.
> 
> Benchmark                                               Delta(%): (+ is better)
> ===============================================================================
> Execl Throughput                                             +1.9356
> File Write 1024 bufsize 2000 maxblocks                       +6.5953
> Pipe Throughput                                              +9.5499
> Pipe-based Context Switching                                 +3.0209
> Process Creation                                             +2.3246
> Shell Scripts (1 concurrent)                                 +1.4975
> System Call Overhead                                         +2.7815
> System Benchmarks Index Score (Partial Only):                +3.4859

FTR, I also measured a ~3% throughput improvement in a UDP stream test over
loopback.

@KP Singh, I would have appreciated being cc-ed here, since I provided
feedback on a previous revision (as soon as I learned of this effort).

Cheers,

Paolo