[PATCH bpf-next] bpf: mark kprobe_multi_link_prog_run as always inlined function

kprobe_multi_link_prog_run() is called for both multi-kprobe and
multi-kretprobe BPF programs, from kprobe_multi_link_handler() and
kprobe_multi_link_exit_handler(), respectively.
kprobe_multi_link_prog_run() does all the relevant work, with those
wrappers just satisfying ftrace's interfaces (the kprobe callback is
expected to return an int, while the kretprobe one returns void).
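
For reference, a rough sketch of that structure (argument lists
approximated from kernel/trace/bpf_trace.c; exact fprobe callback
signatures differ across kernel versions):

  /* sketch only; see kernel/trace/bpf_trace.c for the real handlers */
  static int
  kprobe_multi_link_handler(struct fprobe *fp, unsigned long fentry_ip,
                            unsigned long ret_ip, struct pt_regs *regs,
                            void *data)
  {
          struct bpf_kprobe_multi_link *link;

          link = container_of(fp, struct bpf_kprobe_multi_link, fp);
          kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), regs);
          return 0;       /* kprobe callback has to return int */
  }

  static void
  kprobe_multi_link_exit_handler(struct fprobe *fp, unsigned long fentry_ip,
                                 unsigned long ret_ip, struct pt_regs *regs,
                                 void *data)
  {
          struct bpf_kprobe_multi_link *link;

          link = container_of(fp, struct bpf_kprobe_multi_link, fp);
          kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), regs);
  }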

With this structure, the compiler performs tail-call optimization:

Dump of assembler code for function kprobe_multi_link_exit_handler:
   0xffffffff8122f1e0 <+0>:     add    $0xffffffffffffffc0,%rdi
   0xffffffff8122f1e4 <+4>:     mov    %rcx,%rdx
   0xffffffff8122f1e7 <+7>:     jmp    0xffffffff81230080 <kprobe_multi_link_prog_run>

This means that when trying to capture an LBR trace of all indirect branches,
we are wasting an entry just to record that kprobe_multi_link_exit_handler
called/jumped into kprobe_multi_link_prog_run (the tail call shows up as the
jmp above).

LBR entries are especially scarce on AMD CPUs (just 16 entries on the latest
CPUs vs typically 32 on the latest Intel CPUs), and every entry counts (we
already spend a bunch of LBR entries just getting to the BPF program), so it
would be great not to waste any more than necessary.

Marking it as just `static inline` doesn't change anything: the compiler
still performs only the tail-call optimization. But by marking
kprobe_multi_link_prog_run() as __always_inline we ensure that the
compiler fully inlines it, avoiding the jump:

Dump of assembler code for function kprobe_multi_link_exit_handler:
   0xffffffff8122f4e0 <+0>:     push   %r15
   0xffffffff8122f4e2 <+2>:     push   %r14
   0xffffffff8122f4e4 <+4>:     push   %r13
   0xffffffff8122f4e6 <+6>:     push   %r12
   0xffffffff8122f4e8 <+8>:     push   %rbx
   0xffffffff8122f4e9 <+9>:     sub    $0x10,%rsp
   0xffffffff8122f4ed <+13>:    mov    %rdi,%r14
   0xffffffff8122f4f0 <+16>:    lea    -0x40(%rdi),%rax

   ...

   0xffffffff8122f590 <+176>:   call   0xffffffff8108e420 <sched_clock>
   0xffffffff8122f595 <+181>:   sub    %r14,%rax
   0xffffffff8122f598 <+184>:   add    %rax,0x8(%rbx,%r13,1)
   0xffffffff8122f59d <+189>:   jmp    0xffffffff8122f541 <kprobe_multi_link_exit_handler+97>
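
For context, the kernel defines __always_inline in
include/linux/compiler_types.h (modulo compiler-specific details) along
the lines of:

  /* approximate; see include/linux/compiler_types.h for the exact form */
  #define __always_inline inline __attribute__((__always_inline__))

which forces inlining regardless of the compiler's usual cost heuristics,
unlike plain `inline`, which is only a hint. The dumps above can be
reproduced from a vmlinux built with debug info, e.g.:

  gdb -batch -ex 'disassemble kprobe_multi_link_exit_handler' vmlinux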

Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
---
 kernel/trace/bpf_trace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 434e3ece6688..0bebd6f02e17 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2796,7 +2796,7 @@ static u64 bpf_kprobe_multi_entry_ip(struct bpf_run_ctx *ctx)
 	return run_ctx->entry_ip;
 }
 
-static int
+static __always_inline int
 kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
 			   unsigned long entry_ip, struct pt_regs *regs)
 {
-- 
2.43.0
