On Sun, Apr 21, 2024 at 12:42 PM Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
>
> Adding uretprobe syscall instead of trap to speed up return probe.
>
> At the moment the uretprobe setup/path is:
>
>   - install entry uprobe
>
>   - when the uprobe is hit, it overwrites probed function's return address
>     on stack with address of the trampoline that contains breakpoint
>     instruction
>
>   - the breakpoint trap code handles the uretprobe consumers execution and
>     jumps back to original return address
>
> This patch replaces the above trampoline's breakpoint instruction with new
> uretprobe syscall call. This syscall does exactly the same job as the trap
> with some more extra work:
>
>   - syscall trampoline must save original value for rax/r11/rcx registers
>     on stack - rax is set to syscall number and r11/rcx are changed and
>     used by syscall instruction
>
>   - the syscall code reads the original values of those registers and
>     restores those values in task's pt_regs area
>
>   - only caller from trampoline exposed in '[uprobes]' is allowed,
>     the process will receive SIGILL signal otherwise
>
> Even with some extra work, using the uretprobe syscall shows speed
> improvement (compared to using standard breakpoint):
>
>   On Intel (11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz)
>
>   current:
>     uretprobe-nop  :    1.498 ± 0.000M/s
>     uretprobe-push :    1.448 ± 0.001M/s
>     uretprobe-ret  :    0.816 ± 0.001M/s
>
>   with the fix:
>     uretprobe-nop  :    1.969 ± 0.002M/s  < 31% speed up
>     uretprobe-push :    1.910 ± 0.000M/s  < 31% speed up
>     uretprobe-ret  :    0.934 ± 0.000M/s  < 14% speed up
>
>   On AMD (AMD Ryzen 7 5700U)
>
>   current:
>     uretprobe-nop  :    0.778 ± 0.001M/s
>     uretprobe-push :    0.744 ± 0.001M/s
>     uretprobe-ret  :    0.540 ± 0.001M/s
>
>   with the fix:
>     uretprobe-nop  :    0.860 ± 0.001M/s  < 10% speed up
>     uretprobe-push :    0.818 ± 0.001M/s  < 10% speed up
>     uretprobe-ret  :    0.578 ± 0.000M/s  <  7% speed up
>
> The performance test spawns a thread that runs a loop which triggers
> uprobe with attached bpf program that increments the counter that
> gets printed in results above.
>
> The uprobe (and uretprobe) kind is determined by which instruction
> is being patched with breakpoint instruction. That's also important
> for uretprobes, because uprobe is installed for each uretprobe.
>
> The performance test is part of bpf selftests:
>   tools/testing/selftests/bpf/run_bench_uprobes.sh
>
> Note at the moment uretprobe syscall is supported only for native
> 64-bit process, compat process still uses standard breakpoint.
>
> Suggested-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> Signed-off-by: Oleg Nesterov <oleg@xxxxxxxxxx>
> Signed-off-by: Jiri Olsa <jolsa@xxxxxxxxxx>
> ---
>  arch/x86/kernel/uprobes.c | 115 ++++++++++++++++++++++++++++++++++++++
>  include/linux/uprobes.h   |   3 +
>  kernel/events/uprobes.c   |  24 +++++---
>  3 files changed, 135 insertions(+), 7 deletions(-)
>

LGTM as far as I can follow the code

Acked-by: Andrii Nakryiko <andrii@xxxxxxxxxx>

[...]
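
For anyone who wants to re-derive the speed-up percentages quoted in the commit message, here is a minimal sketch that recomputes them from the throughput figures above. The numbers are copied from the quoted results; the names `RESULTS`, `speedup_percent` and `report` are made up for this illustration and are not part of the patch or of the bpf selftests.

```python
# Throughput in M/s, copied from the quoted commit message:
# (current, with the fix). The dict layout is only for illustration.
RESULTS = {
    "Intel i7-1165G7": {
        "uretprobe-nop": (1.498, 1.969),
        "uretprobe-push": (1.448, 1.910),
        "uretprobe-ret": (0.816, 0.934),
    },
    "AMD Ryzen 7 5700U": {
        "uretprobe-nop": (0.778, 0.860),
        "uretprobe-push": (0.744, 0.818),
        "uretprobe-ret": (0.540, 0.578),
    },
}


def speedup_percent(before, after):
    """Relative throughput gain of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0


def report(results):
    """Return one formatted line per (CPU, benchmark) pair."""
    lines = []
    for cpu, benches in results.items():
        for name, (before, after) in benches.items():
            gain = speedup_percent(before, after)
            lines.append(f"{cpu:18} {name:15} {gain:5.1f}%")
    return lines


if __name__ == "__main__":
    print("\n".join(report(RESULTS)))
```

The computed gains range from roughly 7% to 32%, in line with the whole-number percentages in the commit message, which appear to be approximate.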