On Wed, Mar 27, 2024 at 3:20 AM Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
>
> Adding uretprobe syscall instead of trap to speed up return probe.
>
> At the moment the uretprobe setup/path is:
>
>   - install entry uprobe
>
>   - when the uprobe is hit, it overwrites probed function's return address
>     on stack with address of the trampoline that contains breakpoint
>     instruction
>
>   - the breakpoint trap code handles the uretprobe consumers execution and
>     jumps back to original return address
>
> This patch replaces the above trampoline's breakpoint instruction with new
> uretprobe syscall call. This syscall does exactly the same job as the trap
> with some more extra work:
>
>   - syscall trampoline must save original value for rax/r11/rcx registers
>     on stack - rax is set to syscall number and r11/rcx are changed and
>     used by syscall instruction
>
>   - the syscall code reads the original values of those registers and
>     restores those values in task's pt_regs area
>
> Even with the extra registers handling code, having uretprobes handled
> by syscalls shows speed improvement.
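
The register handling described in the two bullets above can be illustrated
with a toy model. This is only a sketch under stated assumptions, not the
code from the patch: the class ToyTask, the placeholder TOY_NR_URETPROBE, the
function names and the push order are all hypothetical. The only facts relied
on are the ones quoted (rax/r11/rcx are saved on the stack, rax carries the
syscall number) plus the x86-64 behaviour of the syscall instruction, which
overwrites rcx with the return address and r11 with the flags.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTask:
    """Hypothetical stand-in for the three registers named in the description."""
    rax: int
    rcx: int
    r11: int
    # Toy user stack; the end of the list is the top of the stack.
    stack: list = field(default_factory=list)


# Placeholder value only; NOT the real syscall number.
TOY_NR_URETPROBE = 0xABC


def trampoline_prologue(task: ToyTask) -> None:
    """Trampoline side: save rax/r11/rcx, then load the syscall number.

    The push order here is an assumption made for the sketch.
    """
    task.stack.append(task.rcx)
    task.stack.append(task.r11)
    task.stack.append(task.rax)
    task.rax = TOY_NR_URETPROBE


def syscall_instruction(task: ToyTask, return_rip: int, rflags: int) -> None:
    """Model the clobbering done by the x86-64 syscall instruction."""
    task.rcx = return_rip
    task.r11 = rflags


def sys_uretprobe_restore(task: ToyTask) -> None:
    """Syscall side: read the saved values back and restore the registers."""
    task.rax = task.stack.pop()
    task.r11 = task.stack.pop()
    task.rcx = task.stack.pop()


if __name__ == "__main__":
    t = ToyTask(rax=1, rcx=2, r11=3)
    trampoline_prologue(t)
    syscall_instruction(t, return_rip=0x401000, rflags=0x246)
    sys_uretprobe_restore(t)
    print(t)
```

In this model the probed function's return path sees the same rax/rcx/r11 it
had before the trampoline ran, which is the property the extra save/restore
work is there to preserve.
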
>
> On Intel (11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz)
>
>   current:
>
>     base           :   15.888 ± 0.033M/s
>     uprobe-nop     :    3.016 ± 0.000M/s
>     uprobe-push    :    2.832 ± 0.005M/s
>     uprobe-ret     :    1.104 ± 0.000M/s
>     uretprobe-nop  :    1.487 ± 0.000M/s
>     uretprobe-push :    1.456 ± 0.000M/s
>     uretprobe-ret  :    0.816 ± 0.001M/s
>
>   with the fix:
>
>     base           :   15.116 ± 0.045M/s
>     uprobe-nop     :    3.001 ± 0.045M/s
>     uprobe-push    :    2.831 ± 0.004M/s
>     uprobe-ret     :    1.102 ± 0.001M/s
>     uretprobe-nop  :    1.969 ± 0.001M/s  < 32% speedup
>     uretprobe-push :    1.905 ± 0.004M/s  < 30% speedup
>     uretprobe-ret  :    0.933 ± 0.002M/s  < 14% speedup
>
> On AMD (AMD Ryzen 7 5700U)
>
>   current:
>
>     base           :    5.105 ± 0.003M/s
>     uprobe-nop     :    1.552 ± 0.002M/s
>     uprobe-push    :    1.408 ± 0.003M/s
>     uprobe-ret     :    0.827 ± 0.001M/s
>     uretprobe-nop  :    0.779 ± 0.001M/s
>     uretprobe-push :    0.750 ± 0.001M/s
>     uretprobe-ret  :    0.539 ± 0.001M/s
>
>   with the fix:
>
>     base           :    5.119 ± 0.002M/s
>     uprobe-nop     :    1.523 ± 0.003M/s
>     uprobe-push    :    1.384 ± 0.003M/s
>     uprobe-ret     :    0.826 ± 0.002M/s
>     uretprobe-nop  :    0.866 ± 0.002M/s  < 11% speedup
>     uretprobe-push :    0.826 ± 0.002M/s  < 10% speedup
>     uretprobe-ret  :    0.581 ± 0.001M/s  < 7% speedup
>
> Suggested-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> Acked-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> Signed-off-by: Oleg Nesterov <oleg@xxxxxxxxxx>
> Signed-off-by: Jiri Olsa <jolsa@xxxxxxxxxx>
> ---
>  arch/x86/entry/syscalls/syscall_64.tbl |  1 +
>  arch/x86/kernel/uprobes.c              | 83 ++++++++++++++++++++++++++
>  include/linux/syscalls.h               |  2 +
>  include/linux/uprobes.h                |  2 +
>  include/uapi/asm-generic/unistd.h      |  5 +-
>  kernel/events/uprobes.c                | 18 ++++--
>  kernel/sys_ni.c                        |  2 +
>  7 files changed, 108 insertions(+), 5 deletions(-)
>

Great work and results, thanks!

Acked-by: Andrii Nakryiko <andrii@xxxxxxxxxx>

[...]
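
For anyone who wants to re-derive the speedup annotations in the quoted
tables, here is a small sketch. It assumes the percentages are the relative
throughput gain, (after - before) / before, truncated to a whole number; that
reading is an assumption on my side, but it reproduces all six quoted values.
The dictionary layout and the function name speedup_percent are made up for
this example; the numbers are the mean M/s values copied from the tables.

```python
# (before, after) throughput in M/s, copied from the quoted benchmark tables.
INTEL = {
    "uretprobe-nop": (1.487, 1.969),
    "uretprobe-push": (1.456, 1.905),
    "uretprobe-ret": (0.816, 0.933),
}
AMD = {
    "uretprobe-nop": (0.779, 0.866),
    "uretprobe-push": (0.750, 0.826),
    "uretprobe-ret": (0.539, 0.581),
}


def speedup_percent(before: float, after: float) -> float:
    """Relative throughput gain of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    for cpu, table in (("Intel", INTEL), ("AMD", AMD)):
        for name, (before, after) in table.items():
            gain = speedup_percent(before, after)
            # int() truncates toward zero, matching the assumed rounding.
            print(f"{cpu:5s} {name:15s} {int(gain)}%")
```

The plain uprobe-* rows are left out on purpose: they move only slightly
between the two runs, which is what you would expect since the patch only
touches the return probe path.
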