On Tue, Mar 19, 2024 at 12:08:35PM +0100, Jiri Olsa wrote:
> On Tue, Mar 19, 2024 at 11:25:24AM +0100, Oleg Nesterov wrote:
> > Obviously not for inclusion yet ;) untested, lacks the comments, and I am not
> > sure it makes sense.
> >
> > But I am wondering if this change can speedup uretprobes a bit more. Any chance
> > you can test it?
> >
> > With 1/3 sys_uretprobe() changes regs->r11/cx, this is correct but implies iret.
> > See the /* SYSRET requires RCX == RIP and R11 == EFLAGS */ code in do_syscall_64().
>
> nice idea, looks like sysexit should be faster
>
> >
> > With this patch uretprobe_syscall_entry restores rcx/r11 itself and does retq,
> > sys_uretprobe() needs to hijack regs->ip after uprobe_handle_trampoline() to
> > make it possible.
> >
> > Comments?
> >
> > Oleg.
> > ---
> >
> > diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> > index 069371e86180..b99f1d80a8c8 100644
> > --- a/arch/x86/kernel/uprobes.c
> > +++ b/arch/x86/kernel/uprobes.c
> > @@ -319,6 +319,9 @@ asm (
> >  	"pushq %r11\n"
> >  	"movq $462, %rax\n"
> >  	"syscall\n"
> > +	"popq %r11\n"
> > +	"popq %rcx\n"
> > +	"retq\n"
>
> using rax space on stack for return pointer, cool :)
>
> I'll run the test with this change

I got bigger speed up on intel, amd stays the same (I'll double check that)

current:
        base           :   16.133 ± 0.035M/s
        uprobe-nop     :    3.003 ± 0.002M/s
        uprobe-push    :    2.829 ± 0.001M/s
        uprobe-ret     :    1.101 ± 0.001M/s
        uretprobe-nop  :    1.485 ± 0.001M/s
        uretprobe-push :    1.447 ± 0.000M/s
        uretprobe-ret  :    0.812 ± 0.000M/s

fix:
        base           :   16.522 ± 0.103M/s
        uprobe-nop     :    2.920 ± 0.034M/s
        uprobe-push    :    2.749 ± 0.047M/s
        uprobe-ret     :    1.094 ± 0.003M/s
        uretprobe-nop  :    2.004 ± 0.006M/s  < ~34% speed up
        uretprobe-push :    1.940 ± 0.003M/s  < ~34% speed up
        uretprobe-ret  :    0.921 ± 0.050M/s  < ~13% speed up

original fix:
        base           :   15.704 ± 0.076M/s
        uprobe-nop     :    2.841 ± 0.008M/s
        uprobe-push    :    2.666 ± 0.029M/s
        uprobe-ret     :    1.037 ± 0.008M/s
        uretprobe-nop  :    1.718 ± 0.010M/s  < ~25% speed up
        uretprobe-push :    1.658 ± 0.008M/s  < ~23% speed up
        uretprobe-ret  :    0.853 ± 0.004M/s  < ~14% speed up

jirka
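
For anyone who wants to re-derive the uretprobe percentages for the "fix" run against the "current" run, here is a minimal sketch of the arithmetic. It assumes the speed-up is the relative throughput gain, (after / before - 1) * 100; the variable and function names are made up for illustration. The "original fix" percentages are left out because the baseline they were measured against is not shown in this mail.

```python
# Throughput in M/s, copied from the "current" and "fix" runs in this mail.
current = {"uretprobe-nop": 1.485, "uretprobe-push": 1.447, "uretprobe-ret": 0.812}
fix = {"uretprobe-nop": 2.004, "uretprobe-push": 1.940, "uretprobe-ret": 0.921}


def speedup_percent(before, after):
    """Relative throughput gain of `after` over `before`, in percent (assumed definition)."""
    return (after / before - 1.0) * 100.0


# Speed-up of "fix" over "current" for each uretprobe benchmark.
speedups = {name: speedup_percent(current[name], fix[name]) for name in current}

for name, pct in speedups.items():
    print(f"{name:15s}: {current[name]:.3f} -> {fix[name]:.3f} M/s ({pct:+.1f}%)")
```

The results land within about a percentage point of the figures quoted next to the "fix" run; the small differences come down to rounding.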