On Tue, Mar 19, 2024 at 08:31:55PM +0100, Jiri Olsa wrote:
> On Tue, Mar 19, 2024 at 12:08:35PM +0100, Jiri Olsa wrote:
> > On Tue, Mar 19, 2024 at 11:25:24AM +0100, Oleg Nesterov wrote:
> > > Obviously not for inclusion yet ;) untested, lacks the comments, and I am not
> > > sure it makes sense.
> > >
> > > But I am wondering if this change can speedup uretprobes a bit more. Any chance
> > > you can test it?
> > >
> > > With 1/3 sys_uretprobe() changes regs->r11/cx, this is correct but implies iret.
> > > See the /* SYSRET requires RCX == RIP and R11 == EFLAGS */ code in do_syscall_64().
> >
> > nice idea, looks like sysexit should be faster
> >
> > >
> > > With this patch uretprobe_syscall_entry restores rcx/r11 itself and does retq,
> > > sys_uretprobe() needs to hijack regs->ip after uprobe_handle_trampoline() to
> > > make it possible.
> > >
> > > Comments?
> > >
> > > Oleg.
> > > ---
> > >
> > > diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> > > index 069371e86180..b99f1d80a8c8 100644
> > > --- a/arch/x86/kernel/uprobes.c
> > > +++ b/arch/x86/kernel/uprobes.c
> > > @@ -319,6 +319,9 @@ asm (
> > >  	"pushq %r11\n"
> > >  	"movq $462, %rax\n"
> > >  	"syscall\n"
> > > +	"popq %r11\n"
> > > +	"popq %rcx\n"
> > > +	"retq\n"
> >
> > using rax space on stack for return pointer, cool :)
> >
> > I'll run the test with this change
>
> I got bigger speed up on intel, amd stays the same (I'll double check that)

yes, I'm getting no speed up on AMD, but Intel's great

Oleg, are you ok if I squash the patches together, or do you want to send
it separately?
jirka

>
> current:
>   base           :   16.133 ± 0.035M/s
>   uprobe-nop     :    3.003 ± 0.002M/s
>   uprobe-push    :    2.829 ± 0.001M/s
>   uprobe-ret     :    1.101 ± 0.001M/s
>   uretprobe-nop  :    1.485 ± 0.001M/s
>   uretprobe-push :    1.447 ± 0.000M/s
>   uretprobe-ret  :    0.812 ± 0.000M/s
>
> fix:
>   base           :   16.522 ± 0.103M/s
>   uprobe-nop     :    2.920 ± 0.034M/s
>   uprobe-push    :    2.749 ± 0.047M/s
>   uprobe-ret     :    1.094 ± 0.003M/s
>   uretprobe-nop  :    2.004 ± 0.006M/s  < ~34% speed up
>   uretprobe-push :    1.940 ± 0.003M/s  < ~34% speed up
>   uretprobe-ret  :    0.921 ± 0.050M/s  < ~13% speed up
>
> original fix:
>   base           :   15.704 ± 0.076M/s
>   uprobe-nop     :    2.841 ± 0.008M/s
>   uprobe-push    :    2.666 ± 0.029M/s
>   uprobe-ret     :    1.037 ± 0.008M/s
>   uretprobe-nop  :    1.718 ± 0.010M/s  < ~25% speed up
>   uretprobe-push :    1.658 ± 0.008M/s  < ~23% speed up
>   uretprobe-ret  :    0.853 ± 0.004M/s  < ~14% speed up
>
>
> jirka