On Tue, Mar 19, 2024 at 12:32 PM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
>
> On Tue, Mar 19, 2024 at 12:08:35PM +0100, Jiri Olsa wrote:
> > On Tue, Mar 19, 2024 at 11:25:24AM +0100, Oleg Nesterov wrote:
> > > Obviously not for inclusion yet ;) untested, lacks the comments, and I am not
> > > sure it makes sense.
> > >
> > > But I am wondering if this change can speedup uretprobes a bit more. Any chance
> > > you can test it?
> > >
> > > With 1/3 sys_uretprobe() changes regs->r11/cx, this is correct but implies iret.
> > > See the /* SYSRET requires RCX == RIP and R11 == EFLAGS */ code in do_syscall_64().
> >
> > nice idea, looks like sysexit should be faster
> >
> > >
> > > With this patch uretprobe_syscall_entry restores rcx/r11 itself and does retq,
> > > sys_uretprobe() needs to hijack regs->ip after uprobe_handle_trampoline() to
> > > make it possible.
> > >
> > > Comments?
> > >
> > > Oleg.
> > > ---
> > >
> > > diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> > > index 069371e86180..b99f1d80a8c8 100644
> > > --- a/arch/x86/kernel/uprobes.c
> > > +++ b/arch/x86/kernel/uprobes.c
> > > @@ -319,6 +319,9 @@ asm (
> > >     "pushq %r11\n"
> > >     "movq $462, %rax\n"
> > >     "syscall\n"
> > > +   "popq %r11\n"
> > > +   "popq %rcx\n"
> > > +   "retq\n"
> >
> > using rax space on stack for return pointer, cool :)
> >
> > I'll run the test with this change
>
> I got bigger speed up on intel, amd stays the same (I'll double check that)
>
> current:
>   base           : 16.133 ± 0.035M/s
>   uprobe-nop     :  3.003 ± 0.002M/s
>   uprobe-push    :  2.829 ± 0.001M/s
>   uprobe-ret     :  1.101 ± 0.001M/s
>   uretprobe-nop  :  1.485 ± 0.001M/s
>   uretprobe-push :  1.447 ± 0.000M/s
>   uretprobe-ret  :  0.812 ± 0.000M/s
>
> fix:
>   base           : 16.522 ± 0.103M/s
>   uprobe-nop     :  2.920 ± 0.034M/s
>   uprobe-push    :  2.749 ± 0.047M/s
>   uprobe-ret     :  1.094 ± 0.003M/s
>   uretprobe-nop  :  2.004 ± 0.006M/s  < ~34% speed up
>   uretprobe-push :  1.940 ± 0.003M/s  < ~34% speed up
>   uretprobe-ret  :  0.921 ± 0.050M/s  < ~13% speed up
>
> original fix:
>   base           : 15.704 ± 0.076M/s
>   uprobe-nop     :  2.841 ± 0.008M/s
>   uprobe-push    :  2.666 ± 0.029M/s
>   uprobe-ret     :  1.037 ± 0.008M/s
>   uretprobe-nop  :  1.718 ± 0.010M/s  < ~25% speed up
>   uretprobe-push :  1.658 ± 0.008M/s  < ~23% speed up
>   uretprobe-ret  :  0.853 ± 0.004M/s  < ~14% speed up
>

My machine is slower. Even though I turned off mitigations and stuff like
that, I feel like there are still some slowdowns. Either way, the data is at
least consistent as far as the baseline goes (it's called syscall-count now in
my local changes, which I have yet to submit), and yes, Oleg's change does
produce a noticeable speed up:

baseline
========
usermode-count : 79.509 ± 0.038M/s
syscall-count  :  9.550 ± 0.002M/s
uprobe-nop     :  1.530 ± 0.000M/s
uprobe-push    :  1.457 ± 0.000M/s
uprobe-ret     :  0.642 ± 0.000M/s
uretprobe-nop  :  0.777 ± 0.000M/s
uretprobe-push :  0.761 ± 0.000M/s
uretprobe-ret  :  0.459 ± 0.000M/s

Jiri
====
usermode-count : 79.515 ± 0.014M/s
syscall-count  :  9.439 ± 0.006M/s
uprobe-nop     :  1.520 ± 0.001M/s
uprobe-push    :  1.464 ± 0.000M/s
uprobe-ret     :  0.640 ± 0.000M/s
uretprobe-nop  :  0.893 ± 0.000M/s (+15%)
uretprobe-push :  0.867 ± 0.000M/s (+14%)
uretprobe-ret  :  0.498 ± 0.000M/s (+8.5%)

Oleg+Jiri
=========
usermode-count : 79.471 ± 0.078M/s
syscall-count  :  9.434 ± 0.007M/s
uprobe-nop     :  1.516 ± 0.003M/s
uprobe-push    :  1.463 ± 0.000M/s
uprobe-ret     :  0.640 ± 0.001M/s
uretprobe-nop  :  1.020 ± 0.001M/s (+31%)
uretprobe-push :  0.998 ± 0.001M/s (+31%)
uretprobe-ret  :  0.537 ± 0.000M/s (+17%)

So the gain is roughly 2x that of Jiri's changes alone, which is a very nice
boost! I only tested on an Intel CPU.

>
> jirka
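
For anyone who wants to re-check the percentages, here is a minimal sketch of the arithmetic, assuming each gain is computed as (new / baseline - 1) against the "baseline" run on the same machine. The throughput values are copied from the three uretprobe rows of the tables in this mail. The dictionary and function names are made up for illustration and are not part of the benchmark tooling.

```python
# Throughput in M/s for the uretprobe cases, copied from the runs in this mail.
baseline = {"uretprobe-nop": 0.777, "uretprobe-push": 0.761, "uretprobe-ret": 0.459}
jiri = {"uretprobe-nop": 0.893, "uretprobe-push": 0.867, "uretprobe-ret": 0.498}
oleg_jiri = {"uretprobe-nop": 1.020, "uretprobe-push": 0.998, "uretprobe-ret": 0.537}


def speedup_pct(before, after):
    """Relative throughput gain of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0


def gains(run):
    """Per-benchmark gain of `run` over the baseline run, in percent."""
    return {name: speedup_pct(baseline[name], value) for name, value in run.items()}


jiri_gains = gains(jiri)
oleg_jiri_gains = gains(oleg_jiri)

for name in baseline:
    ratio = oleg_jiri_gains[name] / jiri_gains[name]
    print(
        f"{name:15s} Jiri: +{jiri_gains[name]:.1f}%  "
        f"Oleg+Jiri: +{oleg_jiri_gains[name]:.1f}%  (ratio {ratio:.2f}x)"
    )
```

The last column is the ratio of the two gains, which is the quantity behind the "2x" remark: it compares the improvement over baseline, not the raw throughput.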