On Fri, May 03, 2024 at 03:53:15PM +0000, Edgecombe, Rick P wrote: > On Fri, 2024-05-03 at 15:04 +0200, Jiri Olsa wrote: > > On Fri, May 03, 2024 at 01:34:53PM +0200, Peter Zijlstra wrote: > > > On Thu, May 02, 2024 at 02:23:08PM +0200, Jiri Olsa wrote: > > > > Adding uretprobe syscall instead of trap to speed up return probe. > > > > > > > > At the moment the uretprobe setup/path is: > > > > > > > > - install entry uprobe > > > > > > > > - when the uprobe is hit, it overwrites probed function's return > > > > address > > > > on stack with address of the trampoline that contains breakpoint > > > > instruction > > > > > > > > - the breakpoint trap code handles the uretprobe consumers execution > > > > and > > > > jumps back to original return address > > Hi, > > I worked on the x86 shadow stack support. > > I didn't know uprobes did anything like this. In hindsight I should have looked > more closely. The current upstream behavior is to overwrite the return address > on the stack? > > Stupid uprobes question - what is actually overwriting the return address on the > stack? Is it the kernel? If so perhaps the kernel could just update the shadow > stack at the same time. yes, it's in kernel - arch_uretprobe_hijack_return_addr .. so I guess we need to update the shadow stack with the new return value as well > > > > > > > > > This patch replaces the above trampoline's breakpoint instruction with new > > > > ureprobe syscall call. This syscall does exactly the same job as the trap > > > > with some more extra work: > > > > > > > > - syscall trampoline must save original value for rax/r11/rcx registers > > > > on stack - rax is set to syscall number and r11/rcx are changed and > > > > used by syscall instruction > > > > > > > > - the syscall code reads the original values of those registers and > > > > restore those values in task's pt_regs area > > > > > > > > - only caller from trampoline exposed in '[uprobes]' is allowed, > > > > the process will receive SIGILL signal otherwise > > > > > > > > > > Did you consider shadow stacks? IIRC we currently have userspace shadow > > > stack support available, and that will utterly break all of this. > > > > nope.. I guess it's the extra ret instruction in the trampoline that would > > make it crash? > > The original behavior seems problematic for shadow stack IIUC. I'm not sure of > the additional breakage with the new behavior. I can see it's broken also for current uprobes > > Roughly, how shadow stack works is there is an additional protected stack for > the app thread. The HW pushes to from the shadow stack with CALL, and pops from > it with RET. But it also continues to push and pop from the normal stack. On > pop, if the values don't match between the two stacks, an exception is > generated. The whole point is to prevent the app from overwriting its stack > return address to return to random places. > > Userspace cannot (normally) write to the shadow stack, but the kernel can do > this or adust the SSP (shadow stack pointer). So in the kernel (for things like > sigreturn) there is an ability to do what is needed. Ptracers also can do things > like this. hack below seems to fix it for the current uprobe setup, we need similar fix for the uretprobe syscall trampoline setup jirka --- diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h index 42fee8959df7..99a0948a3b79 100644 --- a/arch/x86/include/asm/shstk.h +++ b/arch/x86/include/asm/shstk.h @@ -21,6 +21,7 @@ unsigned long shstk_alloc_thread_stack(struct task_struct *p, unsigned long clon void shstk_free(struct task_struct *p); int setup_signal_shadow_stack(struct ksignal *ksig); int restore_signal_shadow_stack(void); +void uprobe_change_stack(unsigned long addr); #else static inline long shstk_prctl(struct task_struct *task, int option, unsigned long arg2) { return -EINVAL; } diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index 59e15dd8d0f8..d2c4dbe5843c 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -577,3 +577,11 @@ long shstk_prctl(struct task_struct *task, int option, unsigned long arg2) return wrss_control(true); return -EINVAL; } + +void uprobe_change_stack(unsigned long addr) +{ + unsigned long ssp; + + ssp = get_user_shstk_addr(); + write_user_shstk_64((u64 __user *)ssp, (u64)addr); +} diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c index 81e6ee95784d..88afbeaacb8f 100644 --- a/arch/x86/kernel/uprobes.c +++ b/arch/x86/kernel/uprobes.c @@ -348,7 +348,7 @@ void *arch_uprobe_trampoline(unsigned long *psize) * only for native 64-bit process, the compat process still uses * standard breakpoint. */ - if (user_64bit_mode(regs)) { + if (0 && user_64bit_mode(regs)) { *psize = uretprobe_syscall_end - uretprobe_syscall_entry; return uretprobe_syscall_entry; } @@ -1191,8 +1191,10 @@ arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr, struct pt_regs return orig_ret_vaddr; nleft = copy_to_user((void __user *)regs->sp, &trampoline_vaddr, rasize); - if (likely(!nleft)) + if (likely(!nleft)) { + uprobe_change_stack(trampoline_vaddr); return orig_ret_vaddr; + } if (nleft != rasize) { pr_err("return address clobbered: pid=%d, %%sp=%#lx, %%ip=%#lx\n",