Re: [PATCHv4 bpf-next 2/7] uprobe: Add uretprobe syscall to speed up return probe

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, May 03, 2024 at 03:53:15PM +0000, Edgecombe, Rick P wrote:
> On Fri, 2024-05-03 at 15:04 +0200, Jiri Olsa wrote:
> > On Fri, May 03, 2024 at 01:34:53PM +0200, Peter Zijlstra wrote:
> > > On Thu, May 02, 2024 at 02:23:08PM +0200, Jiri Olsa wrote:
> > > > Adding uretprobe syscall instead of trap to speed up return probe.
> > > > 
> > > > At the moment the uretprobe setup/path is:
> > > > 
> > > >    - install entry uprobe
> > > > 
> > > >    - when the uprobe is hit, it overwrites probed function's return
> > > > address
> > > >      on stack with address of the trampoline that contains breakpoint
> > > >      instruction
> > > > 
> > > >    - the breakpoint trap code handles the uretprobe consumers execution
> > > > and
> > > >      jumps back to original return address
> 
> Hi,
> 
> I worked on the x86 shadow stack support.
> 
> I didn't know uprobes did anything like this. In hindsight I should have looked
> more closely. The current upstream behavior is to overwrite the return address
> on the stack?
> 
> Stupid uprobes question - what is actually overwriting the return address on the
> stack? Is it the kernel? If so perhaps the kernel could just update the shadow
> stack at the same time.

yes, it's in kernel - arch_uretprobe_hijack_return_addr .. so I guess
we need to update the shadow stack with the new return value as well

> 
> > > > 
> > > > This patch replaces the above trampoline's breakpoint instruction with new
> > > > ureprobe syscall call. This syscall does exactly the same job as the trap
> > > > with some more extra work:
> > > > 
> > > >    - syscall trampoline must save original value for rax/r11/rcx registers
> > > >      on stack - rax is set to syscall number and r11/rcx are changed and
> > > >      used by syscall instruction
> > > > 
> > > >    - the syscall code reads the original values of those registers and
> > > >      restore those values in task's pt_regs area
> > > > 
> > > >    - only caller from trampoline exposed in '[uprobes]' is allowed,
> > > >      the process will receive SIGILL signal otherwise
> > > > 
> > > 
> > > Did you consider shadow stacks? IIRC we currently have userspace shadow
> > > stack support available, and that will utterly break all of this.
> > 
> > nope.. I guess it's the extra ret instruction in the trampoline that would
> > make it crash?
> 
> The original behavior seems problematic for shadow stack IIUC. I'm not sure of
> the additional breakage with the new behavior.

I can see it's broken also for current uprobes

> 
> Roughly, how shadow stack works is there is an additional protected stack for
> the app thread. The HW pushes to from the shadow stack with CALL, and pops from
> it with RET. But it also continues to push and pop from the normal stack. On
> pop, if the values don't match between the two stacks, an exception is
> generated. The whole point is to prevent the app from overwriting its stack
> return address to return to random places.
> 
> Userspace cannot (normally) write to the shadow stack, but the kernel can do
> this or adust the SSP (shadow stack pointer). So in the kernel (for things like
> sigreturn) there is an ability to do what is needed. Ptracers also can do things
> like this.

hack below seems to fix it for the current uprobe setup,
we need similar fix for the uretprobe syscall trampoline setup

jirka


---
diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h
index 42fee8959df7..99a0948a3b79 100644
--- a/arch/x86/include/asm/shstk.h
+++ b/arch/x86/include/asm/shstk.h
@@ -21,6 +21,7 @@ unsigned long shstk_alloc_thread_stack(struct task_struct *p, unsigned long clon
 void shstk_free(struct task_struct *p);
 int setup_signal_shadow_stack(struct ksignal *ksig);
 int restore_signal_shadow_stack(void);
+void uprobe_change_stack(unsigned long addr);
 #else
 static inline long shstk_prctl(struct task_struct *task, int option,
 			       unsigned long arg2) { return -EINVAL; }
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index 59e15dd8d0f8..d2c4dbe5843c 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -577,3 +577,11 @@ long shstk_prctl(struct task_struct *task, int option, unsigned long arg2)
 		return wrss_control(true);
 	return -EINVAL;
 }
+
+void uprobe_change_stack(unsigned long addr)
+{
+	unsigned long ssp;
+
+	ssp = get_user_shstk_addr();
+	write_user_shstk_64((u64 __user *)ssp, (u64)addr);
+}
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 81e6ee95784d..88afbeaacb8f 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -348,7 +348,7 @@ void *arch_uprobe_trampoline(unsigned long *psize)
 	 * only for native 64-bit process, the compat process still uses
 	 * standard breakpoint.
 	 */
-	if (user_64bit_mode(regs)) {
+	if (0 && user_64bit_mode(regs)) {
 		*psize = uretprobe_syscall_end - uretprobe_syscall_entry;
 		return uretprobe_syscall_entry;
 	}
@@ -1191,8 +1191,10 @@ arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr, struct pt_regs
 		return orig_ret_vaddr;
 
 	nleft = copy_to_user((void __user *)regs->sp, &trampoline_vaddr, rasize);
-	if (likely(!nleft))
+	if (likely(!nleft)) {
+		uprobe_change_stack(trampoline_vaddr);
 		return orig_ret_vaddr;
+	}
 
 	if (nleft != rasize) {
 		pr_err("return address clobbered: pid=%d, %%sp=%#lx, %%ip=%#lx\n",




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux