On Fri, Jun 29, 2018 at 8:27 AM, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > > On Fri, Jun 29, 2018, 08:03 Mathieu Desnoyers > <mathieu.desnoyers@xxxxxxxxxxxx> wrote: >> >> >> Considering those inconsistencies between architectures (either >> the task gets killed, or the top bits are silently cleared), I'm >> very much tempted to be restrictive in the inputs accepted by >> rseq, and not rely on architectures as providing consistent >> validation of the return IP. >> >> Thoughts ? > > > Then you need to make it a compat system call, since clearly you and Andy > want the 32-bit case to do something different from the 64-bit case. I personally would like the compat and non-compat cases to do exactly the same thing. If abort_ip is the address (as a u64) of a valid executable instruction and an abort happens, then that instruction should get executed. If abort_ip does not point to user-executable memory, then the process should get a signal. The problem isn't with rseq per se -- it's with the daft way that the x86 return-to-userspace instructions work. If I apply this patch: diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c index 3b2490b81918..26e4ba44e87b 100644 --- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c @@ -346,6 +346,8 @@ __visible void do_int80_syscall_32(struct pt_regs *regs) { enter_from_user_mode(); local_irq_enable(); + if (!user_64bit_mode(regs)) + regs->ip |= (1UL << 32); do_syscall_32_irqs_on(regs); } the kernel *still works*. But this unconditionally uses the IRET path, and I don't even want to speculate as to what the hell happens if we exit using SYSRETL, or LRET, or SYSEXITL, or if Intel or AMD ever gives us a new mode that gets rid of the espfix shite. IOW, I don't think that the x86 entry code should make a promise that it will continue ignoring the high bits of regs->ip when !user_64bit_mode(regs) on a 64-bit kernel. The problem with rseq as it stands is that this oddity gets accidentally exposed all the way to userspace. If we're not careful, it'll be possible for a slightly buggy user program to goof up the code that generates the data structure that supplies abort_ip such that the high bits are garbage (0xcccccccc due to padding, for example), and the kernel will do exactly what the user code requested, and we'll get regs->ip = 0xcccccccc00000000 | (the actual intended abort_ip), and we'll pass that crap value all the way to IRET, and IRET will truncate it back down to the correct value. And then some CPU will add new behavior or we'll invoke SYSRETL or whatever on some weird CPU, and the program will crash. And we'll be sad. I suppose we could handle this in the entry code by coming up with a way to reject out-of-bounds regs->ip for 32-bit tasks, but that's going to be a bit messy and will slow down normal code that doesn't use rseq. Other than rseq, I don't think that there's any real issue. The only ways to get regs->ip >= 2^32 in a 32-bit task involve ptrace or manual fiddling with signal contexts, and I don't expect to ever have any real software depend on precisely what happens. -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html