----- On Jun 29, 2018, at 1:03 PM, Linus Torvalds torvalds@xxxxxxxxxxxxxxxxxxxx wrote:

> On Fri, Jun 29, 2018 at 9:07 AM Mathieu Desnoyers
> <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>>
>> This code is not invoked from syscalls, but rather on return from
>> interrupt/trap after a preemption.
>
> But when we register the rseq, we could easily check that the top bits
> of the IP is clear, no?

When a thread registers rseq, it registers a pointer to a user-space
address where a struct rseq is located. That struct rseq is typically
in a TLS area. It contains a pointer to the current "struct rseq_cs":
the content of rseq_cs describes the current rseq critical section.

So when we register rseq, the rseq->rseq_cs pointer value is typically
NULL, because there is no currently active critical section. It's after
return from sys_rseq registration that user-space eventually sets the
pointer to a non-NULL value when it enters a critical section.

So at rseq registration, there is no point in validating the value of
the rseq_cs pointer, nor of any fields in the struct rseq_cs that would
be currently pointed to by that rseq_cs pointer, because those all
change after registration.

> Sure, user space can change it after the fact, but at that point it's
> literally "user space is being intentionally stupid".

User-space can be either stupid, or really clever and trying to attack
the kernel.

> The real worry is that 32-bit compat code never initializes those bits
> at all, no?

There are two aspects I'm concerned about here:

1) security: we don't want 32-bit user-space to feed a 64-bit value
   over 4GB as abort_ip that may end up causing OOPSes on architectures
   that would lack proper validation of those values on return to
   userspace.

2) behavior consistency of 32-bit userspace on both native 32-bit and
   32-bit compat on 64-bit kernel:

   - for testing: having repeatable behavior on native and compat
     deployments ensures that testing results are the same.
     This is the difference between having "undefined behavior" when
     the upper bits are set or "defined behavior: the process is
     terminated with SIGSEGV",

   - for security: if the behavior differs between 32-bit compat and
     native 32-bit, this leaks information about which specific
     architecture the kernel is running on, which facilitates attacks
     on the kernel.

But perhaps I'm caring too much about those aspects? Maybe they matter
less than I presume.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
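
For illustration only, here is a rough user-space sketch in C of the kind
of "upper bits must be clear for a 32-bit task" check discussed above. It
is not the kernel's code: the struct name rseq_cs_sketch, its simplified
layout, and the helper abort_ip_ok() are all assumptions made up for this
example, and a real check would also have to cover the 64-bit case, which
is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative layout only. The field names follow the discussion, but
 * this is NOT the kernel's uapi definition of struct rseq_cs.
 */
struct rseq_cs_sketch {
	uint32_t version;
	uint32_t flags;
	uint64_t start_ip;
	uint64_t post_commit_offset;
	uint64_t abort_ip;	/* always 64 bits wide, even for 32-bit tasks */
};

/*
 * Hypothetical helper: returns true if abort_ip is acceptable for the
 * task. For a 32-bit task (native or compat), any bit set above bit 31
 * is rejected, so both deployments see the same defined behavior.
 * Range checks for 64-bit tasks are out of scope for this sketch.
 */
static bool abort_ip_ok(const struct rseq_cs_sketch *cs, bool task_is_32bit)
{
	assert(cs != NULL);

	if (!task_is_32bit)
		return true;

	return (cs->abort_ip >> 32) == 0;
}
```

A caller that gets false back for a 32-bit task would then take the
"defined behavior" path (terminating the process with SIGSEGV) rather
than returning to an address that native 32-bit could never have
expressed.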