Re: [RFC PATCH for 4.18 1/2] rseq: validate rseq_cs fields are < TASK_SIZE

Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> · Fri, 29 Jun 2018 15:48:58 -0400 (EDT)

----- On Jun 29, 2018, at 1:03 PM, Linus Torvalds torvalds@xxxxxxxxxxxxxxxxxxxx wrote:

> On Fri, Jun 29, 2018 at 9:07 AM Mathieu Desnoyers
> <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>>
>> This code is not invoked from syscalls, but rather on return from
>> interrupt/trap after a preemption.
> 
> But when we register the rseq, we could easily check that the top bits
> of the IP is clear, no?

When a thread registers rseq, it registers a pointer to a user-space
address where a struct rseq is located.

That struct rseq is typically in a TLS area. It contains a pointer
to the current "struct rseq_cs": the content of rseq_cs describes the
current rseq critical section.

So when we register rseq, the rseq->rseq_cs pointer value is typically
NULL, because there is no currently active critical section. It's after
return from sys_rseq registration that user-space eventually sets the
pointer to a non-NULL value when it enters a critical section.

So at rseq registration, there is no point in validating the value of
the rseq_cs pointer, nor of any fields in the struct rseq_cs that would
be currently pointed to by that rseq_cs pointer, because those all change
after registration.
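
To illustrate the ordering, here is a minimal user-space sketch. It assumes
simplified stand-in structures: the sketch_ prefix and the helper names are
hypothetical, and this is not the uapi header nor kernel code. It only shows
that the rseq_cs field is 0 when registration happens and is written by
user-space later, when it enters a critical section:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a critical section descriptor (illustrative only). */
struct sketch_rseq_cs {
	uint32_t version;
	uint32_t flags;
	uint64_t start_ip;
	uint64_t post_commit_offset;
	uint64_t abort_ip;
};

/* Simplified stand-in for the per-thread area, typically placed in TLS. */
struct sketch_rseq {
	uint32_t cpu_id_start;
	uint32_t cpu_id;
	uint64_t rseq_cs;	/* user pointer stored as a 64-bit value; 0 means "no active critical section" */
	uint32_t flags;
};

/* All that could be observed at registration time: is a descriptor published? */
static inline int sketch_has_active_cs(const struct sketch_rseq *r)
{
	return r->rseq_cs != 0;
}

/* User-space publishes a descriptor when entering a critical section. */
static inline void sketch_enter_cs(struct sketch_rseq *r,
				   const struct sketch_rseq_cs *cs)
{
	assert(cs != NULL);
	r->rseq_cs = (uint64_t)(uintptr_t)cs;
}

/* User-space clears the pointer when leaving the critical section. */
static inline void sketch_leave_cs(struct sketch_rseq *r)
{
	r->rseq_cs = 0;
}
```

Any check performed while sketch_has_active_cs() still returns 0 says nothing
about the descriptors that user-space will publish afterwards.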

> Sure, user space can change it after the fact, but at that point it's
> literally "user space is being intentionally stupid".

User-space can be either stupid, or really clever and trying to attack
the kernel.

> The real worry is that 32-bit compat code never initializes those bits
> at all, no?

There are two aspects I'm concerned about here:

1) security: we don't want 32-bit user-space to feed a 64-bit value above 4GB
   as abort_ip, which may end up causing OOPSes on architectures that lack
   proper validation of those values on return to userspace.

2) behavior consistency of 32-bit userspace between a native 32-bit kernel
   and 32-bit compat on a 64-bit kernel:
   - for testing: having repeatable behavior on native and compat deployments
     ensures that testing results are the same. This is the difference between
     having "undefined behavior" when the upper bits are set and "defined
     behavior: the process is terminated with SIGSEGV".
   - for security: if the behavior differs between 32-bit compat and native
     32-bit, this leaks information about which specific architecture the kernel
     is running on, which facilitates attacks on the kernel.
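
As a rough sketch of the kind of check under discussion (not the actual patch):
the structure, the sketch_ names, and the 4GB limit used for a 32-bit task are
assumptions made for illustration, since the real TASK_SIZE depends on the
architecture. The idea is to reject any descriptor whose addresses are not
below the task's address-space limit, so native and compat tasks fail the same
way:

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for a critical section descriptor (illustrative only). */
struct sketch_rseq_cs {
	uint32_t version;
	uint32_t flags;
	uint64_t start_ip;
	uint64_t post_commit_offset;
	uint64_t abort_ip;
};

/* Hypothetical address-space limit for a 32-bit task: 4GB. */
#define SKETCH_TASK_SIZE_32	(UINT64_C(1) << 32)

/*
 * Return true only if start_ip, abort_ip and the end of the critical
 * section (start_ip + post_commit_offset) are all below task_size.
 */
static inline bool sketch_rseq_cs_valid(const struct sketch_rseq_cs *cs,
					uint64_t task_size)
{
	if (cs->start_ip >= task_size || cs->abort_ip >= task_size)
		return false;
	/*
	 * start_ip < task_size here, so the subtraction cannot wrap, and
	 * comparing against the remaining room avoids overflowing the sum.
	 */
	if (cs->post_commit_offset >= task_size - cs->start_ip)
		return false;
	return true;
}
```

With a check of this shape, a 32-bit task passing an abort_ip with upper bits
set would be rejected (and could then be terminated) regardless of whether the
kernel underneath is 32-bit or 64-bit.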

But perhaps I'm caring too much about those aspects? Maybe they matter less
than I presume.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com