Re: [PATCH RFT v4 0/5] fork: Support shadow stacks in clone3()

"Edgecombe, Rick P" <rick.p.edgecombe@xxxxxxxxx> · Thu, 30 Nov 2023 23:37:42 +0000

On Thu, 2023-11-30 at 21:51 +0000, Mark Brown wrote:
> On Thu, Nov 30, 2023 at 07:00:58PM +0000, Catalin Marinas wrote:
> 
> > My hope when looking at the arm64 patches was that we can
> > completely
> > avoid the kernel allocation/deallocation of the shadow stack since
> > it
> > doesn't need to do this for the normal stack either. Could someone
> > please summarise why we dropped the shadow stack pointer after v1?
> > IIUC
> > there was a potential security argument but I don't think it was a
> > very
> > strong one. Also what's the threat model for this feature? I
> > thought
> > it's mainly mitigating stack corruption. If some rogue code can do
> > syscalls, we have bigger problems than clone3() taking a shadow
> > stack
> > pointer.
> 
> As well as preventing/detecting corruption of the in memory stack
> shadow
> stacks are also ensuring that any return instructions are unwinding a
> prior call instruction, and that the returns are done in opposite
> order
> to the calls.  This forces usage of the stack - any value we attempt
> to
> RET to is going to be checked against the top of the shadow stack
> which
> makes chaining returns together as a substitute for branches harder.
> 
> The concern Rick raised was that allowing user to pick the exact
> shadow
> stack pointer would allow userspace to corrupt or reuse the stack of
> an
> existing thread by starting a new thread with the shadow stack
> pointing
> into the existing shadow stack of that thread.  While in isolation
> that's not too much more than what userspace could just do directly
> anyway it might compose with other issues to something more
> "interesting"
> (eg, I'd be a bit concerned about overlap with pkeys/POE though I've
> not
> thought through potential uses in detail).

I think it is open for userspace customization. The kernel tries to
leave the option to lock things down as much as it can (partly because
it's not clear how all the userspace tradeoffs will shake out).

In the past, we had talked about allowing a set SSP (GCSPR) prctl() to
help with some of the compatibility gaps (longjmp() between stacks,
etc). If we loosened things up a bit this could help there, but it kind
of defeats the purpose a little, of the token checking stuff built into
these features at the HW level. A super-stack-canary mode might be nice
for people who just want to flip a switch on existing apps without
checking them, or people who want to do tracing and don't care about
security. But, I also wouldn't be surprised if some high security
applications decide to block map_shadow_stack all together to lock
threads to their own shadow stacks.

So I kind of like leaning towards leaving the option to lock things
down more when we can. Like Mark was getting at, we don't know all the
ways shadow stacks will get attacked yet. So turning it around, why not
let the shadow stack get allocated by the kernel? It makes the kernel
code/complexity smaller, are there any other benefits?

> 
> > I'm not against clone3() getting a shadow_stack_size argument but
> > asking
> > some more questions. If we won't pass a pointer as well, is there
> > any
> > advantage in expanding this syscall vs a specific prctl() option?
> > Do we
> > need a different size per thread or do all threads have the same
> > shadow
> > stack size? A new RLIMIT doesn't seem to map well though, it is
> > more
> > like an upper limit rather than a fixed/default size (glibc I think
> > uses
> > it for thread stacks but bionic or musl don't AFAIK).
> 
> I don't know what the userspace patterns are likely to be here, it's
> possible a single value for each process might be fine but I couldn't
> say that confidently.  I agree that a RLIMIT does seem like a poor
> fit.
> 
> As well as the actual configuration of the size the other thing that
> we
> gain is that as well as relying on heuristics to determine if we need
> to
> allocate a new shadow stack for the new thread we allow userspace to
> explicitly request a new shadow stack.  There was some corner case
> with
> IIRC posix_nspawn() mentioned where the heuristics aren't what we
> want
> for example.

Can't posix_spawn() pass in a shadow stack size into clone3 to get a
new shadow stack after this series?

> 
> > Another dumb question on arm64 - is GCSPR_EL0 writeable by the
> > user? If
> > yes, can the libc wrapper for threads allocate a shadow stack via
> > map_shadow_stack() and set it up in the thread initialisation
> > handler
> > before invoking the thread function?
> 
> No, GCSPR_EL0 can only be changed by EL0 through BL, RET and the
> new GCS instructions (push/pop and stack switch).  Push is optional -
> userspace has to explicitly request that it be enabled and this could
> be
> prevented through seccomp or some other LSM.  The stack switch
> instructions require a token at the destination address which must
> either be written by a higher EL or will be written in the process of
> switching away from a stack so you can switch back.  Unless I've
> missed
> one every mechanism for userspace to update GCSPR_EL0 will do a GCS
> memory access so providing guard pages have been allocated wrapping
> to a
> different stack will be prevented.
> 
> We would need a syscall to allow GCSPR_EL0 to be written.

I think the problem with doing this is signals. If a signal is
delivered to the new thread, then it could push to the old shadow stack
before userspace gets a chance to switch. So the thread needs to start
on a new shadow/stack.