Re: [musl] Re: [PATCH v8 00/38] arm64/gcs: Provide support for GCS in userspace

"Edgecombe, Rick P" <rick.p.edgecombe@xxxxxxxxx> · Wed, 21 Feb 2024 00:35:48 +0000

On Tue, 2024-02-20 at 18:54 -0500, dalias@xxxxxxxx wrote:
> On Tue, Feb 20, 2024 at 11:30:22PM +0000, Edgecombe, Rick P wrote:
> > On Tue, 2024-02-20 at 13:57 -0500, Rich Felker wrote:
> > > On Tue, Feb 20, 2024 at 06:41:05PM +0000, Edgecombe, Rick P
> > > wrote:
> > > > Hmm, could the shadow stack underflow onto the real stack then?
> > > > Not
> > > > sure how bad that is. INCSSP (incrementing the SSP register on
> > > > x86)
> > > > loops are not rare so it seems like something that could
> > > > happen.
> > > 
> > > Shadow stack underflow should fault on attempt to access
> > > non-shadow-stack memory as shadow-stack, no?
> > 
> > Maybe I'm misunderstanding. I thought the proposal included
> > allowing
> > shadow stack access to convert normal address ranges to shadow
> > stack,
> > and normal writes to convert shadow stack to normal.
> 
> As I understood the original discussion of the proposal on IRC, it
> was
> only one-way (from shadow to normal). Unless I'm missing something,
> making it one-way is necessary to catch situations where the shadow
> stack would become compromised.

The original post here:
https://lore.kernel.org/lkml/22a53b78-10d7-4a5a-a01e-b2f3a8c22e94@xxxxxxxxxxxxxxxx/

...has:
"Shadow stack faults on non-shadow stack pages, if flexible shadow
stack handling is in effect, cause the affected page to become a shadow
stack page.  When this happens, the page filled with invalid address
tokens."

...and:
"Faults from non-shadow-stack accesses to a shadow-stack page which was
created by the previous paragraph will cause the page to revert to non-
shadow-stack usage, with or without clearing."

I see Stefan has clarified in another response. So I'll go try to
figure it out.

> 
> > > > Shadow stacks currently have automatic guard gaps to try to
> > > > prevent
> > > > one
> > > > thread from overflowing onto another thread's shadow stack.
> > > > This
> > > > would
> > > > somewhat opens that up, as the stack guard gaps are usually
> > > > maintained
> > > > by userspace for new threads. It would have to be thought
> > > > through
> > > > if
> > > > these could still be enforced with checking at additional
> > > > spots.
> > > 
> > > I would think the existing guard pages would already do that if a
> > > thread's shadow stack is contiguous with its own data stack.
> > 
> > The difference is that the kernel provides the guard gaps, where
> > this
> > would rely on userspace to do it. It is not a showstopper either.
> > 
> > I think my biggest question on this is how does it change the
> > capability for two threads to share a shadow stack. It might
> > require
> > some special rules around the syscall that writes restore tokens.
> > So
> > I'm not sure. It probably needs a POC.
> 
> Why would they be sharing a shadow stack?

The guard gap was introduced originally based on a suggestion that
overflowing a shadow stack onto an adjacent shadow stack could cause
corruption that could be used by an attacker to work around the
protection. There wasn't any concrete demonstrated attacks or
suggestion that all the protection was moot.

But when we talk about capabilities for converting memory to shadow
stack with simple memory accesses, and syscalls that can write restore
token to shadow stacks, it's not immediately clear to me that it
wouldn't open up something like that. Like if two restore tokens were
written to a shadow stack, or two shadow stacks were adjacent with
normal memory between them that later got converted to shadow stack.
Those sorts of scenarios, but I won't lean on those specific examples.
Sorry for being hand wavy. It's just where I'm at, at this point.

> 
> > > From the musl side, I have always looked at the entirely of
> > > shadow
> > > stack stuff with very heavy skepticism, and anything that breaks
> > > existing interface contracts, introduced places where apps can
> > > get
> > > auto-killed because a late resource allocation fails, or requires
> > > applications to code around the existence of something that
> > > should be
> > > an implementation detail, is a non-starter. To even consider
> > > shadow
> > > stack support, it must truely be fully non-breaking.
> > 
> > The manual assembly stack switching and JIT code in the apps needs
> > to
> > be updated. I don't think there is a way around it.
> 
> Indeed, I'm not talking about programs with JIT/manual stack-
> switching
> asm, just anything using existing APIs for control of stack --
> pthread_setstack, makecontext, sigaltstack, etc.

Then I think WRSS might fit your requirements better than what glibc
did. It was considered a reduced security mode that made libc's job
much easier and had better compatibility, but the last discussion was
to try to do it without WRSS.

> 
> > I agree though that the late allocation failures are not great.
> > Mark is
> > working on clone3 support which should allow moving the shadow
> > stack
> > allocation to happen in userspace with the normal stack. Even for
> > riscv
> > though, doesn't it need to update a new register in stack
> > switching?
> 
> If clone is called with signals masked, it's probably not necessary
> for the kernel to set the shadow stack register as part of clone3.

So you would want a mode of clone3 that basically leaves the shadow
stack bits alone? Mark was driving that effort, but it doesn't seem
horrible to me on first impression. If it would open up the possibility
of musl support.

> 
> > BTW, x86 shadow stack has a mode where the shadow stack is writable
> > with a special instruction (WRSS). It enables the SSP to be set
> > arbitrarily by writing restore tokens. We discussed this as an
> > option
> > to make the existing longjmp() and signal stuff work more
> > transparently
> > for glibc.
> > 
> > 
> > 
> > BTW, when I talk about "not supporting" I don't mean the app should
> > crash. I mean it should instead run normally, just without shadow
> > stack
> > enabled. Not sure if that was clear. Since shadow stack is not
> > essential for an application to function, it is only security
> > hardening
> > on top.
> > 
> > Although determining if an application supports shadow stack has
> > turned
> > out to be difficult in practice. Handling dlopen() is especially
> > hard.
> 
> One reasonable thing to do, that might be preferable to
> overengineered
> solutions, is to disable shadow-stack process-wide if an interface
> incompatible with it is used (sigaltstack, pthread_create with an
> attribute setup using pthread_attr_setstack, makecontext, etc.), as
> well as if an incompatible library is is dlopened.

I think it would be an interesting approach to determining
compatibility. On x86 there has been cases of binaries getting
mismarked as supporting shadow stack. So an automated way of filtering
some of those out would be very useful I think. I guess the dynamic
linker could determine this based on some list of functions?

The dlopen() bit gets complicated though. You need to disable shadow
stack for all threads, which presumably the kernel could be coaxed into
doing. But those threads might be using shadow stack instructions
(INCSSP, RSTORSSP, etc). These are a collection of instructions that
allow limited control of the SSP. When shadow stack gets disabled,
these suddenly turn into #UD generating instructions. So any other
threads executing those instructions when shadow stack got disabled
would be in for a nasty surprise.

Glibc's permissive mode (that disables shadow stack when dlopen()ing a
DSO that doesn't support shadow stack) is quite limited because of
this. There was a POC for working around it, but I'll stop there for
now, to not spam you with the details. I'm not sure of arm and risc-v
details on this specific corner, but for x86.

>  This is much more
> acceptable than continuing to run with shadow stacks managed sloppily
> by the kernel and async killing the process on OOM, and is probably
> *more compatible* with apps than changing the minimum stack size
> requirements out from under them.

Yep.

> 
> The place where it's really needed to be able to allocate the shadow
> stack synchronously under userspace control, in order to harden
> normal
> applications that aren't doing funny things, is in pthread_create
> without a caller-provided stack.

Yea most apps don't do anything too tricky. Mostly shadow stack "just
works". But it's no excuse to just crash for the others.