Re: [PATCH v9 23/42] Documentation/x86: Add CET shadow stack description

"Edgecombe, Rick P" <rick.p.edgecombe@xxxxxxxxx> · Sun, 25 Jun 2023 18:48:36 +0000

On Fri, 2023-06-23 at 17:25 +0100, szabolcs.nagy@xxxxxxx wrote:
> why?
> 
> a stack can be active or inactive (task executing on it or not),
> and if inactive it can be live or dead (has stack frames that can
> be jumped to or not).
> 
> this is independent of shadow stacks: longjmp is only valid if the
> target is either the same active stack or an inactive live stack.
> (there are cases that may seem to work, but fundamentally broken
> and not supportable: e.g. two tasks executing on the same stack
> where the one above happens to not clobber frames deep enough to
> collide with the task below.)
> 
> the proposed longjmp design works for both cases. no assumption is
> made about ucontext or signals other than the shadow stack for an
> inactive live stack ends in a restore token, 

One of the problems for the case of longjmp() from another another
stack, is how to find the old stack's token. HJ and I had previously
discussed searching for the token from the target SSP forward, but the
problems are it is 1, not gauranteed to be there and 2, pretty awkward
and potentially slow.

> which is guaranteed by
> the isa so we only need the kernel to do the same when it switches
> shadow stacks. then longjmp works by construction.

No it's not.

> 
> the only wart is that an overflowed shadow stack is inactive dead
> instead of inactive live because the token cannot be added. (note
> that handling shstk overflow and avoiding shstk overflow during
> signal handling still works with alt shstk!)

I thought we were on the same page as far as pushing a restore token on
signal not being robust against shadow stack overflow, so is this a new
idea? Usually people around here say "code talks", but all I'm asking
for is a full explanation of what you are trying to accomplish and what
the idea is. Otherwise this is asking to hold up this feature based on
hand waving.

Could you answer the questions here, along with a full description of
your proposal:
https://lore.kernel.org/all/1cd67ae45fc379fd82d2745190e4caf74e67499e.camel@xxxxxxxxx/

It we are talking past each other, it could help to do a levelset at
this point.

> 
> an alternative solution is to allow jump to inactive dead stack
> if that's created by a signal interrupt. for that a syscall is
> needed and longjmp has to detect if the target stack is dead or
> live. (the kernel also has to be able to tell if switching to the
> dead stack is valid for security reasons.) i don't know if this
> is doable (if we allow some hacks it's doable).
> 
> unwinding across signal handlers is just a matter of having
> enough information at the signal frame to continue, it does
> not have to follow crazy jumps or weird coroutine things:
> that does not work without shadow stacks either. but unwind
> across alt stack frame should work.

Even if there is some workable idea, there is already a bunch of 
userspace built around the existing solution, and users waiting for it.

In addition the whole discussion around alt shadow stack cases will
require alt shadow stacks to be implemented and we might be constrained
there anyway.

If we want to take learnings and do something new, let's build it
around a new elf bit. This current kernel ABI is to support the old elf
bit userspace stack, which has been solidified by the existing upstream
userspace.

I had thought we should start from scratch now and proposed a patch to
block the old elf bit to force this, but lost that battle with other
glibc developers.

Speaking of which, I don't see any enthusiasm from any other glibc
developers that have been involved in this previously. Who were you
thinking was going to implement any of this on the glibc side? Have you
made any progress in getting any of them onboard with this order of
operations in the months since you first brought up changing the
design?