Re: [PATCH v9 23/42] Documentation/x86: Add CET shadow stack description

Andy Lutomirski <luto@xxxxxxxxxx> · Thu, 22 Jun 2023 08:26:17 -0700

On Thu, Jun 22, 2023 at 2:28 AM szabolcs.nagy@xxxxxxx
<szabolcs.nagy@xxxxxxx> wrote:
>
> The 06/21/2023 18:54, Edgecombe, Rick P wrote:
> > On Wed, 2023-06-21 at 12:36 +0100, szabolcs.nagy@xxxxxxx wrote:
> > > > The 06/20/2023 19:34, Edgecombe, Rick P wrote:
> > > > > > I actually did a POC for this, but rejected it. The problem is,
> > > > > > if
> > > > > > there is a shadow stack overflow at that point then the kernel
> > > > > > > > can't
> > > > > > push the shadow stack token to the old stack. And shadow stack
> > > > > > > > overflow
> > > > > > is exactly the alt shadow stack use case. So it doesn't really
> > > > > > > > solve
> > > > > > the problem.
> > > >
> > > > the restore token in the alt shstk case does not regress anything
> > > > but
> > > > makes some use-cases work.
> > > >
> > > > alt shadow stack is important if code tries to jump in and out of
> > > > signal handlers (dosemu does this with swapcontext) and for that a
> > > > restore token is needed.
> > > >
> > > > alt shadow stack is important if the original shstk did not
> > > > overflow
> > > > but the signal handler would overflow it (small thread stack, huge
> > > > sigaltstack case).
> > > >
> > > > alt shadow stack is also important for crash reporting on shstk
> > > > overflow even if longjmp does not work then. longjmp to a
> > > > makecontext
> > > > stack would still work and longjmp back to the original stack can
> > > > be
> > > > made to mostly work by an altshstk option to overwrite the top
> > > > entry
> > > > with a restore token on overflow (this can break unwinding though).
> > > >
> >
> > There was previously a request to create an alt shadow stack for the
> > purpose of handling shadow stack overflow. So you are now suggesting to
> > to exclude that and instead target a different use case for alt shadow
> > stack?
>
> that is not what i said.
>
> > But I'm not sure how much we should change the ABI at this point since
> > we are constrained by existing userspace. If you read the history, we
> > may end up needing to deprecate the whole elf bit for this and other
> > reasons.
>
> i'm not against deprecating the elf bit, but i think binary
> marking will be difficult for this kind of feature no matter what
> (code may be incompatible for complex runtime dependent reasons).
>
> > So should we struggle to find a way to grow the existing ABI without
> > disturbing the existing userspace? Or should we start with something,
> > finally, and see where we need to grow and maybe get a chance at a
> > fresh start to grow it?
> >
> > Like, maybe 3 people will show up saying "hey, I *really* need to use
> > shadow stack and longjmp from a ucontext stack", and no one says
> > anything about shadow stack overflow. Then we know what to do. And
> > maybe dosemu decides it doesn't need to implement shadow stack (highly
> > likely I would think). Now that I think about it, AFAIU SS_AUTODISARM
> > was created for dosemu, and the alt shadow stack patch adopted this
> > behavior. So it's speculation that there is even a problem in that
> > scenario.
> >
> > Or maybe people just enable WRSS for longjmp() and directly jump back
> > to the setjmp() point. Do most people want fast setjmp/longjmp() at the
> > cost of a little security?
> >
> > Even if, with enough discussion, we could optimize for all
> > hypotheticals without real user feedback, I don't see how it helps
> > users to hold shadow stack. So I think we should move forward with the
> > current ABI.
>
> you may not get a second chance to fix a security feature.
> it will be just disabled if it causes problems.

*I* would use altshadowstack.

I run a production system (that cares about correctness *and*
performance, but that's not really relevant here -- SHSTK ought to be
fast).  And, if it crashes, I want to know why.  So I handle SIGSEGV,
etc so I have good logs if it crashes.  And I want those same logs if
I overflow the stack.

That being said, I have no need for longjmp or siglongjmp for this.  I
use exit(2) to escape.

For what it's worth, setjmp/longjmp is a bad API.  The actual pattern
that ought to work well (and that could be supported well by fancy
compilers and non-C languages, as I understand it) is more like a
function call that has two ways out.  Like this (pseudo-C):

void function(struct better_jmp_buf &buf, args...)
{
   ...
       if (condition)
          better_long_jump(buf);  // long jumps out!
       // could also pass buf to another function
   ...
       // could also return normally
}

better_call_with_jmp_buf(function, args);

*This* could support altshadowstack just fine.  And many users might
be okay with the understanding that, if altshadowstack is on, you have
to use a better long jump to get out (or a normal sigreturn or _exit).
No one is getting an altshadowstack signal handler without code
changes.

siglongjmp() could support altshadowstack with help from the kernel,
but we probably don't want to go there.

--Andy