Re: [PATCH v9 23/42] Documentation/x86: Add CET shadow stack description

"Edgecombe, Rick P" <rick.p.edgecombe@xxxxxxxxx> · Mon, 10 Jul 2023 22:56:51 +0000

On Mon, 2023-07-10 at 17:54 +0100, szabolcs.nagy@xxxxxxx wrote:
> > Some mails back, I listed the three things you might be asking for
> > from
> > the kernel side and pointedly asked you to clarify. The only one
> > you
> > still were wishing for up front was "Leave a token on switching to
> > an
> > alt shadow stack."
> > 
> > But how you want to use this involves a lot of details for how
> > glibc
> > will work (automatic shadow stack for sigaltstack, scan-restore-
> > incssp,
> > etc). I think you first need to get the story straight with other
> > libc
> > developers, otherwise this is just brainstorming. I'm not a glibc
> > contributor, so winning me over is only half the battle.
> > 
> > Only after that is settled do we get to the problem of the old
> > libgcc
> > unwinders, and how it is a challenge to even add alt shadow stack
> > given
> > glibc's plans and the existing binaries.
> > 
> > Once that is solved we are at the overflow problem, and the current
> > state of thinking on that is "i'm fairly sure this can be done (but
> > indeed complicated)".
> > 
> > So I think we are still missing any actionable requests that should
> > hold this up.
> > 
> > Is this a reasonable summary?
> 
> not entirely.
> 
> the high level requirement is a design that
> 
> a) does not break many existing sigaltstack uses,
> 
> b) allows implementing jump and unwind that support the
>    relevant use-cases around signals and stack switches
>    with minimal userspace changes.

Please open a discussion with the other glibc developers that have been
involved with shadow stack regarding this subject (b). Please include me
(and probably AndyL would be interested?). I think we've talked it
through as much as you and I can at this point. Let's at least start a
new, more focused thread on the "unwind across stacks" problem. And also
get some consensus on the wisdom of the related suggestion to leak
shadow stacks in order to transparently support existing POSIX APIs.

> 
> where (b) has nothing to add to v1 abi: existing unwind
> binaries mean this needs a v2 abi. (the point of discussing
> v2 ahead of time is to understand the cost of v2 and the
> divergence wrt targets without abi compat issue.)
> 
> for (a) my actionable suggestion was to account altstack
> when sizing shadow stacks. to document an altstack call
> depth limit on the libc level (e.g. fixed 100 is fine) we
> need guarantees from the kernel. (consider recursive calls
> overflowing the stack with altstack crash handler: for this
> to be reliable shadow stack size > stack size is needed.
> but the diff can be tiny e.g. 1 page is enough.)
> 
> your previous 3 actionable item list was
> 
> 1. add token when handling signals on altstack.
> 
> this falls under (b). your summary is correct that this
> requires sorting out many fiddly details.
> 
> 2. top of stack token.
> 
> this can work differently across targets so i have nothing
> against the x86 v1 abi, but on arm64 we plan to have this.
> 
> 3. more shadow stack sizing policies.
> 
> this can be done in the future except the default policy
> should be fixed for (a) and a smaller size introduces the
> overflow issue which may require v2.
> 
> in short the only important change for v1 is shstk sizing.

I tried searching through this long thread and AFAICT this is a new
idea. Sorry if I missed something, but your previous answer on this (3)
seemed concerned with the opposite problem (oversized shadow stacks).

Quoted from a past mail:
On Mon, 2023-07-03 at 19:19 +0100, szabolcs.nagy@xxxxxxx wrote:
> i think it can be added later.
> 
> but it may be important for deployment on some platforms, since a
> libc (or other language runtime) may want to set the shadow stack
> size differently than the kernel default, because
> 
> - languages allocating large arrays on the stack
>   (too big shadow stack can cause OOM with overcommit off and
>   rlimits can be hit like RLIMIT_DATA, RLIMIT_AS because of it)
> 
> - tiny thread stack but big sigaltstack (musl libc, go).

So you can probably see how I got the impression that 3 was closed.

But anyway, OK: if we add a page to every shadow stack allocated for a
thread, then you can guarantee that an alt stack handler has room to
handle at least a single alt stack signal, even when the entire stack
has been exhausted by recursive calls that push nothing else to the
stack. SS_AUTODISARM remains a bit muddy.
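
As a rough illustration of the arithmetic behind "one extra page" (a
sketch only, not the kernel's actual sizing code): the helper names
below are hypothetical, and the 4 KiB page size and 8-byte shadow stack
entry are assumptions for x86-64. In the worst case each recursive call
consumes 8 bytes of regular stack and 8 bytes of shadow stack, so a
shadow stack sized to the stack plus one page still has a page of
headroom when the regular stack overflows.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: 4 KiB pages. */
#define PAGE_SIZE_ASSUMED 4096UL
/* Assumption: each call pushes one 8-byte return address on the shadow stack. */
#define SHSTK_ENTRY_SIZE  8UL

/* Round a size up to the next multiple of the assumed page size. */
static unsigned long page_align(unsigned long size)
{
	return (size + PAGE_SIZE_ASSUMED - 1) & ~(PAGE_SIZE_ASSUMED - 1);
}

/*
 * Hypothetical sizing policy: page-aligned stack size plus one spare
 * page, so the shadow stack is strictly larger than the stack it covers.
 */
static unsigned long shstk_size_with_altstack_room(unsigned long stack_size)
{
	/* Guard against wrap-around in the additions below. */
	assert(stack_size <= ULONG_MAX - 2 * PAGE_SIZE_ASSUMED);
	return page_align(stack_size) + PAGE_SIZE_ASSUMED;
}

/* Number of return addresses that fit in the single spare page. */
static unsigned long spare_call_depth(void)
{
	return PAGE_SIZE_ASSUMED / SHSTK_ENTRY_SIZE;
}
```

Under these assumptions the spare page holds 512 return addresses,
which would cover a documented altstack call depth limit such as the
"fixed 100" suggested above. Whatever the kernel itself writes to the
shadow stack on signal delivery would come out of the same page.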

Also, glibc would have to size ucontext shadow stacks with an additional
page as well. I think it would be good to get some other signs of
interest in this tweak, given that it requires glibc to participate
in the scheme. Can you gather that quickly, so we can get this all
prepped again?

To me (unless I'm missing something), it seems like complicating the
equation for probably no real-world benefit, given the low chances of
exhausting a shadow stack. But if there is consensus on the glibc side,
then I'm happy to make the change to finally settle this discussion.