Re: [PATCH v9 23/42] Documentation/x86: Add CET shadow stack description

"szabolcs.nagy@xxxxxxx" <szabolcs.nagy@xxxxxxx> · Mon, 3 Jul 2023 19:19:00 +0100

The 07/02/2023 18:03, Edgecombe, Rick P wrote:
> On Thu, 2023-06-29 at 17:07 +0100, szabolcs.nagy@xxxxxxx wrote:
> > The 06/22/2023 23:18, Edgecombe, Rick P wrote:
> > > I'd also appreciate if you could spell out exactly which:
> > >  - ucontext
> > >  - signal
> > >  - longjmp
> > >  - custom library stack switching
> > > 
> > > patterns you think shadow stack should support working together.
> > > Because even after all these mails, I'm still not sure exactly what
> > > you
> > > are trying to achieve.
> 
> Hi Szablocs,
> 
> Thanks for writing all this up. It is helpful to understand where you
> are coming from. Please don't miss my point at the very bottom of this
> response.
> 
> > 
> > i'm trying to support two operations (in any combination):
> > 
> > (1) jump up the current (active) stack.
> > 
> > (2) jump to a live frame in a different inactive but live stack.
> >     the old stack becomes inactive (= no task executes on it)
> >     and live (= has valid frames to jump to).
> > 
> > with
> > 
> > (3) the runtime must manage the shadow stacks transparently.
> >     (= portable c code does not need modifications)
> > 
> > mapping this to c apis:
> > 
> > - swapcontext, setcontext, longjmp, custom stack switching are jump
> >   operations. (there are conditions under which (1) and (2) must
> > work,
> >   further details don't matter.)
> > 
> > - makecontext creates an inactive live stack.
> > 
> > - signal is only special if it executes on an alt stack: on signal
> >   entry the alt stack becomes active and the interrupted stack
> >   inactive but live. (nested signals execute on the alt stack until
> >   that is left either via a jump or signal return.)
> > 
> > - unwinding can be implemented with jump operations (it needs some
> >   other things but that's out of scope here).
> > 
> > the patterns that shadow stack should support falls out of this
> > model.
> > (e.g. posix does not allow jumping from one thread to the stack of a
> > different thread, but the model does not care about that, it only
> > cares if the target stack is inactive and live then jump should
> > work.)
> > 
> > some observations:
> > 
> > - it is necessary for jump to detect case (2) and then switch to the
> >   target shadow stack. this is also sufficient to implement it.
> > (note:
> >   the restore token can be used for detection since that is
> > guaranteed
> >   to be present when user code creates an inactive live stack and is
> >   not present anywhere else by design. a different marking can be
> > used
> >   if the inactive live stack is created by the kernel, but then the
> >   kernel has to provide a switch method, e.g. syscall. this should
> > not
> >   be controversial.)
> 
> For x86's shadow stack you can jump to a new stack without leaving a
> token behind. I don't know if maybe we could make it a rule in the
> x86_64 ABI that you should always leave a token if you are going to
> mark the SHSTK elf bit. But if anything did this, then longjmp() could
> never make it back to the stack where setjmp() was called without
> kernel help.

ok. i didn't know that.

the restore token means the stack is valid to execute on later. so
from libc pov if the token is missing then jumping to that stack is
simply ub. (the kernel could help working around missing tokens but
presumably the point of not adding a token is exactly to prevent
any kind of jumps to that stack later).

so i don't think this changes much. (other than the x86 model has
inactive dead stacks that cannot be jumped to, and this is enforced.
but libc jumps are not supposed to leave such dead stacks behind so
libc jumps would have to add the restore token when doing a switch.)

> > - in this model two live stacks cannot use the same shadow stack
> > since
> >   jumping between the two stacks is allowed in both directions, but
> >   jumping within a shadow stack only works in one direction. (also
> > two
> >   tasks could execute on the same shadow stack then. and it makes
> >   shadow stack size accounting problematic.)
> > 
> > - so sharing shadow stack with alt stack is broken. (the model is
> >   right in the sense that valid posix code can trigger the issue.
> 
> Could you spell out what "the issue" is that can be triggered?

i meant jumping back from the main to the alt stack:

in main:
setup sig alt stack
setjmp buf1
	raise signal on first return
	longjmp buf2 on second return

in signal handler:
setjmp buf2
	longjmp buf1 on first return
	can continue after second return

in my reading of posix this is valid (and works if signals are masked
such that the alt stack is not clobbered when jumping away from it).

but cannot work with a single shared shadow stack.

> >  we
> >   can ignore that corner case and adjust the model so the shared
> >   shadow stack works for alt stack, but it likely does not change the
> >   jump design: eventually we want alt shadow stack.)
> 
> As we discussed previously, alt shadow stack can't work transparently
> with existing code due to the sigaltstack API. I wonder if maybe you
> are trying to get at something else, and I'm not following.

i would like a jump design that works with alt shadow stack.

...
> So since they are tied together, and I thought to hold off on it for
> now. I don't want to try to squeeze around the upstream userspace, I
> think a version 2 should be a clean slate on a new elf bit.

ok.

> > i understand that the proposed linux abi makes most existing binaries
> > with shstk marking work, which is relevant for x86.
> > 
> > for a while i thought we can fix the remaining issues even if that
> > means breaking existing shstk binaries (just bump the abi marking).
> > now it seems the issues can only be addressed in a future abi break.
> 
> Adding a new arch_prctl() ENABLE value was the plan. Not sure what you
> mean by ABI break vs version bump. The plan was to add the new features
> without userspace regression by putting any behavior behind a different
> enable option. This relies on userspace to add a new elf bit, and to
> use it.

yes future abi break was the wrong wording: new abi with new elf bit
is what i meant. i.e. x86 will have an abi v1 and abi v2, where v2
abi is not compatible e.g. with v1 unwinder.

> > which means x86 linux will likely end up maintaining two incompatible
> > abis and the future one will need user code and build system changes,
> > not just runtime changes. it is not a small incremental change to add
> > alt shadow stack support for example.
> > 
> > i don't think the maintenance burden of two shadow stack abis is the
> > right path for arm64 to follow, so the shadow stack semantics will
> > likely become divergent not common across targets.
> 
> Unfortunately we are at a bit of an information asymmetry here because
> the ARM spec and patches are not public. It may be part of the cause of
> the confusion.

yes that's unfortunate. but in this case i just meant that arm64
does not have existing marked binaries to worry about. so it seems
wrong to do a v1 abi and later a v2 abi to fix that up.

> It kind of sounds like you don't like the x86 glibc implementation. And
> you want to make sure the kernel can support whatever a new solution is
> that you are working on. I am on board with the goal of having some
> generic set of rules to make portable code work for other architectures
> shadow stacks. But I think how close we can get to that goal or what it
> looks like is an open question. For several reasons:
>  1. Not everyone can see all the specs
>  2. No POCs have been done (or at least shared)
>  3. It's not clear what needs to be supported (yes, I know you have 
>     made a rough proposal here, but it sounds like on the x86 glibc 
>     side at least it's not even clear what non-shadow stack stack 
>     switching operations can work together)
> 
> But towards these goals, I think your technical requests are:
> 
> 1. Leave a token on switching to an alt shadow stack. As discussed
> earlier, we can't do this because of the overflow issues. Also since,

i don't think the overflow discussion was conclusive.

the kernel could modify the top entry instead of adding a new token.
and provide a syscall to switch to that top entry undoing the
modification.

there might be cornercases and likely not much space for encoding
a return address and special marking in an entry. but otherwise
this makes jumping to an overflowed shadow stack work.

but it is also a valid design to just not support jumping out of alt
stack in the specific case of a shadow stack overflow (treat that as
a fatal error), but still allow the jump in less fatal situations.

> alt shadow stack cannot be transparent to existing software anyway, it

maybe not in glibc, but a libc can internally use alt shadow stack
in sigaltstack instead of exposing a separate sigaltshadowstack api.
(this is what a strict posix conform implementation has to do to
support shadow stacks), leaking shadow stacks is not a correctness
issue unless it prevents the program working (the shadow stack for
the main thread likely wastes more memory than all the alt stack
leaks. if the leaks become dominant in a thread the sigaltstack
libc api can just fail).

> should be ok to introduce limitations. So I think this one is a no.
> What we could do is introduce security weakening kernel helpers, but
> this would make sense to come with alt shadow stack support.

yes this would be for abi v2 on x86.

> 2. Add an end token at the top of the shadow stack. Because of the
> existing userspace restriction interactions, this is complicated to
> evaluate but I think we *could* do this now. There are pros and cons.

yes, this is a minor point. (and ok to do differently across targets)

> 3. Support more options for shadow stack sizing. (I think you are
> referring to this conversation:
> https://lore.kernel.org/lkml/ZAIgrXQ4670gxlE4@xxxxxxx/). I don't see
> why this is needed for the base implementation. If ARM wants to add a
> new rlimit or clone variant, I don't see why x86 can't support it
> later.

i think it can be added later.

but it may be important for deployment on some platforms, since a
libc (or other language runtime) may want to set the shadow stack
size differently than the kernel default, because

- languages allocating large arrays on the stack
  (too big shadow stack can cause OOM with overcommit off and
  rlimits can be hit like RLIMIT_DATA, RLIMIT_AS because of it)

- tiny thread stack but big sigaltstack (musl libc, go).

> So if we add 2, are you satisfied? Or otherwise, on the non-technical
> request side, are you asking to hold off on x86 shadow stack, in order
> to co-develop a unified solution?

well a unified solution would need a v2 abi with new elf bit,
and it seems the preference is a v1 abi first, so at this point
i'm just trying to understand the potential brokenness in v1
and possible solutions for v2 since if there is a v2 i would
like that to be compatible across targets.

> I think the existing solution will see use in the meantime, including
> for the development of all the x86 specific JIT implementations.

i think some of that work have to be redone if there is a v2 abi,
which is why i thought having 2 abis is too much maintenance work.

> And finally, what I think is the most important point in all of this:
> 
> I think that *how* it gets used will be a better guide for further
> development than us debating. For example the main pain point that has
> come up so far is the problems around dlopen(). And the future work
> that has been scoped has been for the kernel to help out in this area.
> This is based on _user_ (distro) requests.

i think dlopen (and how it is used) is part of the api/abi design.

in presence of programs that can load any library (e.g. python exe)
it is difficult to make use of shstk without runtime disable.

however if dlopen gets support for runtime disable, reducing the
number of incompatible libraries over time is still relevant,
which requires abi/api design that allows that.

> Any apps that don't work with shadow stack limitations can simply not
> enable shadow stack. You and me are debating these specific API
> combinations, but we can't know whether they are actually the best
> place to focus development efforts. And the early signs are this is NOT
> the most important problem to solve.

disabling shadow stack is not simple (removing the elf marking from
an exe is often not possible/appropriate, but using an env var that
affects an entire process tree has problems too.) but i don't have
a solution for this (it is likely a userspace issue).

debating the api issues was at least useful for me to understand
what can go into the x86 v1 abi and what may go into a v2 abi.