Re: [PATCH v9 23/42] Documentation/x86: Add CET shadow stack description

"szabolcs.nagy@xxxxxxx" <szabolcs.nagy@xxxxxxx> · Fri, 7 Jul 2023 16:25:08 +0100

The 07/06/2023 18:25, Edgecombe, Rick P wrote:
> On Thu, 2023-07-06 at 14:07 +0100, szabolcs.nagy@xxxxxxx wrote:
> 
> [ snip ]
> > 
> > instead of priority, i'd say "posix conform c apps work
> > without change" is a benchmark i use to see if the design
> > is sound.
> 
> This involves leaking shadow stacks for sigaltstack and makecontext,
> though, right? This seems kind of wrong. It might be useful for
> avoiding crashes at all costs, but probably shouldn't be the long term
> solution. I thought your API updates were the right direction.

new apis are not enough.

existing apis either must do something reasonable or shstk must
be disabled when that api is used (sigaltstack, makecontext).

the disable does not really work as the apis are widely used
and there is no 'disable shstk locally' it is viral when a
widely used dependency is affected.

so apis must do something reasonable. however there will be
remaining issues and that will need new apis which can take
a long time to transition to.

> > one nasty case is shadow stack overflow handling, but i
> > think i have a solution for that (not the nicest thing:
> > it involves setting the top bit on the last entry on the
> > shadow stack instead of adding a new entry to it. + a new
> > syscall that can switch to this entry. i havent convinced
> > myself about this yet).
> 
> There might be some complicated thing around storing the last shadow
> stack entry into the shadow stack sigframe and restoring it on
> sigreturn. Then writing a token from the kernel to where the saved
> frame was to live there in the meantime.

this only works if you jump from the alt stack to the overflowed
stack, not if you jump somewhere else in between.

this is a would be nice to solve but not the most important case.

> 
> But to me this whole search, restore and INCSSP thing is suspect at
> this point though. We could also increase compatibility and performance
> more simply, by adding kernel help, at the expense of security.

what is the kernel help? (and security trade-off)

and why is scan+restore+incssp suspect?

the reasoning i've seen are

- some ppl might not add restore token: sounds fine, it's already
  ub to jump to such stack either way.

- it is slow: please give an example where there is slowdown.
  (the slowdown has to be larger than the call/ret overhead)

- jump to overflowed shadow stack: i'm fairly sure this can be done
  (but indeed complicated), and if that's not acceptable then not
  supporting this case is better than not supporting reliable
  crash handling (alt stack handler can overflow the shadow stack
  and shadow stack overflow cannot be handled).

> > slow longjmp is bad. (well longjmp is actually always slow
> > in glibc because it sets the signalmask with a syscall, but
> > there are other jump operations that don't do this and want
> > to be fast so yes we want fast jump to be possible).
> > 
> > jumping up the shadow stack is at least linear time in the
> > number of frames jumped over (which already sounds significant
> > slowdown however this is amortized by the fact that the stack
> > frames had to be created at some point and that is actually a
> > lot more expensive because it involves write operations, so a
> > zero cost jump will not do any asymptotic speedup compared to
> > a linear cost jump as far as i can see.).
> > 
> > with my proposed solution the jump is still linear. (i know
> > x86 incssp can jump many entries at a time and does not have
> > to actually read and check the entries, but technically it's
> > linear time too: you have to do at least one read per page to
> > have the guardpage protection). this all looks fine to me
> > even for extreme made up workloads.
> 
> Well I guess we are talking about hypothetical performance. But linear
> time is still worse than O(1). And I thought longjmp() was supposed to
> be an O(1) type thing.

longjmp is not O(1) with your proposed abi.

and i don't think linear time is worse than O(1) in this case.

> Separate from all of this...now that all the constraints are clearer,
> if you have changed your mind on whether this series is ready, could
> you comment at the top of this thread something to that effect? I'm
> imagining not many are reading so far down at this point.
> 
> For my part, I think we should go forward with what we have on the
> kernel side, unless glibc/gcc developers would like to start by
> deprecating the existing binaries. I just talked with HJ, and he has
> not changed his plans around this. If anyone else in that community has
> (Florian?), please speak up. But otherwise I think it's better to start
> getting real world feedback and grow based on that.
> 

the x86 v1 abi tries to be compatible with existing unwinders.
(are there other binaries that constrains v1? portable code
should be fine as they rely on libc which we can still change)

i will have to discuss the arm plan with the arm kernel devs.
the ugly bit i want to avoid on arm is to have to reimplement
unwind and jump ops to make alt shadow stack work in a v2 abi.

i think the worse bit of the x86 v1 abi is that crash handlers
don't work reliably (e.g. a crash on a tiny makecontext stack
with the usual sigaltstack crash handler can unrecoverably fail
during crash handling). i guess this can be somewhat mitigated
by both linux and libc adding an extra page to the shadow stack
size to guarantee that alt stack handlers with certain depth
always work.