Re: [PATCH v7 01/41] Documentation/x86: Add CET shadow stack description

"Edgecombe, Rick P" <rick.p.edgecombe@xxxxxxxxx> · Fri, 3 Mar 2023 22:35:05 +0000

On Thu, 2023-03-02 at 16:34 +0000, szabolcs.nagy@xxxxxxx wrote:
> > Alternatively, the thread shadow stacks could get an already used
> > token
> > pushed at the end, to try to match what an in-use map_shadow_stack
> > shadow stack would look like. Then the backtracing algorithm could
> > just
> > look for the same token in both cases. It might get confused in
> > exotic
> > cases and mistake a token in the middle of the stack for the end of
> > the
> > allocation though. Hmm...
> 
> a backtracer would search for an end token on an active shadow
> stack. it should be able to skip other tokens that don't seem
> to be code addresses. the end token needs to be identifiable
> and not break security properties. i think it's enough if the
> backtrace is best effort correct, there can be corner-cases when
> shadow stack is difficult to interpret, but e.g. a profiler can
> still make good use of this feature.

So just taking a look at this and remembering we used to have an
arch_prctl() that returned the thread's shadow stack base and size.
Glibc needed it, but we found a way around and dropped it. If we added
something like that back, then it could be used for backtracing in the
typical thread case and also potentially similar things to what glibc
was doing. This also saves ~8 bytes per shadow stack over an end-of-
stack marker, so it's a tiny bit better on memory use.

For the end-of-stack-marker solution:
In the case of thread shadow stacks, I'm not seeing any issues testing
adding markers at the end. So adding this on top of the existing series
for just thread shadow stacks seems lower probability of impact
regression wise. Especially if we do it in the near term.

For ucontext/map_shadow_stack, glibc expects a token to be at the size
passed in. So we would either have to create a larger allocation (to
include the marker) or create a new map_shadow_stack flag to do this
(it was expected that there might be new types of initial shadow stack
data that the kernel might need to create). It is also possible to pass
a non-page aligned size and get zero's at the end of the allocation. In
fact glibc does this today in the common case. So that is also an
option.

I think I slightly prefer the former arch_prctl() based solution for a
few reasons:
 - When you need to find the start or end of the shadow stack can you
can just ask for it instead of searching. It can be faster and simpler.
 - It saves 8 bytes of memory per shadow stack.

If this turns out to be wrong and we want to do the marker solution
much later at some point, the safest option would probably be to create
new flags.

But just discussing this with HJ, can you share more on what the usage
is? Like which backtracing operation specifically needs the marker? How
much does it care about the ucontext case?