On Thu, 2023-03-02 at 16:34 +0000, szabolcs.nagy@xxxxxxx wrote:
> > Alternatively, the thread shadow stacks could get an already used
> > token pushed at the end, to try to match what an in-use
> > map_shadow_stack shadow stack would look like. Then the backtracing
> > algorithm could just look for the same token in both cases. It might
> > get confused in exotic cases and mistake a token in the middle of
> > the stack for the end of the allocation though. Hmm...
>
> a backtracer would search for an end token on an active shadow
> stack. it should be able to skip other tokens that don't seem
> to be code addresses. the end token needs to be identifiable
> and not break security properties. i think it's enough if the
> backtrace is best effort correct, there can be corner-cases when
> shadow stack is difficult to interpret, but e.g. a profiler can
> still make good use of this feature.

Taking a look at this, I remembered that we used to have an
arch_prctl() that returned the thread's shadow stack base and size.
Glibc needed it, but we found a way around and dropped it. If we added
something like that back, then it could be used for backtracing in the
typical thread case, and potentially also for things similar to what
glibc was doing. This also saves ~8 bytes per shadow stack over an
end-of-stack marker, so it's a tiny bit better on memory use.

For the end-of-stack-marker solution:
In the case of thread shadow stacks, I'm not seeing any issues in
testing when adding markers at the end. So adding this on top of the
existing series for just thread shadow stacks seems to carry a lower
risk of regressions, especially if we do it in the near term.

For ucontext/map_shadow_stack, glibc expects a token to be at the size
passed in.
So we would either have to create a larger allocation (to include the
marker) or create a new map_shadow_stack flag to do this (it was
expected that there might be new types of initial shadow stack data
that the kernel might need to create). It is also possible to pass a
non-page-aligned size and get zeros at the end of the allocation. In
fact, glibc does this today in the common case. So that is also an
option.

I think I slightly prefer the former arch_prctl() based solution for a
few reasons:
 - When you need to find the start or end of the shadow stack, you can
   just ask for it instead of searching. It can be faster and simpler.
 - It saves 8 bytes of memory per shadow stack.

If this turns out to be wrong and we want to do the marker solution
much later at some point, the safest option would probably be to create
new flags.

But just discussing this with HJ, can you share more on what the usage
is? Like which backtracing operation specifically needs the marker? How
much does it care about the ucontext case?
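
To make the comparison a bit more concrete, here is a rough userspace
sketch of the two lookup styles. It only simulates a shadow stack as an
array; it is not kernel code and does not use any real interface. The
marker value (zero), the pretend code address range and all of the
names (END_MARKER, looks_like_code(), backtrace_by_marker(),
backtrace_by_bounds(), demo_stack) are assumptions made up for the
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All values below are hypothetical, chosen for illustration only. */
#define END_MARKER 0ULL          /* assumed end-of-stack marker value */
#define TEXT_START 0x400000ULL   /* pretend executable range [start, end) */
#define TEXT_END   0x500000ULL

/* Simulated shadow stack: it grows down, so older frames sit at higher
 * indexes and the end of the stack is the last slot. */
#define DEMO_SLOTS 8
#define DEMO_SSP   3             /* index of the most recent entry */
static const uint64_t demo_stack[DEMO_SLOTS] = {
    0, 0, 0,                     /* unused space below the current SSP */
    0x401000, 0x402000,          /* return addresses */
    0x7f0000001001ULL,           /* token-like value, not a code address */
    0x403000,                    /* oldest return address */
    END_MARKER,
};

/* Stand-in for "does this look like a code address?". */
static int looks_like_code(uint64_t v)
{
    return v >= TEXT_START && v < TEXT_END;
}

/* Marker style: scan upward from ssp until the end marker is seen or
 * max_slots entries were examined. Entries that do not look like code
 * addresses (e.g. tokens) are skipped. Returns the number of frames
 * stored in out; *found_end reports whether the marker was reached. */
static size_t backtrace_by_marker(const uint64_t *ssp, size_t max_slots,
                                  uint64_t *out, size_t out_cap,
                                  int *found_end)
{
    size_t n = 0;

    assert(ssp && out && found_end);
    *found_end = 0;
    for (size_t i = 0; i < max_slots; i++) {
        if (ssp[i] == END_MARKER) {
            *found_end = 1;
            break;
        }
        if (looks_like_code(ssp[i]) && n < out_cap)
            out[n++] = ssp[i];
    }
    return n;
}

/* Bounds style: the caller already knows base and size (as a query
 * interface would report them), so the walk stops at a known end. */
static size_t backtrace_by_bounds(const uint64_t *ssp, const uint64_t *base,
                                  size_t size_bytes,
                                  uint64_t *out, size_t out_cap)
{
    const uint64_t *end = base + size_bytes / sizeof(uint64_t);
    size_t n = 0;

    assert(ssp && base && out);
    for (const uint64_t *p = ssp; p < end; p++) {
        if (looks_like_code(*p) && n < out_cap)
            out[n++] = *p;
    }
    return n;
}
```

With the sample data, calling either function on
&demo_stack[DEMO_SSP] collects the same three return addresses. The
difference is that the marker walk has to guess where to stop if no
marker is ever seen, while the bounds walk has a hard limit.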