On Mon, 2023-07-03 at 19:19 +0100, szabolcs.nagy@xxxxxxx wrote:
> Could you spell out what "the issue" is that can be triggered?
>
> i meant jumping back from the main to the alt stack:
>
> in main:
>   setup sig alt stack
>   setjmp buf1
>   raise signal on first return
>   longjmp buf2 on second return
>
> in signal handler:
>   setjmp buf2
>   longjmp buf1 on first return
>   can continue after second return
>
> in my reading of posix this is valid (and works if signals are masked
> such that the alt stack is not clobbered when jumping away from it).
>
> but cannot work with a single shared shadow stack.

Ah, I see. To make this work seamlessly, you would need to have
automatic alt shadow stacks, and as we previously discussed this is not
possible with the existing sigaltstack API. (Or at least it seemed like
a closed discussion to me). If there is a solution, then we are
currently missing a detailed proposal.

It looks like further down you proposed leaking alt shadow stacks
(quoted up here near the related discussion):

On Mon, 2023-07-03 at 19:19 +0100, szabolcs.nagy@xxxxxxx wrote:
> maybe not in glibc, but a libc can internally use alt shadow stack
> in sigaltstack instead of exposing a separate sigaltshadowstack api.
> (this is what a strict posix conform implementation has to do to
> support shadow stacks), leaking shadow stacks is not a correctness
> issue unless it prevents the program working (the shadow stack for
> the main thread likely wastes more memory than all the alt stack
> leaks. if the leaks become dominant in a thread the sigaltstack
> libc api can just fail).

It seems like your priority must be to make sure pure C apps don't have
to make any changes in order to not crash with shadow stack enabled.
And this at the expense of any performance and memory usage. Do you
have some formalized priorities or design philosophy you can share?

Earlier you suggested glibc should create new interfaces to handle
makecontext() (makes sense).
Shouldn't the same thing happen here? In which case we are in
code-changes territory and we should ask ourselves what apps really
need.

> > > we can ignore that corner case and adjust the model so the
> > > shared shadow stack works for alt stack, but it likely does not
> > > change the jump design: eventually we want alt shadow stack.)
> >
> > As we discussed previously, alt shadow stack can't work
> > transparently with existing code due to the sigaltstack API. I
> > wonder if maybe you are trying to get at something else, and I'm
> > not following.
>
> i would like a jump design that works with alt shadow stack.

A shadow stack switch could happen based on the following scenarios:
 1. Alt shadow stack
 2. ucontext
 3. custom stack switching logic

If we leave a token on signal, then 1 and 2 could be guaranteed to have
a token *somewhere* above where setjmp() could have been called. The
algorithm could be to search from the target SSP up the stack until it
finds a token, and then switch to it and INCSSP back to the SSP of the
setjmp() point. This is what we are talking about, right?

And the two problems are:
 - Alt shadow stack overflow problem
 - In the case of (3) there might not be a token

Let's ignore these problems for a second - now we have a solution that
allows you to longjmp() back from an alt stack or ucontext stack. Or at
least it works functionally. But is it going to actually work for
people who are using longjmp() for things that are supposed to be fast?
Like, is this the tradeoff people want? I see some references to fiber
switching implementations using longjmp(). I wonder if the existing
INCSSP loops are not going to be ideal for every usage already, and
this sounds like going further down that road.

For jumping out occasionally in some error case, it seems it would be
useful. But I think we are then talking about targeting a subset of
people using these stack switching patterns.
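
To make the token search above concrete, here is a rough sketch of how
I picture it, as a toy model only. Everything in it is made up for
illustration: the shadow stack is a plain array that grows toward lower
indices, TOKEN_BIT is a hypothetical marker rather than the real x86 or
arm64 token encoding, and plan_longjmp() is not an existing glibc or
kernel interface. It only computes which token to switch to and how
many entries an INCSSP-style pop would then have to skip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: slots[] stands in for a shadow stack mapping. The stack
 * grows toward lower indices, so "up the stack" (toward newer entries)
 * means a decreasing index. */
#define TOKEN_BIT UINT64_C(1) /* hypothetical marker, not a real encoding */

struct toy_shstk {
    uint64_t *slots;  /* backing storage */
    size_t    nslots; /* number of slots in the mapping */
};

/*
 * Search from the setjmp() point (target_ssp) toward newer entries for
 * a token. On success return 0, store the token's index in *token_idx
 * and store in *pop_count the number of entries that would have to be
 * popped after switching to the token in order to land on target_ssp.
 * Return -1 if there is no token, as can happen with custom stack
 * switching logic.
 */
int plan_longjmp(const struct toy_shstk *s, size_t target_ssp,
                 size_t *token_idx, size_t *pop_count)
{
    assert(target_ssp < s->nslots);

    for (size_t i = target_ssp; i-- > 0; ) {
        if (s->slots[i] & TOKEN_BIT) {
            *token_idx = i;
            /* Consuming the token leaves the SSP at i + 1. */
            *pop_count = target_ssp - (i + 1);
            return 0;
        }
    }
    return -1;
}
```

Note that the search walks every entry between the setjmp() point and
the token, and the pop afterwards covers the same distance again, which
is the cost I am worried about for the fast users of longjmp().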

Looking at the docs Mark linked (thanks!), ARM has generic GCS PUSH and
POP shadow stack instructions? Can ARM just push a restore token at
setjmp time, like I was trying to figure out earlier with a push token
arch_prctl? It would be good to understand how ARM is going to
implement this with these differences in what is allowed by the HW.

If there are differences in how locked down/functional the hardware
implementations are, and if we want to have some unified set of rules
for apps, there will need to be some give and take. The x86 approach
was mostly to not support all behaviors and ask apps to either change
or not enable shadow stacks. We don't want one architecture to have to
do a bunch of strange things, but we also don't want one to lose some
key end user value.

I'm thinking that for pure tracing users, glibc might do things a lot
differently (use of WRSS to speed things up). So I'm guessing we will
end up with at least one more "policy" on the x86 side.

I wonder if maybe we should have something like a "max compatibility"
policy/mode where arm/x86/riscv could all behave the same from the
glibc caller perspective. We could add kernel help to achieve this for
any implementation that is more locked down. And maybe that is x86's v2
ABI. I don't know, just sort of thinking out loud at this point. And
this sort of gets back to the point I keep making: if we need to decide
tradeoffs, it would be great to get some users to start using this and
start telling us what they want. Are people caring mostly about
security, compatibility or performance?

[snip]