On Thu, 2023-06-29 at 17:07 +0100, szabolcs.nagy@xxxxxxx wrote: > The 06/22/2023 23:18, Edgecombe, Rick P wrote: > > I'd also appreciate if you could spell out exactly which: > > - ucontext > > - signal > > - longjmp > > - custom library stack switching > > > > patterns you think shadow stack should support working together. > > Because even after all these mails, I'm still not sure exactly what > > you > > are trying to achieve. Hi Szablocs, Thanks for writing all this up. It is helpful to understand where you are coming from. Please don't miss my point at the very bottom of this response. > > i'm trying to support two operations (in any combination): > > (1) jump up the current (active) stack. > > (2) jump to a live frame in a different inactive but live stack. > the old stack becomes inactive (= no task executes on it) > and live (= has valid frames to jump to). > > with > > (3) the runtime must manage the shadow stacks transparently. > (= portable c code does not need modifications) > > mapping this to c apis: > > - swapcontext, setcontext, longjmp, custom stack switching are jump > operations. (there are conditions under which (1) and (2) must > work, > further details don't matter.) > > - makecontext creates an inactive live stack. > > - signal is only special if it executes on an alt stack: on signal > entry the alt stack becomes active and the interrupted stack > inactive but live. (nested signals execute on the alt stack until > that is left either via a jump or signal return.) > > - unwinding can be implemented with jump operations (it needs some > other things but that's out of scope here). > > the patterns that shadow stack should support falls out of this > model. > (e.g. posix does not allow jumping from one thread to the stack of a > different thread, but the model does not care about that, it only > cares if the target stack is inactive and live then jump should > work.) > > some observations: > > - it is necessary for jump to detect case (2) and then switch to the > target shadow stack. this is also sufficient to implement it. > (note: > the restore token can be used for detection since that is > guaranteed > to be present when user code creates an inactive live stack and is > not present anywhere else by design. a different marking can be > used > if the inactive live stack is created by the kernel, but then the > kernel has to provide a switch method, e.g. syscall. this should > not > be controversial.) For x86's shadow stack you can jump to a new stack without leaving a token behind. I don't know if maybe we could make it a rule in the x86_64 ABI that you should always leave a token if you are going to mark the SHSTK elf bit. But if anything did this, then longjmp() could never make it back to the stack where setjmp() was called without kernel help. > > - in this model two live stacks cannot use the same shadow stack > since > jumping between the two stacks is allowed in both directions, but > jumping within a shadow stack only works in one direction. (also > two > tasks could execute on the same shadow stack then. and it makes > shadow stack size accounting problematic.) > > - so sharing shadow stack with alt stack is broken. (the model is > right in the sense that valid posix code can trigger the issue. Could you spell out what "the issue" is that can be triggered? > we > can ignore that corner case and adjust the model so the shared > shadow stack works for alt stack, but it likely does not change the > jump design: eventually we want alt shadow stack.) As we discussed previously, alt shadow stack can't work transparently with existing code due to the sigaltstack API. I wonder if maybe you are trying to get at something else, and I'm not following. > > - shadow stack cannot always be managed by the runtime transparently: > it has to be allocated for makecontext and alt stack in situations > where allocation failure cannot be handled. more alarmingly the > destruction of stacks may not be visible to the runtime so the > corresponding shadow stacks leak. my preferred way to fix this is > new apis that are shadow stack compatible (e.g. shadow_makecontext > with shadow_freecontext) and marking the incompatible apis as such. > portable code then can decide to update to new apis, run with shstk > disabled or accept the leaks and OOM failures. the current approach > needs ifdef __CET__ in user code for makecontext and sigaltstack > has many issues. This sounds reasonable to me on the face of it. It seems mostly unrelated to the kernel ABI and purely a userspace thing. > > - i'm still not happy with the shadow stack sizing. and would like to > have a token at the end of the shadow stack to allow scanning. and > it would be nice to deal with shadow stack overflow. and there is > async disable on dlopen. so there are things to work on. I was imagining that for tracing-only users, it might make sense to run with WRSS enabled. This could mean libc's could write their own restore tokens. In the case of longjmp() it could be simple and fast. The implementation could just write a token at the target SSP and switch to it. Non C runtimes that want to use if for backtracing could also write their own preferred stack markers or other data. It also is whole different solution to what is being discussed. But over the course of this thread, I could imagine a little more now how a top of stack marker could possibly be useful for non-tracing usages. I have a patch prepared for this and I had tested to see if adding this later could disturb anything in userspace. The only thing that I found was that gdb might output a slightly different stack trace. So it would be a user visible change, if not a regression. One reason I held off on it still, is that the plan for the expanded shadow stack signal frame includes using a 0 frame, to avoid a forgery scenario. The token that makes sense for the end of stack marker is also a 0 frame. So if userspace that looks for the end of stack marker scans for the 0 frame without checking if it is part of an expanded shadow stack signal frame, then it could make more trouble for alt shadow stack. So since they are tied together, and I thought to hold off on it for now. I don't want to try to squeeze around the upstream userspace, I think a version 2 should be a clean slate on a new elf bit. > > i understand that the proposed linux abi makes most existing binaries > with shstk marking work, which is relevant for x86. > > for a while i thought we can fix the remaining issues even if that > means breaking existing shstk binaries (just bump the abi marking). > now it seems the issues can only be addressed in a future abi break. Adding a new arch_prctl() ENABLE value was the plan. Not sure what you mean by ABI break vs version bump. The plan was to add the new features without userspace regression by putting any behavior behind a different enable option. This relies on userspace to add a new elf bit, and to use it. > > which means x86 linux will likely end up maintaining two incompatible > abis and the future one will need user code and build system changes, > not just runtime changes. it is not a small incremental change to add > alt shadow stack support for example. > > i don't think the maintenance burden of two shadow stack abis is the > right path for arm64 to follow, so the shadow stack semantics will > likely become divergent not common across targets. Unfortunately we are at a bit of an information asymmetry here because the ARM spec and patches are not public. It may be part of the cause of the confusion. > > i hope my position is now clearer. It kind of sounds like you don't like the x86 glibc implementation. And you want to make sure the kernel can support whatever a new solution is that you are working on. I am on board with the goal of having some generic set of rules to make portable code work for other architectures shadow stacks. But I think how close we can get to that goal or what it looks like is an open question. For several reasons: 1. Not everyone can see all the specs 2. No POCs have been done (or at least shared) 3. It's not clear what needs to be supported (yes, I know you have made a rough proposal here, but it sounds like on the x86 glibc side at least it's not even clear what non-shadow stack stack switching operations can work together) But towards these goals, I think your technical requests are: 1. Leave a token on switching to an alt shadow stack. As discussed earlier, we can't do this because of the overflow issues. Also since, alt shadow stack cannot be transparent to existing software anyway, it should be ok to introduce limitations. So I think this one is a no. What we could do is introduce security weakening kernel helpers, but this would make sense to come with alt shadow stack support. 2. Add an end token at the top of the shadow stack. Because of the existing userspace restriction interactions, this is complicated to evaluate but I think we *could* do this now. There are pros and cons. 3. Support more options for shadow stack sizing. (I think you are referring to this conversation: https://lore.kernel.org/lkml/ZAIgrXQ4670gxlE4@xxxxxxx/). I don't see why this is needed for the base implementation. If ARM wants to add a new rlimit or clone variant, I don't see why x86 can't support it later. So if we add 2, are you satisfied? Or otherwise, on the non-technical request side, are you asking to hold off on x86 shadow stack, in order to co-develop a unified solution? I think the existing solution will see use in the meantime, including for the development of all the x86 specific JIT implementations. And finally, what I think is the most important point in all of this: I think that *how* it gets used will be a better guide for further development than us debating. For example the main pain point that has come up so far is the problems around dlopen(). And the future work that has been scoped has been for the kernel to help out in this area. This is based on _user_ (distro) requests. Any apps that don't work with shadow stack limitations can simply not enable shadow stack. You and me are debating these specific API combinations, but we can't know whether they are actually the best place to focus development efforts. And the early signs are this is NOT the most important problem to solve.