Re: [PATCH 04/10] x86/cet: Handle thread shadow stack

Andy Lutomirski <luto@xxxxxxxxxx> · Thu, 7 Jun 2018 13:53:41 -0700

On Thu, Jun 7, 2018 at 12:47 PM Florian Weimer <fweimer@xxxxxxxxxx> wrote:
>
> On 06/07/2018 08:21 PM, Andy Lutomirski wrote:
> > On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@xxxxxxxxx> wrote:
> >>
> >> When fork() specifies CLONE_VM but not CLONE_VFORK, the child
> >> needs a separate program stack and a separate shadow stack.
> >> This patch handles allocation and freeing of the thread shadow
> >> stack.
> >
> > Aha -- you're trying to make this automatic.  I'm not convinced this
> > is a good idea.  The Linux kernel has a long and storied history of
> > enabling new hardware features in ways that are almost entirely
> > useless for userspace.
> >
> > Florian, do you have any thoughts on how the user/kernel interaction
> > for the shadow stack should work?
>
> I have not looked at this in detail, have not played with the emulator,
> and have not been privy to any discussions before these patches have
> been posted, however …
>
> I believe that we want as little code in userspace for shadow stack
> management as possible.  One concern I have is that even with the code
> we arguably need for various kinds of stack unwinding, we might have
> unwittingly built a generic trampoline that leads to full CET bypass.

I was imagining an API like "allocate a shadow stack for the current
thread, fail if the current thread already has one, and turn on the
shadow stack".  glibc would call clone and then call this API pretty
much immediately (i.e. before making any calls from which it expects
to return).
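
To make the intended semantics concrete, here is a rough userspace
mock of that call.  Everything in it is an assumption for the sake of
illustration: the name shstk_alloc_and_enable(), the error codes, and
the use of malloc() as a stand-in for whatever the kernel would really
map.  It only models the "fail if the thread already has one" rule; it
does not touch any CET hardware state.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Mock of the per-thread state the kernel would track (hypothetical). */
struct shstk_state {
    void  *base;     /* start of the mock shadow stack region */
    size_t size;     /* region size in bytes */
    int    enabled;  /* nonzero once the shadow stack is "on" */
};

/* One instance per thread, mirroring "for the current thread". */
static _Thread_local struct shstk_state cur_shstk;

/*
 * Hypothetical call: allocate a shadow stack for the calling thread
 * and turn it on.  Returns 0 on success, -EEXIST if the thread already
 * has a shadow stack, -EINVAL for a zero size, -ENOMEM on allocation
 * failure.  malloc() is only a placeholder for a kernel mapping.
 */
static int shstk_alloc_and_enable(size_t size)
{
    if (cur_shstk.enabled)
        return -EEXIST;          /* refuse to replace an existing one */

    /* In this mock, a thread without an enabled stack owns no region. */
    assert(cur_shstk.base == NULL);

    if (size == 0)
        return -EINVAL;

    void *region = malloc(size);
    if (region == NULL)
        return -ENOMEM;

    cur_shstk.base = region;
    cur_shstk.size = size;
    cur_shstk.enabled = 1;       /* "turn on the shadow stack" */
    return 0;
}
```

In this sketch the new thread's entry code would make the call once,
right after clone returns in the child, and any second call from the
same thread would get -EEXIST.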

We definitely want strong enough user control that tools like CRIU can
continue to work.  I haven't looked at the SDM recently enough to
remember for sure, but I'm reasonably confident that user code can
learn the address of its own shadow stack.  If nothing else, CRIU
needs to be able to restore from a context where there's a signal on
the stack and the signal frame contains a shadow stack pointer.

>
> I also expect that we'd only have donor mappings in userspace anyway,
> and that the memory is not actually accessible from userspace if it is
> used for a shadow stack.
>
> > My intuition would be that all
> > shadow stack management should be entirely controlled by userspace --
> > newly cloned threads (with CLONE_VM) should have no shadow stack
> > initially, and newly started processes should have no shadow stack
> > until they ask for one.
>
> If the new thread doesn't have a shadow stack, we need to disable
> signals around clone, and we are very likely forced to rewrite the early
> thread setup in assembler, to avoid spurious calls (including calls to
> thunks to get EIP on i386).  I wouldn't want to do this if we can avoid
> it.  Just using C and hoping to get away with it doesn't sound great,
> either.  And obviously there is the matter that the initial thread setup
> code ends up being that universal trampoline.
>

Only if the trampoline works when the shadow stack is already enabled.

I could very easily be convinced that automatic shadow stack setup is
a good idea, but I still think we need manual control for CRIU and
such.