Re: RFC: userspace exception fixups

Andy Lutomirski <luto@xxxxxxxxxxxxxx> · Thu, 8 Nov 2018 12:05:42 -0800

On Thu, Nov 8, 2018 at 11:54 AM Sean Christopherson
<sean.j.christopherson@xxxxxxxxx> wrote:
>
> On Tue, Nov 06, 2018 at 01:07:54PM -0800, Andy Lutomirski wrote:
> >
> >
> > > On Nov 6, 2018, at 1:00 PM, Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
> > >
> > >> On 11/6/18 12:12 PM, Andy Lutomirski wrote:
> > >> True, but what if we have a nasty enclave that writes to memory just
> > >> below SP *before* decrementing SP?
> > >
> > > Yeah, that would be unfortunate.  If an enclave did this (roughly):
> > >
> > >    1. EENTER
> > >    2. Hardware sets eenter_hwframe->sp = %sp
> > >    3. Enclave runs... wants to do out-call
> > >    4. Enclave sets up parameters:
> > >        memcpy(&eenter_hwframe->sp[-offset], arg1, size);
> > >        ...
> > >    5. Enclave sets eenter_hwframe->sp -= offset
> > >
> > > If we got a signal between 4 and 5, we'd clobber the copy of 'arg1' that
> > > was on the stack.  The enclave could easily fix this by moving ->sp first.
> > >
> > > But, this is one of those "fun" parts of the ABI that I think we need to
> > > talk about.  If we do this, we also basically require that the code
> > > which handles asynchronous exits must *not* write to the stack.  That's
> > > not hard because it's typically just a single ERESUME instruction, but
> > > it *is* a requirement.
> > >
> >
> > I was assuming that the async exit stuff was completely hidden by the
> > API.  The AEP code would decide whether the exit got fixed up by the
> > kernel (which may or may not be easy to tell — can the code even tell
> > without kernel help whether it was, say, an IRQ vs #UD?) and then either
> > do ERESUME or cause sgx_enter_enclave() to return with an appropriate
> > return value.
>
> Ok, SDK folks came up with an idea that would allow them to use vDSO,
> albeit with a bit of ugliness and potentially a ROP-attack issue.
> Definitely some weirdness, but the weirdness is well contained, unlike
> the magic prefix approach.
>
> Provide two enter_enclave() vDSO "functions".  The first is a normal
> function with a normal C interface.  The second is a blob of code that
> is "called" and "returns" via indirect jmp, and can be used by SGX
> runtimes that want to use the untrusted stack for out-calls from the
> enclave.
>
> For the indirect jmp "function", use %rbp to stash the return address
> of the caller (either in %rbp itself or in memory pointed to by %rbp).
> It works because hardware also saves/restores %rbp along with %rsp when
> doing enclave transitions, and the SDK can live with %rbp being
> off-limits.  Fault info is passed via registers.

Hmm.  The idea being that the SDK preserves RBP but not RSP.  That's
not the most terrible thing in the world.  But could the SDK live with
something more like my suggestion where the vDSO supplies a normal
function that takes a struct containing registers that are visible to
the enclave?  This would make it extremely awkward for the enclave to
use the untrusted stack per se, but it would make it quite easy (I
think) for the untrusted part of the SDK to allocate some extra memory
and just tell the enclave that *that* memory is the stack.
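
To make that concrete, here is a rough sketch of the untrusted side under
that scheme. Everything in it is hypothetical: struct enclave_regs, its
field list, and alloc_outcall_stack() are names made up for illustration
and are not an existing or proposed kernel/vDSO ABI. It only shows the
runtime allocating a separate buffer and handing its top to the enclave as
"the stack" through the register struct.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical: the registers the enclave gets to see on entry.
 * This layout is an assumption for illustration, not a real ABI.
 */
struct enclave_regs {
	uint64_t rdi, rsi, rdx;	/* argument registers */
	uint64_t rsp;		/* what the enclave is told is "the stack" */
};

/*
 * Hypothetical helper for the untrusted runtime: allocate a dedicated
 * out-call buffer and point the enclave-visible RSP at its top.
 * The real untrusted stack is never exposed to the enclave.
 * Returns the buffer base (to be freed by the caller) or NULL.
 */
static void *alloc_outcall_stack(struct enclave_regs *regs, size_t size)
{
	void *base;

	/* aligned_alloc() wants size to be a multiple of the alignment. */
	assert(size != 0 && size % 16 == 0);

	base = aligned_alloc(16, size);
	if (!base)
		return NULL;

	/* Stacks grow down, so the enclave starts at the top of the buffer. */
	regs->rsp = (uint64_t)(uintptr_t)base + size;
	return base;
}
```

With something like this, whatever the enclave scribbles below its RSP
lands in the separate buffer, and a signal delivered on the real stack
cannot clobber it.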

AFAICS we do have two registers that genuinely are preserved: FSBASE
and GSBASE.  Which is a good thing, because otherwise SGX enablement
would currently be a privilege escalation issue due to making GSBASE
writable when it should not be.

This whole thing is a mess.  I'm starting to think that the cleanest
solution would be to provide a way to just tell the kernel that
certain RIP values have exception fixups.
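
As a purely illustrative sketch of what "RIP values with fixups" could
mean, the following models a sorted table mapping a faulting RIP to a
fixup address, with a binary-search lookup. The names (struct rip_fixup,
lookup_fixup()) and the convention that 0 means "no fixup" are
assumptions of this sketch; no such registration interface exists, and
how userspace would hand the table to the kernel is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table entry: a fault at fault_rip resumes at fixup_rip. */
struct rip_fixup {
	uint64_t fault_rip;
	uint64_t fixup_rip;
};

/* bsearch() comparator: key is a RIP value, elem is a table entry. */
static int cmp_rip(const void *key, const void *elem)
{
	uint64_t rip = *(const uint64_t *)key;
	const struct rip_fixup *e = elem;

	if (rip < e->fault_rip)
		return -1;
	if (rip > e->fault_rip)
		return 1;
	return 0;
}

/*
 * Look up the fixup for a faulting RIP. The table must be sorted by
 * fault_rip. Returns the fixup address, or 0 if the RIP has no entry
 * (in which case the fault would be delivered as usual).
 */
static uint64_t lookup_fixup(const struct rip_fixup *table, size_t n,
			     uint64_t rip)
{
	const struct rip_fixup *e;

	assert(table != NULL && n > 0);

	e = bsearch(&rip, table, n, sizeof(*table), cmp_rip);
	return e ? e->fixup_rip : 0;
}
```

For example, with entries {0x1000, 0x2000} and {0x1010, 0x2040}, a fault
at 0x1010 would resume at 0x2040, while a fault at any unlisted RIP gets
no fixup.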