Re: [RFC PATCH 4/4] x86/vdso: x86/sgx: Allow the user to exit the vDSO loop on interrupts

Andy Lutomirski <luto@xxxxxxxxxx> · Thu, 20 Aug 2020 10:44:36 -0700

On Thu, Aug 20, 2020 at 4:20 AM Jethro Beekman <jethro@xxxxxxxxxxxx> wrote:
>
> On 2020-08-19 17:02, Andy Lutomirski wrote:
> > On Wed, Aug 19, 2020 at 7:21 AM Jethro Beekman <jethro@xxxxxxxxxxxx> wrote:
> >>
> >> On 2020-08-18 19:15, Andy Lutomirski wrote:
> >>> On Mon, Aug 17, 2020 at 9:24 PM Sean Christopherson
> >>> <sean.j.christopherson@xxxxxxxxx> wrote:
> >>>>
> >>>> Allow userspace to exit the vDSO on interrupts that are acknowledged
> >>>> while the enclave is active.  This allows the user's runtime to switch
> >>>> contexts at opportune times without additional overhead, e.g. when using
> >>>> an M:N threading model (where M user threads run N TCSs, with N > M).
> >>>
> >>> This is IMO rather odd.  We don't support this type of notification on
> >>> interrupts for normal user code.  The fact user code can detect
> >>> interrupts during enclave execution is IMO an oddity of SGX, and I
> >>> have asked Intel to consider replacing the AEX mechanism with
> >>> something more transparent to user mode.  If this ever happens, this
> >>> mechanism is toast.
> >>
> >> Let's design the current interface for the current architecture. We can deal with a new architecture if and when Intel provides it.
> >
> > No.  If Intel fixes the architecture, the proposed interface will
> > *stop working*.
> >
> >>
> >>> Even without architecture changes, building a *reliable* M:N threading
> >>> mechanism on top of this will be difficult or impossible, as there is
> >>> no particular guarantee that a thread will get timing interrupts at> all or that these interrupts will get lucky and hit enclave code, thus
> >>> triggering an AEX.  We certainly don't, and probably never will,
> >>> support any corresponding feature for non-enclave code.
> >>
> >> There's no guarantee, but this vDSO exit mechanism is a prerequisite. Both for context switching and aborting an enclave, userspace *must* have a way to trigger exit from enclave mode *and* recover the user stack in a sane manner. Userspace *should* also be able to do this in a way that's compatible with library use, so calling timer_create or pthread_kill to deliver a signal would be ok, but installing a signal handler should be avoided (the whole reason behind this vDSO call).
> >
> > If you want to abort an enclave, abort it the same way you abort any
> > other user code -- send a signal or something.  If something is wrong> with signal handling in the proposed vDSO interface, then by all
> > means, let's fix it.  If we need better library signal support, let's
> > add such a thing.  If you really want to abort an enclave *cleanly*
> > without any chance of garbage left around, stick it in a subprocess.
> > We are not playing the game where someone sets a flag, crosses their
> > fingers, and waits for an interrupt.
>
> Sending a signal is not sufficient. The current __vdso_sgx_enter_enclave call is not interruptible.
>

Why not?  If we are failing to deliver signals if
__vdso_sgx_enter_enclave() is running, we have a bug and we should fix
it.  We should actually deliver the signal and then either resume the
enclave or return -EINTR or equivalent.  Are we getting this wrong?
Looking at your code, I think we're doing it right but maybe we could
do better.

I suppose we also need a test case for this if we do anything fancy here.

> > Now maybe I'm wrong and there's an actual legitimate use case for this
> > trick, in which case I'd like to be enlightened.  But M:N threading
> > and enclave aborting don't sound like legitimate use cases.
> >
>
> Why is it ok for this code to return to userspace after 1 second:
>
>
>
> void ignore_signal_but_dont_restart(int s) {}
>
> // set SIGALRM handler if none set (FIXME: racy)
> struct sigaction sa_old;
> struct sigaction sa = {
>     .sa_handler = ignore_signal_but_dont_restart,
>     .sa_flags = 0,
> };
> sigaction(SIGALRM, &sa, &sa_old);
> if (!(sa_old.sa_handler == SIG_IGN || sa_old.sa_handler == SIG_DFL)
>     || (sa_old.sa_flags & SA_SIGINFO)) {
>     sa_old.sa_flags &= ~SA_RESTART;
>     sigaction(SIGALRM, &sa_old, NULL);
> }
>
> alarm(1);
>
> char buf;
> read(0, &buf, 1);
>

Seems fine.  That's POSIX.  User code is receiving a signal and is
being affected by the signal.  A signal is a user-visible thing.

>
>
> But, according to your train of thought, this code must hang indefinitely (assuming the enclave is not calling EEXIT)?
>
>
>
> void ignore_signal_but_dont_restart(int s) {}
>
> // set SIGALRM handler if none set (FIXME: racy)
> struct sigaction sa_old;
> struct sigaction sa = {
>     .sa_handler = ignore_signal_but_dont_restart,
>     .sa_flags = 0,
> };
> sigaction(SIGALRM, &sa, &sa_old);
> if (!(sa_old.sa_handler == SIG_IGN || sa_old.sa_handler == SIG_DFL)
>     || (sa_old.sa_flags & SA_SIGINFO)) {
>     sa_old.sa_flags &= ~SA_RESTART;
>     sigaction(SIGALRM, &sa_old, NULL);
> }
>
> alarm(1);
>
> __vdso_sgx_enter_enclave(0, 0, 0, 2, 0, 0, tcs, NULL, NULL);
>

This is a genuine semantic question: is __vdso_sgx_enter_enclave()
like read on a pipe (returns -EINTR on a signal) or is it more like a
restartable syscall or a normal library function that just keeps
running if your signal handler does nothing?  You could siglongjmp()
out, but that's a bit gross.

I wouldn't object to an option to __vdso_sgx_enter_enclave() to make
it return -EINTR if signaled by a non-SA_RESTART signal.  Implementing
it might be distinctly nontrivial, though.

But this isn't what this patch does, and I suspect we've been talking
past each other.  This patch makes __vdso_sgx_enter_enclave() return
if there's an *interrupt*.  If you push a keyboard button, move your
mouse, get a network interrupt on that core, etc, it will return.
This is nonsense.

--Andy