On Fri, Dec 7, 2018 at 1:26 PM Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote: > > On Fri, Dec 07, 2018 at 12:16:59PM -0800, Andy Lutomirski wrote: > > > > > > > On Dec 7, 2018, at 12:09 PM, Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote: > > > > > >> On Fri, Dec 07, 2018 at 11:23:10AM -0800, Andy Lutomirski wrote: > > >> > > >> Ah, I see. You’re saying that, if the non-enclave stare is corrupted such > > >> that RIP is okay and RSP still points somewhere reasonable but the return > > >> address is garbage, then we can at least get to the fault handler and print > > >> something? > > > > > > Yep. Even for something more subtle like GPR corruption it could dump the > > > entire call stack before attempting to return back up. > > > > > >> This only works if the fault handler pointer itself is okay, though, which > > >> somewhat limits the usefulness, given that its pointer is quite likely to > > >> be on the stack very close to the return address. > > > > > > Yeah, it's not a silver bullet by any means, but it does seem useful for at > > > least some scenarios. Even exploding when invoking the handler instead of > > > at a random point might prove useful, e.g. "calling my exit handler exploded, > > > maybe my enclave corrupted the stack!". > > > > Here’s another idea: calculate some little hash or other checksum of > > RSP, RBP, and perhaps a couple words on the stack, and do: > > Corrupting RSP and RBP as opposed to the stack memory seems much less > likely since the enclave would have to poke into the save state area. > And as much as I dislike the practice of intentionally manipulating > SSA.RSP, preventing the user from doing something because we're "helping" > doesn't seem right. > > > call __vdso_enclave_corrupted_state > > > > If you get a mismatch after return. That function could be: > > > > call __vdso_enclave_corrupted_state: > > ud2 > > > > And now the debug trace makes it very clear what happened. > > > > This may or may not be worth the effort. > > Running a checksum on the stack for every exit doesn't seem like it'd > be worth the effort, especially since this type of bug should be quite > rare, at least in production environments. > > If we want to pursue the checksum idea I think the easiest approach > would be to combine it with an exit_handler and do a simple check on > the handler. It'd be minimal overhead in the fast path and would flag > cases where invoking exit_handle() would explode, while deferring all > other checks to the user. How about this variant? #define MAGIC 0xaaaabbbbccccddddul #define RETADDR_HASH ((unsigned long)__builtin_return_address(0) ^ MAGIC) void foo(void) { volatile unsigned long hash = RETADDR_HASH; /* placeholder for your actual code */ asm volatile ("nop"); if (hash != RETADDR_HASH) asm volatile ("ud2"); } But I have a real argument for dropping exit_handler: in this new age of Spectre, the indirect call is a retpoline, and it's therefore quite slow. So I'm not saying NAK, but I do think it's unnecessary. I don't suppose you've spent a bunch of time programming in the continuation-passing style? :)