Re: [PATCH 2/3] x86/sev-es: Check if regs->sp is trusted before adjusting #VC IST stack

Joerg Roedel <joro@xxxxxxxxxx> · Fri, 19 Feb 2021 12:05:49 +0100

On Thu, Feb 18, 2021 at 04:28:36PM -0800, Andy Lutomirski wrote:
> On Thu, Feb 18, 2021 at 11:21 AM Joerg Roedel <jroedel@xxxxxxx> wrote:
> Can you give me an example, even artificial, in which the linked-list
> logic is useful?

So here we go. It is of course artificial, but still:

	1. #VC happens, not important where
	2. NMI in the #VC prologue before it moved off its IST stack
	   - first VC IST adjustment happening here
	3. #VC in the NMI handler
	4. #HV in the #VC prologue again
	   - second VC IST adjustment happening here, so the #HV handler
	     can cause its own #VC exceptions.

This can only happen if the #HV handler is allowed to cause #VC
exceptions. But even if it is not allowed, it can happen with SNP and a
malicious hypervisor, and in that case the only option is to reliably
panic.
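
To make the scenario above concrete, here is a minimal userspace sketch
of the linked-list idea. It is a model under stated assumptions, not the
patch itself: vc_stack, vc_ist, vc_ist_enter(), vc_ist_exit() and
replay_scenario() are hypothetical names, the "IST entry" is a plain
variable instead of a TSS field, the frame size is assumed, and there
are no overflow checks.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model only: a fake #VC IST stack and a fake IST slot. */
#define VC_STACK_WORDS 512

static uintptr_t vc_stack[VC_STACK_WORDS];
static uintptr_t vc_ist;	/* stands in for the #VC IST entry in the TSS */

static uintptr_t vc_stack_base(void) { return (uintptr_t)&vc_stack[0]; }
static uintptr_t vc_stack_top(void) { return (uintptr_t)&vc_stack[VC_STACK_WORDS]; }

static void vc_ist_init(void) { vc_ist = vc_stack_top(); }

/* Range check: was the interrupted context running on the #VC IST stack? */
static int on_vc_stack(uintptr_t sp)
{
	return sp >= vc_stack_base() && sp < vc_stack_top();
}

/* Called on entry to a handler that may itself raise #VC (NMI, #HV). */
static void vc_ist_enter(uintptr_t interrupted_sp)
{
	uintptr_t old_ist = vc_ist;
	uintptr_t new_ist = old_ist;

	/* Interrupted on the #VC stack: the next #VC must start below that frame. */
	if (on_vc_stack(interrupted_sp))
		new_ist = interrupted_sp;

	/* Store the previous IST value; this slot is the list link. */
	new_ist -= sizeof(uintptr_t);
	*(uintptr_t *)new_ist = old_ist;

	vc_ist = new_ist;
}

/* Called on exit: follow the link back to the previous IST value. */
static void vc_ist_exit(void)
{
	vc_ist = *(uintptr_t *)vc_ist;
}

/* Replays steps 2-4; returns 1 if the IST entry unwinds back to the top. */
static int replay_scenario(void)
{
	/* Assumed size of the hardware-pushed exception frame. */
	const uintptr_t frame = 6 * sizeof(uintptr_t);
	uintptr_t top;

	vc_ist_init();
	top = vc_stack_top();

	/* Step 2: NMI hits while #VC is still on its IST stack. */
	vc_ist_enter(top - frame);
	assert(*(uintptr_t *)vc_ist == top);

	/* Steps 3 and 4: #VC in the NMI handler starts at vc_ist, then #HV hits. */
	vc_ist_enter(vc_ist - frame);

	vc_ist_exit();	/* #HV handler returns */
	vc_ist_exit();	/* NMI handler returns */
	return vc_ist == top;
}
```

Each adjustment stores the value it replaced, so two nested adjustments
unwind in reverse order without any fixed-depth bookkeeping. That is the
only property the sketch is meant to show.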

> Can you explain your reasoning in considering the entry stack unsafe?
> It's 4k bytes these days.

I wasn't aware that it is 4k in size now. I still thought it was only
64 words large, so that one cannot simply execute C code on it.

> You forgot about entry_SYSCALL_compat.

Right, thanks for pointing this out.

> Your 8-byte alignment is confusing to me.  In valid kernel code, SP
> should be 8-byte-aligned already, and, if you're trying to match
> architectural behavior, the CPU aligns to 16 bytes.

Yeah, I was just being cautious. The explicit alignment can be removed,
Boris also pointed this out.
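
For reference, the difference between the two alignments discussed here
can be sketched as below. align_down() is a hypothetical helper used
only for illustration; it is not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Round sp down to a power-of-two boundary (align must be a power of two). */
static uint64_t align_down(uint64_t sp, uint64_t align)
{
	return sp & ~(align - 1);
}

/* What an explicit 8-byte alignment would do to an already valid kernel SP. */
static uint64_t align_sp_8(uint64_t sp)
{
	return align_down(sp, 8);
}

/* What the CPU does in 64-bit mode before pushing an exception frame. */
static uint64_t align_sp_16(uint64_t sp)
{
	return align_down(sp, 16);
}
```

For an SP that is already 8-byte-aligned, align_sp_8() is a no-op, which
is why the explicit alignment can simply go away.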

> We're not robust against #VC, NMI in the #VC prologue before the magic
> stack switch, and a new #VC in the NMI prologue.  Nor do we appear to
> have any detection of the case where #VC nests directly inside its own
> prologue.  Or did I miss something else here?

No, you didn't miss anything here. At the moment #VC can't happen at
those places, so this is not handled yet. With SNP it can happen and
needs to be handled in a way to at least allow a reliable panic (because
if it really happens the Hypervisor is messing with us).

> If we get NMI and get #VC in the NMI *asm*, the #VC magic stack switch
> looks like it will merrily run itself in the NMI special-stack-layout
> section, and that sounds really quite bad.

Yes, I haven't looked at the details yet, but if a #VC happens there it
had probably better not return.

> I mean that, IIRC, a malicious hypervisor can inject inappropriate
> vectors at inappropriate times if the #HV mechanism isn't enabled.
> For example, it could inject a page fault or an interrupt in a context
> in which we have the wrong GSBASE loaded.

Yes, a malicious Hypervisor can do that, and without #HV there is no
real protection against this besides turning all vectors (even IRQs)
into paranoid entries. Maybe even more care is needed, but I think it's
not worth caring about this.
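
As a reminder of what "paranoid" means here, a rough model of the
GSBASE check is sketched below. It assumes the classic scheme without
FSGSBASE, where a GSBASE value with the top bit set is taken to be the
kernel's; gsbase_is_kernel() and needs_swapgs() are hypothetical names,
and the real check is done in entry assembly on the MSR value.

```c
#include <assert.h>
#include <stdint.h>

/* Kernel addresses live in the upper canonical half, so the top bit is set. */
static int gsbase_is_kernel(uint64_t gsbase)
{
	return (int64_t)gsbase < 0;
}

/*
 * A paranoid entry cannot trust the interrupted context, so it inspects
 * GSBASE itself and only swaps when the user value is currently loaded.
 */
static int needs_swapgs(uint64_t gsbase)
{
	return !gsbase_is_kernel(gsbase);
}
```

A non-paranoid entry skips this check and decides from the saved CS
instead, which is what a vector injected at the wrong time can exploit.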

> But the #DB issue makes this moot.  We have to use IST unless we turn
> off SCE.  But I admit I'm leaning toward turning off SCE until we have
> a solution that seems convincingly robust.

Turning off SCE might be tempting, but I guess doing so would break
quite some user-space code, no?

Regards,

	Joerg