Re: AMD SEV-SNP/Intel TDX: validation of memory pages

Peter Zijlstra <peterz@xxxxxxxxxxxxx> · Fri, 12 Feb 2021 22:58:52 +0100

On Fri, Feb 12, 2021 at 01:42:05PM -0800, Andi Kleen wrote:
> > > I don't know the details about TDX and #VE, but could a malicious HV not
> > > trigger a #VE basically everywhere by mapping around pages? So 'fail'
> > > means panic() in this case, right?
> > 
> > Right.
> 
> Well we might not be able to reliably panic if we don't run on an IST
> if it hits the syscall gap. Otherwise you might end up with panic
> running on the ring 3 stack.
> 
> Given it's a bit muddled threat model - would need both a
> malicious process in the hypervisor and inside the secure guest,
> but I presume that's possible.
> 
> That seems to argue that an IST for #VE is actually required.

But AFAICT recursive #VE is entirely possible. The moment #VE reads that
ve_info thing, NMIs can happen, which can trigger another #VE which then
clobbers your stack and we're irrecoverably screwed again.
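
To make the clobbering concrete, here is a toy userspace model, not kernel
code. It relies on one architectural fact: an IDT entry with a non-zero IST
index makes the CPU reload RSP from a fixed slot in the TSS on every
delivery, whether or not that stack is already in use. Everything else is
an illustrative assumption: the names (toy_stack, enter_ist, enter_nested,
outer_frame_after_nesting) are hypothetical, the frame is reduced to a
single return address, and the intermediate NMI is collapsed into "a second
#VE arrives before the first one is done".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 8

/* Toy exception stack; grows downward from index STACK_WORDS. */
struct toy_stack {
	uint64_t mem[STACK_WORDS];
	size_t sp;	/* index of the last pushed word; STACK_WORDS when empty */
};

static void toy_stack_init(struct toy_stack *s)
{
	for (size_t i = 0; i < STACK_WORDS; i++)
		s->mem[i] = 0;
	s->sp = STACK_WORDS;
}

static void toy_push(struct toy_stack *s, uint64_t v)
{
	assert(s->sp > 0);	/* toy overflow check */
	s->mem[--s->sp] = v;
}

/* IST-style delivery: the stack pointer is always reset to the fixed top. */
static void enter_ist(struct toy_stack *s, uint64_t return_rip)
{
	s->sp = STACK_WORDS;
	toy_push(s, return_rip);
}

/* Non-IST same-privilege delivery: the frame goes below the current sp. */
static void enter_nested(struct toy_stack *s, uint64_t return_rip)
{
	toy_push(s, return_rip);
}

/*
 * Deliver one exception, then a second one before the first handler has
 * finished, and return what the first handler now finds in its own
 * return-address slot.
 */
static uint64_t outer_frame_after_nesting(int use_ist)
{
	const uint64_t outer_rip = 0x1000, inner_rip = 0x2000;
	struct toy_stack s;
	size_t outer_slot;

	toy_stack_init(&s);

	if (use_ist)
		enter_ist(&s, outer_rip);
	else
		enter_nested(&s, outer_rip);
	outer_slot = s.sp;	/* where the outer frame lives */

	if (use_ist)
		enter_ist(&s, inner_rip);	/* lands on the same slot */
	else
		enter_nested(&s, inner_rip);	/* lands one slot lower */

	return s.mem[outer_slot];
}
```

With use_ist set, the outer handler's saved return address has been
replaced by the inner one, so it can no longer return to where it came
from. Without IST the outer frame survives because the nested frame is
pushed below it.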

(also, inhibiting NMI is a seriously dodgy hack, the very last thing x86
needs is more duct tape on the recursion rules)

Repeat after me: ISTs aren't a solution but part of the problem.

If TDX requires IST, it's architecturally bankrupt.

There is much talk elsewhere in this thread about validated pages; have
the TDX module hard guarantee certain pages are available and will not
*ever* generate #VE. TDX module can kill the guest, but must not #VE.

Without something like that it's a complete and utter non-starter.