On Fri, Feb 12, 2021 at 01:42:05PM -0800, Andi Kleen wrote:
> > > I don't know the details about TDX and #VE, but could a malicious HV not
> > > trigger a #VE basically everywhere by mapping around pages? So 'fail'
> > > means panic() in this case, right?
> >
> > Right.
>
> Well we might not be able to reliably panic if we don't run on a IST
> if it hits the syscall gap. Otherwise you might end up with panic
> running on the ring 3 stack.
>
> Given it's a bit muddled threat model - would need both a
> malicious process in the hypervisor and inside the secure guest,
> but I presume that's possible.
>
> That seems to argue that an IST for #VE is actually required.

But AFAICT recursive #VE is entirely possible. The moment #VE reads that
ve_info thing, NMIs can happen, which can trigger another #VE which then
clobbers your stack and we're irrecoverably screwed again.

(also, inhibiting NMI is a seriously dodgy hack, the very last thing x86
needs is more duct tape on the recursion rules)

Repeat after me: ISTs aren't a solution but part of the problem.

If TDX requires IST, it's architecturally bankrupt.

There is much talk elsewhere in this thread about validated pages; have
the TDX module hard guarantee certain pages are available and will not
*ever* generate #VE. The TDX module can kill the guest, but must not #VE.

Without something like that it's a complete and utter non-starter.