Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

Sean Christopherson <seanjc@xxxxxxxxxx> · Mon, 15 Nov 2021 17:25:43 +0000

On Mon, Nov 15, 2021, Joerg Roedel wrote:
> On Sat, Nov 13, 2021 at 06:34:52PM +0000, Sean Christopherson wrote:
> > I'm not treating it nonchalantly, merely acknowledging that (a) some flavors of kernel
> > bugs (or hardware issues!) are inherently fatal to the system, and (b) crashing the
> > host may be preferable to continuing on in certain cases, e.g. if continuing on has a
> > high probability of corrupting guest data.
> 
> The problem here is that for SNP host-side RMP faults it will often not
> be clear at fault-time if it was caused by wrong guest or host behavior. 
> 
> I agree with Marc that crashing the host is not the right thing to do in
> this situation. Instead debug data should be collected to do further
> post-mortem analysis.

Again, I am not saying that every RMP #PF violation should immediately crash the
host.  It should be handled exactly like any other #PF due to a permission violation.
The only wrinkle added by the RMP is that the #PF can be due to permissions on the
GPA itself, but even that is not unique, e.g. see the proposed KVM XO support[*] that
will hopefully still land someday.

If the guest violates the current permissions, it (indirectly) gets a #VC.  If host
userspace violates permissions, it gets SIGSEGV.  If the host kernel violates
permissions, then it reacts to the #PF in whatever way it can.  What I am saying is
that in some cases, there is _zero_ chance of recovery in the host and so crashing
the entire system is inevitable.  E.g. if the host kernel hits an RMP #PF when
vectoring a #GP because the IDT lookup somehow triggers an RMP violation, then the
host is going into triple fault shutdown.
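
To make the three cases concrete, here is a rough sketch in C of the triage described
above.  It is purely illustrative: every name in it (fault_origin, fault_action,
rmp_fault, classify_rmp_fault, the "recoverable" flag) is hypothetical and not taken
from the kernel or from this series, and real #PF handling is far more involved.

```c
#include <stdbool.h>

/* Hypothetical types for illustration only; none of these are kernel APIs. */
enum fault_origin {
	ORIGIN_GUEST,		/* guest violated the current permissions */
	ORIGIN_HOST_USER,	/* host userspace violated permissions */
	ORIGIN_HOST_KERNEL,	/* host kernel violated permissions */
};

enum fault_action {
	ACTION_GUEST_VC,	/* guest (indirectly) gets a #VC */
	ACTION_SIGSEGV,		/* host userspace gets SIGSEGV */
	ACTION_KERNEL_RECOVER,	/* kernel reacts in whatever way it can */
	ACTION_FATAL,		/* zero chance of recovery, system goes down */
};

struct rmp_fault {
	enum fault_origin origin;
	/*
	 * Assumption: a single flag standing in for "the kernel can react
	 * to this fault", e.g. false when the fault hits while vectoring
	 * another exception.
	 */
	bool recoverable;
};

/* Treat an RMP #PF like any other permission #PF, keyed on who faulted. */
enum fault_action classify_rmp_fault(const struct rmp_fault *f)
{
	switch (f->origin) {
	case ORIGIN_GUEST:
		return ACTION_GUEST_VC;
	case ORIGIN_HOST_USER:
		return ACTION_SIGSEGV;
	case ORIGIN_HOST_KERNEL:
		return f->recoverable ? ACTION_KERNEL_RECOVER : ACTION_FATAL;
	}
	return ACTION_FATAL;
}
```

The point of the sketch is that only the last branch can ever end in a crash, and
only when nothing in the kernel can react to the fault.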

[*] https://lore.kernel.org/linux-mm/20191003212400.31130-1-rick.p.edgecombe@xxxxxxxxx/