Re: [PATCH] KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure

Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> · Wed, 21 Jul 2021 14:32:33 +0200

Sean Christopherson <seanjc@xxxxxxxxxx> writes:

> On Thu, May 20, 2021, Tom Lendacky wrote:
>> On 5/20/21 2:16 PM, Sean Christopherson wrote:
>> > On Mon, May 17, 2021, Tom Lendacky wrote:
>> >> On 5/14/21 6:06 PM, Peter Gonda wrote:
>> >>> On Fri, May 14, 2021 at 1:22 PM Tom Lendacky <thomas.lendacky@xxxxxxx> wrote:
>> >>>>
>> >>>> Currently, an SEV-ES guest is terminated if the validation of the VMGEXIT
>> >>>> exit code and parameters fails. Since the VMGEXIT instruction can be issued
>> >>>> from userspace, even though userspace (likely) can't update the GHCB,
>> >>>> don't allow userspace to be able to kill the guest.
>> >>>>
>> >>>> Return a #GP request through the GHCB when validation fails, rather than
>> >>>> terminating the guest.
>> >>>
>> >>> Is this a gap in the spec? I don't see anything that details what
>> >>> should happen if the correct fields for NAE are not set in the first
>> >>> couple paragraphs of section 4 'GHCB Protocol'.
>> >>
>> >> No, I don't think the spec needs to spell out everything like this. The
>> >> hypervisor is free to determine its course of action in this case.
>> > 
>> > The hypervisor can decide whether to inject/return an error or kill the guest,
>> > but what errors can be returned and how they're returned absolutely needs to be
>> > ABI between guest and host, and to make the ABI vendor agnostic the GHCB spec
>> > is the logical place to define said ABI.
>> 
>> For now, that is all we have for versions 1 and 2 of the spec. We can
>> certainly extend it in future versions if that is desired.
>> 
>> I would suggest starting a thread on what we would like to see in the next
>> version of the GHCB spec on the amd-sev-snp mailing list:
>> 
>> 	amd-sev-snp@xxxxxxxxxxxxxx
>
> Will do, but in the meantime, I don't think we should merge a fix of any kind
> until there is consensus on what the VMM behavior will be.  IMO, fixing this in
> upstream is not urgent; I highly doubt anyone is deploying SEV-ES in production
> using a bleeding edge KVM.

Sorry for resurrecting this old thread, but were there any developments
here? I may have missed something, but the last time I checked, a single
"rep; vmmcall" from userspace was still crashing the guest. The issue,
however, doesn't seem to reproduce with VMware ESXi, which probably means
they're just skipping the instruction and not even injecting #GP (AFAIR;
I don't have an environment handy to re-test).

-- 
Vitaly