Re: [PATCH 03/10] KVM: SVM: Terminate the VM if a SEV-ES+ guest is run with an invalid VMSA

Tom Lendacky <thomas.lendacky@xxxxxxx> · Mon, 24 Feb 2025 17:55:57 -0600

On 2/24/25 16:55, Sean Christopherson wrote:
> On Mon, Feb 24, 2025, Tom Lendacky wrote:
>> On 2/18/25 19:26, Sean Christopherson wrote:
>>> -void pre_sev_run(struct vcpu_svm *svm, int cpu)
>>> +int pre_sev_run(struct vcpu_svm *svm, int cpu)
>>>  {
>>>  	struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, cpu);
>>> -	unsigned int asid = sev_get_asid(svm->vcpu.kvm);
>>> +	struct kvm *kvm = svm->vcpu.kvm;
>>> +	unsigned int asid = sev_get_asid(kvm);
>>> +
>>> +	/*
>>> +	 * Terminate the VM if userspace attempts to run the vCPU with an
>>> +	 * invalid VMSA, e.g. if userspace forces the vCPU to be RUNNABLE after
>>> +	 * an SNP AP Destroy event.
>>> +	 */
>>> +	if (sev_es_guest(kvm) && !VALID_PAGE(svm->vmcb->control.vmsa_pa)) {
>>> +		kvm_vm_dead(kvm);
>>> +		return -EIO;
>>> +	}
>>
>> If a VMRUN is performed with the vmsa_pa value set to INVALID_PAGE, the
>> VMRUN will fail and KVM will dump the VMCB and exit back to userspace
> 
> I haven't tested, but based on what the APM says, I'm pretty sure this would crash
> the host due to a #GP on VMRUN, i.e. due to the resulting kvm_spurious_fault().
> 
>   IF (rAX contains an unsupported physical address)
>     EXCEPTION [#GP]

Well that's for the VMCB, the VMSA is pointed to by the VMCB and results
in a VMEXIT code of -1 if you don't supply a proper page-aligned,
physical address.

> 
>> with KVM_EXIT_INTERNAL_ERROR.
>>
>> Is doing this preferrable to that?
> 
> Even if AMD guaranteed that the absolute worst case scenario is a failed VMRUN
> with zero side effects, doing VMRUN with a bad address should be treated as a
> KVM bug.

Fair.

> 
>> If so, should a vcpu_unimpl() message be issued, too, to better identify the
>> reason for marking the VM dead?
> 
> My vote is no.  At some point we need to assume userspace possesess a reasonable
> level of competency and sanity.
> 
>>>  static void svm_inject_nmi(struct kvm_vcpu *vcpu)
>>> @@ -4231,7 +4233,8 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu,
>>>  	if (force_immediate_exit)
>>>  		smp_send_reschedule(vcpu->cpu);
>>>  
>>> -	pre_svm_run(vcpu);
>>> +	if (pre_svm_run(vcpu))
>>> +		return EXIT_FASTPATH_EXIT_USERSPACE;

In testing this out, I think userspace continues on because I eventually
get:

KVM_GET_PIT2 failed: Input/output error
/tmp/cmdline.98112: line 1: 98163 Aborted (core dumped) ...

Haven't looked too close, but maybe an exit_reason needs to be set to
get qemu to quit sooner?

Thanks,
Tom

>>
>> Since the return code from pre_svm_run() is never used, should it just
>> be a bool function, then?
> 
> Hard no.  I strongly dislike boolean returns for functions that aren't obviously
> predicates, because it's impossible to determine the polarity of the return value
> based solely on the prototype.  This leads to bugs that are easier to detect with
> 0/-errno return, e.g. returning -EINVAL in a happy path stands out more than
> returning the wrong false/true value.
> 
> Case in point (because I just responded to another emain about this function),
> what's the polarity of this helper?  :-)
> 
>   static bool sanity_check_entries(struct kvm_cpuid_entry2 __user *entries,
> 				   __u32 num_entries, unsigned int ioctl_type)