Re: [PATCH v14 22/22] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event

Michael Roth <michael.roth@xxxxxxx> · Fri, 26 Apr 2024 12:57:43 -0500

On Wed, Apr 24, 2024 at 05:10:31PM -0700, Sean Christopherson wrote:
> On Sun, Apr 21, 2024, Michael Roth wrote:
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index 85099198a10f..6cf186ed8f66 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -7066,6 +7066,7 @@ values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
> >  		struct kvm_user_vmgexit {
> >  		#define KVM_USER_VMGEXIT_PSC_MSR	1
> >  		#define KVM_USER_VMGEXIT_PSC		2
> > +		#define KVM_USER_VMGEXIT_EXT_GUEST_REQ	3
> 
> Assuming we can't get massage this into a vendor agnostic exit, there's gotta be
> a better name than EXT_GUEST_REQ, which is completely meaningless to me and probably
> most other people that aren't intimately familar with the specs.  Request what?

EXT_CERT_REQ maybe? That's effectively all it boils down to, fetching
the GHCB-defined certificate blob from userspace.

> 
> >  			__u32 type; /* KVM_USER_VMGEXIT_* type */
> >  			union {
> >  				struct {
> > @@ -3812,6 +3813,84 @@ static void snp_handle_guest_req(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp
> >  	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
> >  }
> >  
> > +static int snp_complete_ext_guest_req(struct kvm_vcpu *vcpu)
> > +{
> > +	struct vcpu_svm *svm = to_svm(vcpu);
> > +	struct vmcb_control_area *control;
> > +	struct kvm *kvm = vcpu->kvm;
> > +	sev_ret_code fw_err = 0;
> > +	int vmm_ret;
> > +
> > +	vmm_ret = vcpu->run->vmgexit.ext_guest_req.ret;
> > +	if (vmm_ret) {
> > +		if (vmm_ret == SNP_GUEST_VMM_ERR_INVALID_LEN)
> > +			vcpu->arch.regs[VCPU_REGS_RBX] =
> > +				vcpu->run->vmgexit.ext_guest_req.data_npages;
> > +		goto abort_request;
> > +	}
> > +
> > +	control = &svm->vmcb->control;
> > +
> > +	/*
> > +	 * To avoid the message sequence number getting out of sync between the
> > +	 * actual value seen by firmware verses the value expected by the guest,
> > +	 * make sure attestations can't get paused on the write-side at this
> > +	 * point by holding the lock for the entire duration of the firmware
> > +	 * request so that there is no situation where SNP_GUEST_VMM_ERR_BUSY
> > +	 * would need to be returned after firmware sees the request.
> > +	 */
> > +	mutex_lock(&snp_pause_attestation_lock);
> 
> Why is this in KVM?  IIUC, KVM only needs to get involved to translate GFNs to
> PFNs, the rest can go in the sev-dev driver, no?  The whole split is weird,
> seemingly due to KVM "needing" to take this lock.  E.g. why is core kernel code
> in arch/x86/virt/svm/sev.c at all dealing with attestation goo, when pretty much
> all of the actual usage is (or can be) in sev-dev.c

It would be very tempting to have all this locking tucked away in sev-dev
driver, but KVM is a part of that sequence because it needs to handle fetching
the certificate that will be returned to the guest as part of the
attestation request. The transaction ID updates from PAUSE/RESUME
commands is technically enough satisfy this requirement, in which case
KVM wouldn't need to take the lock directly, only check that the
transaction ID isn't stale and report -EBUSY/-EAGAIN to the guest if it
is.

But if the request actually gets sent to firmware, and then we decide
after that the transaction is stale and thus the attestation response
and certificate might not be in sync, then reporting -EBUSY/-EAGAIN will
permanently hose the guest because the message seq fields will be out of
sync between firmware and guest kernel. That's why we need to hold the
lock to actually block a transaction ID update from occuring once we've
committed to sending the attestation request to firmware.

> 
> > +
> > +	if (__snp_transaction_is_stale(svm->snp_transaction_id))
> > +		vmm_ret = SNP_GUEST_VMM_ERR_BUSY;
> > +	else if (!__snp_handle_guest_req(kvm, control->exit_info_1,
> > +					 control->exit_info_2, &fw_err))
> > +		vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
> > +
> > +	mutex_unlock(&snp_pause_attestation_lock);
> > +
> > +abort_request:
> > +	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
> > +
> > +	return 1; /* resume guest */
> > +}
> > +
> > +static int snp_begin_ext_guest_req(struct kvm_vcpu *vcpu)
> > +{
> > +	int vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
> > +	struct vcpu_svm *svm = to_svm(vcpu);
> > +	unsigned long data_npages;
> > +	sev_ret_code fw_err;
> > +	gpa_t data_gpa;
> > +
> > +	if (!sev_snp_guest(vcpu->kvm))
> > +		goto abort_request;
> > +
> > +	data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
> > +	data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
> > +
> > +	if (!IS_ALIGNED(data_gpa, PAGE_SIZE))
> > +		goto abort_request;
> > +
> > +	svm->snp_transaction_id = snp_transaction_get_id();
> > +	if (snp_transaction_is_stale(svm->snp_transaction_id)) {
> 
> Why bother?  I assume this is rare, so why not handle it on the backend, i.e.
> after userspace does its thing?  Then KVM doesn't even have to care about
> checking for stale IDs, I think?

That's true, it's little more than a very minor performance optimization
to do it here rather than on the return path, so could definitely be
dropped.

> 
> None of this makes much sense to my eyes, e.g. AFAICT, the only thing that can
> pause attestation is userspace, yet the kernel is responsible for tracking whether
> or not a transaction is fresh?

Pausing is essentially the start of an update transaction initiated by
userspace. So the kernel is tracking whether there's a potential
transaction that has been started or completed since it first started
servicing the attestation request, otherwise there could be a mismatch
between the endorsement key used for the attestation report versus the
certificate for the endorsement key that was fetched from userspace.

-Mike