Re: [PATCH v1 4/5] KVM: Introduce KVM_EXIT_COCO exit type

Michael Roth <michael.roth@xxxxxxx> · Wed, 26 Jun 2024 12:30:58 -0500

On Wed, Jun 26, 2024 at 07:22:43AM -0700, Sean Christopherson wrote:
> On Fri, Jun 21, 2024, Michael Roth wrote:
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index ecfa25b505e7..2eea9828d9aa 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -7122,6 +7122,97 @@ Please note that the kernel is allowed to use the kvm_run structure as the
> >  primary storage for certain register types. Therefore, the kernel may use the
> >  values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
> >  
> > +::
> > +
> > +		/* KVM_EXIT_COCO */
> > +		struct kvm_exit_coco {
> > +		#define KVM_EXIT_COCO_REQ_CERTS			0
> > +		#define KVM_EXIT_COCO_MAX			1
> > +			__u8 nr;
> > +			__u8 pad0[7];
> > +			union {
> > +				struct {
> > +					__u64 gfn;
> > +					__u32 npages;
> > +		#define KVM_EXIT_COCO_REQ_CERTS_ERR_INVALID_LEN		1
> > +		#define KVM_EXIT_COCO_REQ_CERTS_ERR_GENERIC		(1 << 31)
> 
> Unless I'm mistaken, these error codes are defined by the GHCB, which means the
> values matter, i.e. aren't arbitrary KVM-defined values.

They do happen to coincide with the GHCB-defined values:

  /*
   * The GHCB spec only formally defines INVALID_LEN/BUSY VMM errors, but define
   * a GENERIC error code such that it won't ever conflict with GHCB-defined
   * errors if any get added in the future.
   */
  #define SNP_GUEST_VMM_ERR_INVALID_LEN   1
  #define SNP_GUEST_VMM_ERR_BUSY          2
  #define SNP_GUEST_VMM_ERR_GENERIC       BIT(31)

and not totally by accident. But the KVM_EXIT_COCO_REQ_CERTS_ERR_* are
defined/documented without any reliance on the GHCB spec and are purely
KVM-defined. I just didn't really see any reason to pick different
numerical values since it seems like purposely obfuscating things for
no real reason. But the code itself doesn't rely on them being the same
as the spec defines, so we are free to define these however we'd like as
far as the KVM API goes.

> 
> I forget exactly what we discussed in PUCK, but for the error codes, I think KVM
> should either define it's own values that are completely disconnected from any
> "harware" spec, or KVM should very explicitly #define all hardware values and have

I'd gotten the impression that option 1) is what we were sort of leaning
toward, and that's the approach taken here.

> the semantics of "ret" be vendor specific.  A hybrid approach doesn't really work,
> e.g. KVM_EXIT_COCO_REQ_CERTS_ERR_GENERIC isn't used anywhere and and looks quite odd.

This is a catch-all error for userspace to set if any issues are
encountered that don't map to any other
KVM_EXIT_COCO_REQ_CERTS_ERR_* cases (like INVALID_LEN). It's defined
purely for KVM/userspace and not based on any other spec.

> 
> My vote is for vendor specific error codes, because unlike having a common user
> exit reason+struct, I don't think arch-neutral error codes will minimize KVM's ABI,
> I think it'll do the exact opposite.  The only thing we need to require is that
> '0' == success.

I think this makes sense if we think of using KVM_EXIT_COCO mainly an
interface for GHCB/GHCI interactions, but now that we're leveraging
KVM_HC_MAP_GPA_RANGE for page-state change requests, and TDX is planning
to do the same, it doesn't really seem like likely that exposing those
definitions to userspace at that level will reduce ABI.

For instance this is purely just a KVM interface to request a certificate
blob from userspace, which is a side-note as far as all the GHCB-defined
definitions KVM needs to deal with regarding handling GHCB
extended/non-extended guest requests. And KVM itself might have it's own
requirements on top for what it needs from userspace, and those
requirements might be separate from these vendor specs.

And if we expose things selectively to keep the ABI small, it's a bit
awkward too. For instance, KVM_EXIT_COCO_REQ_CERTS_ERR_* basically needs
a way to indicate success/fail/ENOMEM. Which we have with
(assuming 0==success):

  #define KVM_EXIT_COCO_REQ_CERTS_ERR_INVALID_LEN         1
  #define KVM_EXIT_COCO_REQ_CERTS_ERR_GENERIC             (1 << 31)

But the GHCB also defines other values like:

  #define SNP_GUEST_VMM_ERR_BUSY          2  

which don't make much sense to handle on the userspace side and doesn't
really have anything to do with the KVM_EXIT_COCO_REQ_CERTS KVM event,
which is a separate/self-contained thing from the general guest request
protocol. So would we expose that as ABI or not? If not then we end up
with this weird splitting of code. And if yes, then we have to sort of
give userspace a way to discover whenever new error codes are added to
the GHCB spec, because KVM needs to understand these value too and
users might be running on older kernel where only the currently-defined
error codes are present understood.

E.g. if we started off implementing KVM_EXIT_COCO_REQ_CERTS without a
way to request a larger buffer from the guest, and it wasn't later
on that SNP_GUEST_VMM_ERR_INVALID_LEN was added, we'd probably need a
capability bit or something to see if KVM supports requesting larger
page sizes from the guest. Otherwise userspace might just set it because
the spec says it's valid, but it won't work as expected because KVM
hasn't implemented that.

I guess technically we could reason about this particular one based on
which GHCB protocol version was set via KVM_SEV_INIT2, but what if
KVM itself was adding that functionality separately from the spec, and
now we got this intermingling of specs.

> 
> E.g. I think we can end up with something like:
> 
>   static int snp_complete_req_certs(struct kvm_vcpu *vcpu)
>   {
> 	struct vcpu_svm *svm = to_svm(vcpu);
> 	struct vmcb_control_area *control = &svm->vmcb->control;
> 
> 	if (vcpu->run->coco.req_certs.ret)
> 		if (vcpu->run->coco.req_certs.ret == SNP_GUEST_VMM_ERR_INVALID_LEN)

I'm not opposed to this approach, but just deciding which of:

  #define SNP_GUEST_VMM_ERR_INVALID_LEN   1
  #define SNP_GUEST_VMM_ERR_BUSY          2
  #define SNP_GUEST_VMM_ERR_GENERIC       BIT(31)

should be exposed to userspace based on how we've defined the
KVM_EXIT_COCO_REQ_CERTS already seems like an unecessary dilemma
versus just defining exactly what's needed and documenting that
in the KVM API.

If we anticipate needing to expose big chunks of GHCB/GHCI to
userspace for other reasons or future extensions of KVM_EXIT_COCO_*
then I definitely see the rationale to avoid duplication. But with
KVM_HC_MAP_GPA_RANGE case covered, I don't see any major reason to
think this will ever end up being the case.

It seems more likely this will just be KVM's handy place to handle "Hey
userspace, I need you to handle some CoCo-related stuff for me" and
it's really KVM that's driving those requirements vs. any particular
spec.

For instance, the certificate-fetching in the first place is only
handled by userspace because that's how KVM communinity decided to
handle it, not some general spec-driven requirement to handle these
sorts of things in userspace. Similarly for the KVM_HC_MAP_GPA_RANGE
that we originally considered this interface to handle: the fact that
userspace handles those requests is mainly a KVM/gmem design decision.

And like the KVM_HC_MAP_GPA_RANGE case, maybe we find there are cases
where a common KVM-defined event type can handle the requirements of
multiple specs with a common interface API, without exposing any
particular vendor definitions.

So based on that I sort of think giving KVM more flexibility on how it
wants to implement/document specific KVM_EXIT_COCO event types will
ultimately result in cleaner and more manageable ABI.

-Mike

> 			vcpu->arch.regs[VCPU_REGS_RBX] = vcpu->run->coco.req_certs.npages;
> 
> 		ghcb_set_sw_exit_info_2(svm->sev_es.ghcb,
> 					SNP_GUEST_ERR(vcpu->run->coco.req_certs.ret, 0));
> 		return 1;
> 	}
> 
> 	return snp_handle_guest_req(svm, control->exit_info_1, control->exit_info_2);
>   }
> 
> > +					__u32 ret;
> > +				} req_certs;
> > +			};
>