On Thu, Apr 25, 2024, Michael Roth wrote:
> On Wed, Apr 24, 2024 at 01:59:48PM -0700, Sean Christopherson wrote:
> > On Sun, Apr 21, 2024, Michael Roth wrote:
> > > +static int snp_begin_psc_msr(struct kvm_vcpu *vcpu, u64 ghcb_msr)
> > > +{
> > > +	u64 gpa = gfn_to_gpa(GHCB_MSR_PSC_REQ_TO_GFN(ghcb_msr));
> > > +	u8 op = GHCB_MSR_PSC_REQ_TO_OP(ghcb_msr);
> > > +	struct vcpu_svm *svm = to_svm(vcpu);
> > > +
> > > +	if (op != SNP_PAGE_STATE_PRIVATE && op != SNP_PAGE_STATE_SHARED) {
> > > +		set_ghcb_msr(svm, GHCB_MSR_PSC_RESP_ERROR);
> > > +		return 1; /* resume guest */
> > > +	}
> > > +
> > > +	vcpu->run->exit_reason = KVM_EXIT_VMGEXIT;
> > > +	vcpu->run->vmgexit.type = KVM_USER_VMGEXIT_PSC_MSR;
> > > +	vcpu->run->vmgexit.psc_msr.gpa = gpa;
> > > +	vcpu->run->vmgexit.psc_msr.op = op;
> >
> > Argh, no.
> >
> > This is the same crud that TDX tried to push[*]. Use KVM's existing user
> > exits, and extend as *needed*. There is no good reason page state change
> > requests need *two* exit reasons. The *only* thing KVM supports right now
> > is private<=>shared conversions, and that can be handled with either
> > KVM_HC_MAP_GPA_RANGE or KVM_EXIT_MEMORY_FAULT.
> >
> > The non-MSR flavor can batch requests, but I'm willing to bet that the
> > overwhelming majority of requests are contiguous, i.e. can be combined
> > into a range by KVM, and that handling any outliers by performing
> > multiple exits to userspace will provide sufficient performance.
>
> That does tend to be the case. We won't have as much granularity with the
> per-entry error codes, but KVM_SET_MEMORY_ATTRIBUTES would be expected to
> cover the entire range anyway, and if that fails for whatever reason then
> we KVM_BUG_ON(). We do, however, have to handle cases where the entries
> aren't contiguous, which would involve multiple KVM_EXIT_HYPERCALLs until
> everything is satisfied. But that's not a huge deal since it doesn't seem
> to be a common case.

If it was less complex overall, I wouldn't be opposed to KVM marshalling
everything into a buffer, but I suspect it will be simpler to just have KVM
loop until the PSC request is complete.
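
Something like this completely untested sketch is what I have in mind, where
snp_psc_next_range() and snp_psc_mark_range_done() are handwaved helpers that
scan the guest's PSC buffer for the next contiguous run of entries with the
same op, and advance past a run once userspace has processed it:

/* Hypothetical helpers (handwaved): walk/advance the in-flight PSC buffer. */
static bool snp_psc_next_range(struct vcpu_svm *svm, u64 *gfn, u64 *npages, u8 *op);
static void snp_psc_mark_range_done(struct vcpu_svm *svm);

static int snp_complete_one_psc_range(struct kvm_vcpu *vcpu);

/* Coalesce the next contiguous run of PSC entries into a single exit. */
static int snp_psc_do_next_range(struct kvm_vcpu *vcpu)
{
	struct vcpu_svm *svm = to_svm(vcpu);
	u64 gfn, npages;
	u8 op;

	/* All entries processed, resume the guest. */
	if (!snp_psc_next_range(svm, &gfn, &npages, &op))
		return 1;

	vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
	vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
	vcpu->run->hypercall.args[0] = gfn_to_gpa(gfn);
	vcpu->run->hypercall.args[1] = npages;
	vcpu->run->hypercall.args[2] = op == SNP_PAGE_STATE_PRIVATE ?
				       KVM_MAP_GPA_RANGE_ENCRYPTED :
				       KVM_MAP_GPA_RANGE_DECRYPTED;
	vcpu->run->hypercall.args[2] |= KVM_MAP_GPA_RANGE_PAGE_SZ_4K;

	/* Loop via the completion callback until the PSC request is done. */
	vcpu->arch.complete_userspace_io = snp_complete_one_psc_range;
	return 0;
}

static int snp_complete_one_psc_range(struct kvm_vcpu *vcpu)
{
	snp_psc_mark_range_done(to_svm(vcpu));

	return snp_psc_do_next_range(vcpu);
}
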
> KVM_HC_MAP_GPA_RANGE seems like a nice option because we'd also have the
> flexibility to just issue that directly within a guest rather than relying
> on SNP/TDX specific hcalls. I don't know if that approach is practical for
> a real guest, but it could be useful for having re-usable guest code in
> KVM selftests that "just works" for all variants of SNP/TDX/sw-protected
> (though we'd still want something that exercises the
> SNP/TDX->KVM_HC_MAP_GPA_RANGE translation).
>
> I think there is some potential baggage there from the previous SEV live
> migration use cases. There's some potential that existing guest kernels
> will start using KVM_HC_MAP_GPA_RANGE once it gets advertised, and issue
> those calls alongside GHCB-based page-state changes. It might make sense
> to use one of the reserved bits to denote this flavor of
> KVM_HC_MAP_GPA_RANGE as being for hardware/software-protected VMs and not
> interchangeable with the calls that were used for the SEV live migration
> stuff.

I don't think I follow, what exactly wouldn't be interchangeable, and why?

> If this seems reasonable I'll give it a go and see what it looks like.
>
> > And the non-MSR version that comes in a later patch is a complete mess.
> > It kicks the PSC out to userspace without *any* validation. As I
> > complained in the TDX thread, that will create an unmaintainable ABI for
> > KVM.
> >
> > KVM needs to have its own, well-defined ABI. Splitting functionality
> > between KVM and userspace at seemingly random points is not maintainable.
> >
> > E.g. if/when KVM supports UNSMASH, upgrading KVM would arguably break
> > userspace, as PSC requests that previously exited would suddenly be
> > handled by KVM. Maybe. It's impossible to review this because there's no
> > KVM ABI; KVM is little more than a dumb pipe parroting information to
> > userspace.
>
> It leans on the GHCB spec to avoid re-inventing structs/documentation for
> things like Page State Change buffers, but we do have as much control as
> we want over how much we farm out versus lock into the KVM ABI. For
> instance, the accompanying Documentation/ update mentions that we only
> send userspace the subset of GHCB requests that need to be handled there,
> so we could handle SMASH/UNSMASH in KVM without breaking expectations
> (or, if SMASH/UNSMASH were intermixed with PSCs, the documentation could
> state that only PSC opcodes may be updated by userspace).
>
> But I'm certainly not arguing it wouldn't be better to have a
> guest-agnostic alternative if we can reach an agreement on that, and
> KVM_HC_MAP_GPA_RANGE seems like it could work.

Yeah, I want to at least _try_ to achieve common ground, because the basic
functionality of all this stuff is the exact same.
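
E.g. for the selftests angle, the guest side of a conversion could be the
same few lines for SNP, TDX, and sw-protected VMs. A completely untested
sketch, assuming the existing selftests kvm_hypercall() helper works (or can
be made to work) for the VM type in question:

/*
 * Convert a range of guest memory to shared via the guest-agnostic
 * hypercall, and assert that the host handled it successfully.
 */
static void guest_map_gpa_shared(uint64_t gpa, uint64_t npages)
{
	uint64_t ret;

	ret = kvm_hypercall(KVM_HC_MAP_GPA_RANGE, gpa, npages,
			    KVM_MAP_GPA_RANGE_DECRYPTED | KVM_MAP_GPA_RANGE_PAGE_SZ_4K,
			    0);
	GUEST_ASSERT(!ret);
}

That obviously doesn't exercise the SNP/TDX->KVM_HC_MAP_GPA_RANGE
translation, but as noted above, tests for that would be wanted regardless.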