Re: [PATCH v4 6/6] KVM: x86: Refactor __kvm_emulate_hypercall() into a macro

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Dec 12, 2024, Adrian Hunter wrote:
> On 10/12/24 22:03, Sean Christopherson wrote:
> > On Tue, Dec 10, 2024, Paolo Bonzini wrote:
> >> On 11/28/24 09:38, Adrian Hunter wrote:
> >>>
> >>> For TDX, there is an RFC relating to using descriptively
> >>> named parameters instead of register names for tdh_vp_enter():
> >>>
> >>> 	https://lore.kernel.org/all/fa817f29-e3ba-4c54-8600-e28cf6ab1953@xxxxxxxxx/
> >>>
> >>> Please do give some feedback on that approach.  Note we
> >>> need both KVM and x86 maintainer approval for SEAMCALL
> >>> wrappers like tdh_vp_enter().
> >>>
> >>> As proposed, that ends up with putting the values back into
> >>> vcpu->arch.regs[] for __kvm_emulate_hypercall() which is not
> >>> pretty:
> >>
> >> If needed we can revert this patch, it's not a big problem.
> > 
> > I don't care terribly about the SEAMCALL interfaces.  I have opinions on what
> > would I think would be ideal, but I can live with whatever.
> > 
> > What I do deeply care about though is consistency within KVM, across vendors and
> > VM flavors.  And that means that guest registers absolutely need to be captured in
> > vcpu->arch.regs[].
> 
> In general, TDX host VMM does not know what guest register values are.
> 
> This case, where some GPRs are passed to the host VMM via arguments of the
> TDG.VP.VMCALL TDCALL, is really just a side effect of the choice of argument
> passing rather than any attempt to share guest registers with the host VMM.
> 
> It could be regarded as more consistent to never use vcpu->arch.regs[] for
> confidential guests.

SEV-ES+ marshalls data to/from the GHCB to KVM's register array, because the GHCB
spec was intentionally crafted to allow hypervisors to reuse exit-handling code.
Granted, that's only for R{A,B,C,D}X and RSI, but the other GPRs should never be
used and thus their data is irrelevant.

Which applies to TDX as well.  For regs[], it's really only TDVMCALL that I care
about, i.e. cases where GPRs hold guest values, versus things like EXIT_QUALIFICATION
where the GPR is simply TDX's way of communicating information to the hypervisor.

Argh.  The reason I care about putting vCPU state into regs[] is because it helps
share code between vendors.  Looking at kvm-coco-queue, TDX support wandered in
the opposite direction.  E.g. TDX rolls its own RDMSR, WRMSR, CPUID, and HYPERCALL
implementations, which is quite frustrating.  Ditto for things like EXIT_REASON,
EXIT_QUALIFICATION, EXIT_INTR_INFO, etc.

For EXIT_REASON in particular, I think maintaining the guest-requested exit reason
(via TDMVCALL) in a separate field is a mistake.  Readers shouldn't have to care
that a HLT exit technically was requested via TDVMCALL.  If KVM instead immediately
morphs the requested exit reason to KVM's tracked exit_reason, then there's no need
to deal with the TDVMCALL layer in flows that don't care.  The only danger is a
collision with a EXIT_REASON_EPT_MISCONFIG from the TDX module, but that's easy
enough to handle.
 
And even where TDX and VMX have shared some code, IMO it doesn't go far enough.
E.g. having vcpu_tdx and vcpu_vmx open code their own version of the posted
interrupt fields, just to avoid minimal churn in the VMX code, is beyond gross.

Even concepts like guest_state_loaded are unnecessarily different for TDX.  Yes,
I get that that host state doesn't need to be reloaded if KVM doesn't actually
enter the guest.  But holy moly, we're talking about avoid _one_ WRMSR in an
extremely rare path (late abort of entry), at the cost of making TDX frustratingly
different from VMX.

Making TDX look more like everything else isn't just about code sharing.  It's also
about providing a familiar setting so that readers who know almost nothing about
TDX can find their way around without having to effectively learn an entirely new
"architecture" *and* code base.

Hacking around, I think the attached half-baked diff will provide a middle ground
for the regs[] vs. struct/union issue.  The basic gist is to essentially treat
TDX's register ABI as a faster version of VMREAD/VMWRITE, e.g. marshall state
to/from the appropriate x86 registers as needed.  That way, regs[] holds the correct
state and so TDX can reuse much of KVM's existing code verbatim, while allowing
the kernel's VP_ENTER API to evolve independently.





[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux