On Thu, Dec 12, 2024, Adrian Hunter wrote: > On 10/12/24 22:03, Sean Christopherson wrote: > > On Tue, Dec 10, 2024, Paolo Bonzini wrote: > >> On 11/28/24 09:38, Adrian Hunter wrote: > >>> > >>> For TDX, there is an RFC relating to using descriptively > >>> named parameters instead of register names for tdh_vp_enter(): > >>> > >>> https://lore.kernel.org/all/fa817f29-e3ba-4c54-8600-e28cf6ab1953@xxxxxxxxx/ > >>> > >>> Please do give some feedback on that approach. Note we > >>> need both KVM and x86 maintainer approval for SEAMCALL > >>> wrappers like tdh_vp_enter(). > >>> > >>> As proposed, that ends up with putting the values back into > >>> vcpu->arch.regs[] for __kvm_emulate_hypercall() which is not > >>> pretty: > >> > >> If needed we can revert this patch, it's not a big problem. > > > > I don't care terribly about the SEAMCALL interfaces. I have opinions on what > > would I think would be ideal, but I can live with whatever. > > > > What I do deeply care about though is consistency within KVM, across vendors and > > VM flavors. And that means that guest registers absolutely need to be captured in > > vcpu->arch.regs[]. > > In general, TDX host VMM does not know what guest register values are. > > This case, where some GPRs are passed to the host VMM via arguments of the > TDG.VP.VMCALL TDCALL, is really just a side effect of the choice of argument > passing rather than any attempt to share guest registers with the host VMM. > > It could be regarded as more consistent to never use vcpu->arch.regs[] for > confidential guests. SEV-ES+ marshalls data to/from the GHCB to KVM's register array, because the GHCB spec was intentionally crafted to allow hypervisors to reuse exit-handling code. Granted, that's only for R{A,B,C,D}X and RSI, but the other GPRs should never be used and thus their data is irrelevant. Which applies to TDX as well. For regs[], it's really only TDVMCALL that I care about, i.e. cases where GPRs hold guest values, versus things like EXIT_QUALIFICATION where the GPR is simply TDX's way of communicating information to the hypervisor. Argh. The reason I care about putting vCPU state into regs[] is because it helps share code between vendors. Looking at kvm-coco-queue, TDX support wandered in the opposite direction. E.g. TDX rolls its own RDMSR, WRMSR, CPUID, and HYPERCALL implementations, which is quite frustrating. Ditto for things like EXIT_REASON, EXIT_QUALIFICATION, EXIT_INTR_INFO, etc. For EXIT_REASON in particular, I think maintaining the guest-requested exit reason (via TDMVCALL) in a separate field is a mistake. Readers shouldn't have to care that a HLT exit technically was requested via TDVMCALL. If KVM instead immediately morphs the requested exit reason to KVM's tracked exit_reason, then there's no need to deal with the TDVMCALL layer in flows that don't care. The only danger is a collision with a EXIT_REASON_EPT_MISCONFIG from the TDX module, but that's easy enough to handle. And even where TDX and VMX have shared some code, IMO it doesn't go far enough. E.g. having vcpu_tdx and vcpu_vmx open code their own version of the posted interrupt fields, just to avoid minimal churn in the VMX code, is beyond gross. Even concepts like guest_state_loaded are unnecessarily different for TDX. Yes, I get that that host state doesn't need to be reloaded if KVM doesn't actually enter the guest. But holy moly, we're talking about avoid _one_ WRMSR in an extremely rare path (late abort of entry), at the cost of making TDX frustratingly different from VMX. Making TDX look more like everything else isn't just about code sharing. It's also about providing a familiar setting so that readers who know almost nothing about TDX can find their way around without having to effectively learn an entirely new "architecture" *and* code base. Hacking around, I think the attached half-baked diff will provide a middle ground for the regs[] vs. struct/union issue. The basic gist is to essentially treat TDX's register ABI as a faster version of VMREAD/VMWRITE, e.g. marshall state to/from the appropriate x86 registers as needed. That way, regs[] holds the correct state and so TDX can reuse much of KVM's existing code verbatim, while allowing the kernel's VP_ENTER API to evolve independently.