Re: [ANNOUNCE] PUCK Notes - 2024.04.03 - TDX Upstreaming Strategy

Sean Christopherson <seanjc@xxxxxxxxxx> · Mon, 8 Apr 2024 18:37:31 -0700

On Mon, Apr 08, 2024, Rick P Edgecombe wrote:
> On Mon, 2024-04-08 at 15:36 -0700, Sean Christopherson wrote:
> > > Currently the values for the directly settable CPUID leafs come via a TDX
> > > specific init VM userspace API.
> > 
> > Is guest.MAXPHYADDR one of those?  If so, use that.
> 
> No, it is not configurable. I'm looking into making it configurable, but it is
> not likely to happen before we were hoping to get basic support upstream.

Yeah, love me some hardware defined software.

> An alternative would be to have the KVM API peek at the value, and then
> discard it (not pass the leaf value to the TDX module). Not ideal.

Heh, I typed up this idea before reading ahead.  This has my vote.  Unless I'm
misreading where things are headed, using guest.MAXPHYADDR to communicate what
is essentially GPAW to the guest is about to become the de facto standard.

At that point, KVM can basically treat the current TDX module behavior as an
erratum, i.e. discarding guest.MAXPHYADDR becomes a workaround for a "CPU" bug,
not some goofy KVM quirk.
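
A rough sketch of the peek-and-discard idea, purely for illustration.  The
struct and helper are hypothetical stand-ins (not struct kvm_cpuid_entry2 or
any real KVM/TDX function), and it assumes the only legal GPA widths are 48
and 52 bits:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a CPUID entry; not the real KVM uAPI struct. */
struct cpuid_entry {
	uint32_t function;
	uint32_t index;
	uint32_t eax, ebx, ecx, edx;
};

#define LEAF_ADDR_SIZES	0x80000008u
#define MAXPHYADDR_MASK	0xffu		/* CPUID.0x80000008:EAX[7:0] */

/*
 * Peek at guest.MAXPHYADDR to learn the desired GPA width, then zero the
 * field so it is not forwarded to the TDX module.  Returns 48 or 52 on
 * success, -1 if the leaf is absent or the width is unsupported (in which
 * case the entry is left untouched).
 */
static int peek_gpaw_and_discard(struct cpuid_entry *entries, size_t nr)
{
	assert(entries != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		struct cpuid_entry *e = &entries[i];
		uint32_t maxphyaddr;

		if (e->function != LEAF_ADDR_SIZES || e->index != 0)
			continue;

		maxphyaddr = e->eax & MAXPHYADDR_MASK;
		if (maxphyaddr != 48 && maxphyaddr != 52)
			return -1;

		e->eax &= ~MAXPHYADDR_MASK;	/* the "discard" half */
		return (int)maxphyaddr;
	}
	return -1;
}
```

The returned width is what KVM would translate into the GPAW setting when
initializing the TD; how that is plumbed is not shown here.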

> Or have a dedicated GPAW field and expose the concept to userspace like
> Xiaoyao was talking about.

I'd prefer not to.  As above, it's not KVM's fault that the TDX module can't move
fast enough to adapt.

> > > So should we look at making the TDX side follow a
> > > KVM_GET_SUPPORTED_CPUID/KVM_SET_CPUID pattern for feature enablement? Or am
> > > I misreading general guidance out of this specific suggestion around GPAW?
> > 
> > No?  Where I was going with that, is _if_ vCPUs can be created (in KVM) before
> > the GPAW is set (in the TDX module), then using vCPU0's guest.MAXPHYADDR to
> > compute the desired GPAW may be the least awful solution, all things
> > considered.
> 
> Sorry, I was trying to uplevel the conversation to be about the general concept
> of matching TD configuration to CPUID bits. Let me try to articulate the problem
> a little better.
> 
> Today, KVM’s KVM_GET_SUPPORTED_CPUID is a way to specify which features are
> virtualizable by KVM. Communicating this via CPUID leaf values works for the
> most part, because CPUID is already designed to communicate which features are
> supported. But TDX has a different language to communicate which features are
> supported. That is special fields that are passed when creating a VM: XFAM
> (matching XCR0 features) and ATTRIBUTES (TDX specific flags for MSR based
> features like PKS, etc). So compared to KVM_GET_SUPPORTED_CPUID/KVM_SET_CPUID,
> the TDX module instead accepts only a few CPUID bits to be set directly by the
> VMM, and sets other CPUID leafs to match the configured features via XFAM and
> ATTRIBUTES.
> 
> There are also some bits/features that have fixed values. Which leafs are fixed
> and what the values are isn't something provided by any current TDX module API.
> Instead they are only known via documentation, which is subject to change. The
> queryable information is limited to communicating which bits are directly
> configurable. 

As I said in PUCK (and recorded in the notes), the fixed values should be provided
in a data format that is easily consumed by C code, so that KVM can report that
to userspace with

> So the current interface won't allow us to perfectly match the
> KVM_GET_SUPPORTED_CPUID/KVM_SET_CPUID. Even excluding the vm-scoped vs vcpu-
> scoped differences. However, we could try to match the general design a
> little better.

No, don't try to match KVM_GET_SUPPORTED_CPUID, it's a terrible API that no one
likes.  The only reason we haven't replaced it is because no one has come up with a
universally better idea.  For feature flags, communicating what KVM supports is
straightforward, mostly.  But for things like topology, communicating exactly what
KVM "supports" is much more difficult.

The TDX fixed bits are very different.  It's the TDX module, and thus KVM, saying
"here are the bits that you _must_ set to these exact values".
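
To make the "must set to these exact values" semantics concrete, here is a
minimal sketch of what consuming such a table could look like.  Everything in
it is an assumption for illustration: the table layout, the struct names and
the helper are hypothetical, and no real TDX fixed-bit values are implied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum cpuid_reg { REG_EAX, REG_EBX, REG_ECX, REG_EDX };

/* Hypothetical stand-in for a CPUID entry supplied by userspace. */
struct cpuid_entry {
	uint32_t function;
	uint32_t index;
	uint32_t regs[4];
};

/* Hypothetical fixed-bits record: bits in @mask must equal @value. */
struct fixed_bits {
	uint32_t function;
	uint32_t index;
	enum cpuid_reg reg;
	uint32_t mask;
	uint32_t value;
};

/* Look up a register value; a leaf that is absent reads as zero. */
static uint32_t cpuid_reg_value(const struct cpuid_entry *entries, size_t nr,
				const struct fixed_bits *f)
{
	for (size_t i = 0; i < nr; i++) {
		if (entries[i].function == f->function &&
		    entries[i].index == f->index)
			return entries[i].regs[f->reg];
	}
	return 0;
}

/* True iff every fixed bit in the table has exactly the required value. */
static bool cpuid_honors_fixed_bits(const struct cpuid_entry *entries,
				    size_t nr_entries,
				    const struct fixed_bits *fixed,
				    size_t nr_fixed)
{
	assert(fixed != NULL || nr_fixed == 0);

	for (size_t i = 0; i < nr_fixed; i++) {
		uint32_t val = cpuid_reg_value(entries, nr_entries, &fixed[i]);

		if ((val & fixed[i].mask) != fixed[i].value)
			return false;
	}
	return true;
}
```

The same table could just as easily be used to overwrite the bits instead of
rejecting a mismatch; which of the two is preferable is a policy question this
sketch doesn't try to answer.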

> Here we were discussing making gpaw configurable via a dedicated named field,
> but the suggestion is to instead include it in CPUID bits. The current API takes
> ATTRIBUTES as a dedicated field too. But there actually are CPUID bits for some
> of those features. Those CPUID bits are controlled instead via the associated
> ATTRIBUTES. So we could expose such features via CPUID as well. Userspace
> would, for example, pass the PKS CPUID bit in, and KVM would see it and configure PKS
> via the ATTRIBUTES bit.
> 
> So what I was looking to understand is, what is the enthusiasm for generally
> continuing to use CPUID as the main method for specifying which features should
> be enabled/virtualized, if we can't match the existing
> KVM_GET_SUPPORTED_CPUID/KVM_SET_CPUID APIs. Is the hope just to make userspace's
> code more unified between TDX and normal VMs?

I need to look at the TDX code more to form an (updated) opinion.  IIRC, my opinion
from four years ago was to use ATTRIBUTES and then force CPUID to match.  Whether
or not that's still my preferred approach probably depends on how many, and what,
things are shoved into attributes.
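
For reference, "use ATTRIBUTES and then force CPUID to match" boils down to
something like the sketch below, using PKS as the example.  The helper is
hypothetical, and the ATTRIBUTES.PKS bit position is an assumption that should
be checked against the TDX module spec; the CPUID bit is
CPUID.(EAX=7,ECX=0):ECX[31].

```c
#include <stdint.h>

/* Assumed bit positions; verify against the TDX module spec and the SDM. */
#define TD_ATTR_PKS		(1ULL << 30)	/* ATTRIBUTES.PKS (assumed) */
#define CPUID_7_0_ECX_PKS	(1u << 31)	/* CPUID.(EAX=7,ECX=0):ECX.PKS */

/*
 * Hypothetical helper: ATTRIBUTES is the source of truth, so whatever
 * userspace put in the PKS CPUID bit is overwritten to agree with it.
 * All other ECX bits are passed through unchanged.
 */
static uint32_t force_cpuid_7_0_ecx(uint64_t attributes, uint32_t ecx)
{
	if (attributes & TD_ATTR_PKS)
		return ecx | CPUID_7_0_ECX_PKS;

	return ecx & ~CPUID_7_0_ECX_PKS;
}
```

The more features end up behind ATTRIBUTES, the more of these one-off mappings
there would be, which is part of why the answer depends on what gets shoved in
there.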