This is a proposal for a potential future TDX Module feature to assist
QEMU/KVM in configuring CPUID leaves for TD guests. It is only at the
idea stage and is not currently being implemented. We are looking for
comments on its suitability for QEMU/KVM.
# Background
To correctly virtualize CPUID for a TD, the VMM needs to understand how
each CPUID bit is virtualized, including whether the bit can be
configured by the VMM and what values are allowed.
There is an interface to query the CPUID bit information after the TD
has been configured, but it does not work before the TD is configured.
Along with each release, the TDX module provides a separate JSON file,
cpuid_virtualization.json, describing CPUID virtualization. The VMM can
use this file even before the TD is configured. The TDX module also
provides an interface to query some limited CPUID information, including:
- The configurability of a subset of CPUID bits, via the global
  metadata field CPUID_CONFIG_VALUES.
- The "fixed0" and "fixed1" bits of ATTRIBUTES and XFAM, via global
  metadata. The VMM can infer the "configurable" ATTRIBUTES/XFAM bits
  indirectly: the bits that are neither "fixed0" nor "fixed1" are
  "configurable".
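For illustration, the indirect inference in the second item reduces to a single mask operation. This C sketch assumes the usual TDX convention for these fields (a 0 in ATTRIBUTES_FIXED0/XFAM_FIXED0 means the bit must be 0, and a 1 in ATTRIBUTES_FIXED1/XFAM_FIXED1 means the bit must be 1); the function name tdx_configurable_bits is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the ATTRIBUTES or XFAM bits the VMM may
 * choose freely, given the corresponding *_FIXED0 and *_FIXED1 values.
 * Assumes 0 in fixed0 = "must be 0" and 1 in fixed1 = "must be 1".
 */
static uint64_t tdx_configurable_bits(uint64_t fixed0, uint64_t fixed1)
{
    /* A bit required to be 1 must also be allowed to be 1. */
    assert((fixed1 & ~fixed0) == 0);

    /* Allowed to be 1 (not fixed0) and not forced to 1 (not fixed1). */
    return fixed0 & ~fixed1;
}
```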
For the remaining CPUID bits not covered by the above two categories, no
TDX module query interface exists.
# Problem
While the VMM can use the JSON CPUID information and may embed or
translate it into its code, this approach faces several challenges:
- The JSON file changes with each TDX module release, which can
  complicate the VMM code. Additionally, depending on its own needs, the
  VMM may require more information than the JSON file provides.
- JSON cannot be easily parsed in low-level programming languages such
  as C, which is typically used to write VMMs.
The KVM community objected to parsing the JSON file and requested a
friendlier interface for querying the CPUID information of each
specific TDX module.[0][1]
# Analysis
The JSON file defines many virtualization types for single bits or bit
fields; for example, TDX 1.5.06 defines 12 types:
- fixed
- configured
- configured & native
- XFAM & native
- XFAM & configured & native
- attributes & native
- attributes & configured & native
- CPUID_enabled & native
- attributes & CPUID_enabled & native
- attributes & CPUID_enabled & configured & native
- calculated
- special
More types are being added as TDX evolves.
Despite the number of types, any single bit falls into exactly one of
three categories:
- fixed0
- fixed1
- configurable
For example:
1. For type "configured & native", the bit is "fixed0" if the native
   value is 0, and "configurable" if the native value is 1.
2. For type "XFAM & native":
   a) the CPUID bit is "fixed0" if the corresponding XFAM bit is fixed0
      according to XFAM_FIXED0, or if the native value is 0;
   b) the CPUID bit is "fixed1" if the corresponding XFAM bit is set in
      XFAM_FIXED1;
   c) otherwise, the CPUID bit is "configurable" (indirectly, via
      TD_PARAMS.XFAM).
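The two example rules above can be expressed as small classification functions. This C sketch is hypothetical and not taken from the TDX module, KVM, or QEMU. Here `native` is the host's value for the CPUID bit, and the code assumes the usual TDX convention that a 0 in XFAM_FIXED0 means the XFAM bit must be 0 and a 1 in XFAM_FIXED1 means it must be 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Every CPUID bit ends up in exactly one of these three classes. */
enum cpuid_bit_class {
    CPUID_BIT_FIXED0,
    CPUID_BIT_FIXED1,
    CPUID_BIT_CONFIGURABLE,
};

/* Type "configured & native": the host (native) value decides. */
static enum cpuid_bit_class classify_configured_native(bool native)
{
    return native ? CPUID_BIT_CONFIGURABLE : CPUID_BIT_FIXED0;
}

/*
 * Type "XFAM & native". xfam_bit is the XFAM bit controlling this CPUID
 * bit; xfam_fixed0/xfam_fixed1 are the XFAM_FIXED0/XFAM_FIXED1 values.
 */
static enum cpuid_bit_class classify_xfam_native(bool native,
                                                 unsigned int xfam_bit,
                                                 uint64_t xfam_fixed0,
                                                 uint64_t xfam_fixed1)
{
    uint64_t mask;

    assert(xfam_bit < 64);
    mask = UINT64_C(1) << xfam_bit;

    if (!(xfam_fixed0 & mask) || !native)
        return CPUID_BIT_FIXED0;        /* rule a) */
    if (xfam_fixed1 & mask)
        return CPUID_BIT_FIXED1;        /* rule b) */
    return CPUID_BIT_CONFIGURABLE;      /* rule c): set via TD_PARAMS.XFAM */
}
```

Every other type needs its own function of this kind, which is exactly the per-release logic the proposal below aims to move into the TDX module.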
# Proposal
The current TDX module reports the "configurable" bits directly via the
global metadata CPUID_CONFIG_VALUES, or indirectly via the global
metadata ATTRIBUTES_FIXED0/1 and XFAM_FIXED0/1. However, it lacks an
interface to report "fixed0" and "fixed1" bits in general; it reports
fixed bits only for ATTRIBUTES and XFAM.
We propose to add two new global metadata fields, CPUID_FIXED0_BITS and
CPUID_FIXED1_BITS, to report "fixed0" and "fixed1" bit information
respectively. Both fields use the same encoding as the TDCS field
CPUID_VALUES.
The field code is composed as follows:
- Bits 31:17 Reserved, must be 0
- Bit 16 Leaf number bit 31
- Bits 15:9 Leaf number bits 6:0
- Bit 8 Sub-leaf not applicable flag
- Bits 7:1 Sub-leaf number bits 6:0
- Bit 0 Element index within field
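To illustrate the field-code layout above, here is a minimal C sketch that packs a leaf, sub-leaf, and element index into a field code. The function name cpuid_field_code is hypothetical. It covers only the bits listed above (not any other bits of a full TDX metadata field identifier) and assumes the sub-leaf bits are left 0 when the sub-leaf-not-applicable flag is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical encoder for the field-code layout. Only leaf bits 31 and
 * 6:0 are encoded, so leaves 0x0-0x7f and 0x80000000-0x8000007f fit.
 */
static uint32_t cpuid_field_code(uint32_t leaf, uint32_t subleaf,
                                 bool subleaf_na, unsigned int element)
{
    uint32_t code = 0;

    assert((leaf & ~UINT32_C(0x8000007f)) == 0);
    assert(subleaf_na || subleaf <= 0x7f);
    assert(element <= 1);

    code |= (leaf >> 31) << 16;             /* bit 16: leaf bit 31 */
    code |= (leaf & 0x7f) << 9;             /* bits 15:9: leaf bits 6:0 */
    if (subleaf_na)
        code |= UINT32_C(1) << 8;           /* bit 8: sub-leaf N/A */
    else
        code |= (subleaf & 0x7f) << 1;      /* bits 7:1: sub-leaf bits 6:0 */
    code |= element;                        /* bit 0: element index */
    return code;                            /* bits 31:17 remain 0 */
}
```

For example, leaf 7, sub-leaf 0, element 1 encodes as 0xE01, and leaf 0x80000008 with no sub-leaf, element 0, encodes as 0x11100.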
The returned value uses the same layout as CPUID_VALUES:
- Element 0[31:0]: EAX
- Element 0[63:32]: EBX
- Element 1[31:0]: ECX
- Element 1[63:32]: EDX
For CPUID_FIXED0_BITS, a bit that is 0 in E[A,B,C,D]X means the
corresponding CPUID bit is fixed0.
For CPUID_FIXED1_BITS, a bit that is 1 in E[A,B,C,D]X means the
corresponding CPUID bit is fixed1.
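With this encoding, the VMM's per-bit logic becomes plain mask arithmetic. The following C sketch uses hypothetical helpers that do not exist in KVM or QEMU. It splits the two returned elements into EAX..EDX and, for one register, derives the configurable mask and checks or adjusts a requested value. It assumes a fixed1 bit is reported as 1 (that is, not fixed0) in CPUID_FIXED0_BITS, so every bit lands in exactly one of the three classes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Register values for one CPUID leaf/sub-leaf. */
struct cpuid_regs {
    uint32_t eax, ebx, ecx, edx;
};

/* Element 0 carries EBX:EAX and element 1 carries EDX:ECX. */
static struct cpuid_regs cpuid_regs_from_elements(uint64_t elem0, uint64_t elem1)
{
    struct cpuid_regs r = {
        .eax = (uint32_t)elem0,
        .ebx = (uint32_t)(elem0 >> 32),
        .ecx = (uint32_t)elem1,
        .edx = (uint32_t)(elem1 >> 32),
    };
    return r;
}

/*
 * 'fixed0' is a register from CPUID_FIXED0_BITS (0 = fixed0) and 'fixed1'
 * the same register from CPUID_FIXED1_BITS (1 = fixed1).
 */
static uint32_t cpuid_configurable_mask(uint32_t fixed0, uint32_t fixed1)
{
    /* Assumption: a fixed1 bit is never also reported as fixed0. */
    assert((fixed1 & ~fixed0) == 0);
    return fixed0 & ~fixed1;
}

/* Valid if the value sets no fixed0 bit and sets every fixed1 bit. */
static bool cpuid_value_allowed(uint32_t requested, uint32_t fixed0,
                                uint32_t fixed1)
{
    return (requested & ~fixed0) == 0 && (requested & fixed1) == fixed1;
}

/* Force a requested value into the allowed range. */
static uint32_t cpuid_value_adjust(uint32_t requested, uint32_t fixed0,
                                   uint32_t fixed1)
{
    return (requested & fixed0) | fixed1;
}
```

Silently adjusting is only one option; a VMM that prefers to reject invalid configurations would call cpuid_value_allowed() and report an error instead.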
# Interaction with TDX_FEATURES0.VE_REDUCTION
TDX introduces a new feature, VE_REDUCTION [2]. From the perspective of
the host VMM, VE_REDUCTION turns several CPUID bits (e.g., MTRR, MCA,
MCE) from fixed1 to configurable. From the perspective of the TD guest,
however, it is an opt-in feature. The actual value seen by the TD guest
depends on multiple factors: 1) whether the TD guest enables REDUCE_VE
in TDCS.TD_CTLS, 2) TDCS.FEATURE_PARAVIRT_CTRL, and 3) the CPUID value
configured by the host VMM via TD_PARAMS.CPUID_CONFIG[]. (Please refer
to the latest TDX 1.5 spec for more details.)
Since the host VMM does not know the settings of 1) and 2) when creating
the TD, the design treats these bits as configurable, and for simplicity
the global metadata interface does not report them as fixed1 bits.
The host VMM must be aware that the value of these VE_REDUCTION-related
CPUID bits seen by the TD guest might not be what it configured; the
actual value also depends on whether and how the guest enables and
configures VE_REDUCTION.
# POC
We did a POC in QEMU to verify that the fixed0/fixed1 data provided by
such an interface is sufficient for userspace to validate and generate
a supported vCPU model for TD guests.[3]
The POC takes the bits of the "fixed" type from the JSON file and
hardcodes them into two arrays, tdx_fixed0_bits and tdx_fixed1_bits.
Note that it does not handle types other than "fixed" because 1) only a
few of them fall into fixed0 or fixed1, and 2) resolving them into
fixed0 or fixed1 requires checking various conditions, which would
complicate the POC. Also, the POC uses the value 1 in tdx_fixed0_bits to
mark fixed0 bits, whereas the proposed metadata interface uses the
value 0.
With the hardcoded information, the VMM can validate the user-requested
TD configuration early, by checking whether each feature the user asks
to enable is allowed to be enabled and each feature the user asks to
disable is allowed to be disabled.
Once the TDX module provides fixed0 and fixed1 information via global
metadata, QEMU can request it from KVM instead of using the hardcoded
arrays.
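The early validation described above can be sketched for one CPUID register as follows. The sketch follows the POC convention that a 1 in tdx_fixed0_bits marks a fixed0 bit and includes the one-line conversion from the proposed metadata encoding (0 = fixed0). The function names are hypothetical and do not correspond to the actual POC patches.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Convert CPUID_FIXED0_BITS (0 = fixed0) to the POC convention (1 = fixed0). */
static uint32_t poc_fixed0_from_metadata(uint32_t metadata_fixed0)
{
    return ~metadata_fixed0;
}

/*
 * 'enable' and 'disable' hold the bits the user explicitly asked to turn
 * on and off in one CPUID register; 'poc_fixed0' marks fixed0 bits with 1
 * and 'fixed1' marks fixed1 bits with 1. Returns 0 if the request is
 * allowed and -1 otherwise, printing the offending bits.
 */
static int tdx_validate_cpuid_request(uint32_t enable, uint32_t disable,
                                      uint32_t poc_fixed0, uint32_t fixed1)
{
    uint32_t bad_enable = enable & poc_fixed0;  /* enabling a fixed0 bit */
    uint32_t bad_disable = disable & fixed1;    /* disabling a fixed1 bit */

    /* Callers are expected to resolve conflicting requests beforehand. */
    assert((enable & disable) == 0);

    if (bad_enable)
        fprintf(stderr, "cannot enable fixed0 bits: 0x%08x\n",
                (unsigned int)bad_enable);
    if (bad_disable)
        fprintf(stderr, "cannot disable fixed1 bits: 0x%08x\n",
                (unsigned int)bad_disable);

    return (bad_enable || bad_disable) ? -1 : 0;
}
```

Only the source of the masks changes when switching from hardcoded arrays to metadata queried through KVM; the validation logic itself stays the same.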
[0] https://lore.kernel.org/all/ZhVdh4afvTPq5ssx@xxxxxxxxxx/
[1] https://lore.kernel.org/all/ZhVsHVqaff7AKagu@xxxxxxxxxx/
[2] https://cdrdv2.intel.com/v1/dl/getContent/733575
[3] https://lore.kernel.org/qemu-devel/20241105062408.3533704-49-xiaoyao.li@xxxxxxxxx/