RE: [PATCH v5 17/65] i386/tdx: Adjust the supported CPUID based on TDX restrictions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




>-----Original Message-----
>From: Li, Xiaoyao <xiaoyao.li@xxxxxxxxx>
>Subject: Re: [PATCH v5 17/65] i386/tdx: Adjust the supported CPUID based
>on TDX restrictions
>
>On 5/31/2024 4:47 PM, Duan, Zhenzhong wrote:
>>
>> On 2/29/2024 2:36 PM, Xiaoyao Li wrote:
>>> According to Chapter "CPUID Virtualization" in TDX module spec, CPUID
>>> bits of TD can be classified into 6 types:
>>>
>>> ------------------------------------------------------------------------
>>> 1 | As configured | configurable by VMM, independent of native value;
>>> ------------------------------------------------------------------------
>>> 2 | As configured | configurable by VMM if the bit is supported natively
>>>      (if native)   | Otherwise it equals as native(0).
>>> ------------------------------------------------------------------------
>>> 3 | Fixed         | fixed to 0/1
>>> ------------------------------------------------------------------------
>>> 4 | Native        | reflect the native value
>>> ------------------------------------------------------------------------
>>> 5 | Calculated    | calculated by TDX module.
>>> ------------------------------------------------------------------------
>>> 6 | Inducing #VE  | get #VE exception
>>> ------------------------------------------------------------------------
>>>
>>> Note:
>>> 1. All the configurable XFAM related features and TD attributes related
>>>     features fall into type #2. And fixed0/1 bits of XFAM and TD
>>>     attributes fall into type #3.
>>>
>>> 2. For CPUID leaves not listed in "CPUID virtualization Overview" table
>>>     in TDX module spec, TDX module injects #VE to TDs when those are
>>>     queried. For this case, TDs can request CPUID emulation from VMM via
>>>     TDVMCALL and the values are fully controlled by VMM.
>>>
>>> Due to TDX module has its own virtualization policy on CPUID bits, it
>>> leads
>>> to what reported via KVM_GET_SUPPORTED_CPUID diverges from the
>supported
>>> CPUID bits for TDs. In order to keep a consistent CPUID configuration
>>> between VMM and TDs. Adjust supported CPUID for TDs based on TDX
>>> restrictions.
>>>
>>> Currently only focus on the CPUID leaves recognized by QEMU's
>>> feature_word_info[] that are indexed by a FeatureWord.
>>>
>>> Introduce a TDX CPUID lookup table, which maintains 1 entry for each
>>> FeatureWord. Each entry has below fields:
>>>
>>>   - tdx_fixed0/1: The bits that are fixed as 0/1;
>>>
>>>   - depends_on_vmm_cap: The bits that are configurable from the view of
>>>                TDX module. But they requires emulation of VMM
>>>                when configured as enabled. For those, they are
>>>                not supported if VMM doesn't report them as
>>>                supported. So they need be fixed up by checking
>>>                if VMM supports them.
>>
>> Previously I thought bits configurable for TDX module are emulated by
>> TDX module,
>>
>> it looks not. Just curious why doesn't those bits belong to "Inducing
>> #VE" type?
>
>Because when TD guest queries this type of CPUID leaf, it doesn't get #VE.
>
>The bits in this category are free to be configured as on/off by VMM and
>passed to TDX module via TD_PARAM. Once they get configured, they are
>queried directly by TD guest without getting #VE.
>
>The problem is whether VMM allows them to be configured freely. E.g.,
>for features TME and PCONFIG, they are configurable. However when VMM
>configures them to 1, VMM needs to provide the support of related MSRs
>of them. If VMM cannot provide such support, VMM should be allow user to
>configured to 1.
>
>That's why we have this kind of type.
>
>BTW, we are going to abondan the solution that let QEMU to maintain the
>CPUID configurability table in this series. Next version we will come up
>with
>
>> Then guest can get KVM reported capabilities with tdvmcall directly.
>>
>>>
>>>   - inducing_ve: TD gets #VE when querying this CPUID leaf. The result is
>>>                  totally configurable by VMM.
>>>
>>>   - supported_on_ve: It's valid only when @inducing_ve is true. It
>>> represents
>>>             the maximum feature set supported that be emulated
>>>             for TDs.
>> This is never used together with depends_on_vmm_cap, maybe one
>variable
>> is enough?
>>>
>>> By applying TDX CPUID lookup table and TDX capabilities reported from
>>> TDX module, the supported CPUID for TDs can be obtained from following
>>> steps:
>>>
>>> - get the base of VMM supported feature set;
>>>
>>> - if the leaf is not a FeatureWord just return VMM's value without
>>>    modification;
>>>
>>> - if the leaf is an inducing_ve type, applying supported_on_ve mask and
>>>    return;
>>>
>>> - include all native bits, it covers type #2, #4, and parts of type #1.
>>>    (it also includes some unsupported bits. The following step will
>>>     correct it.)
>>>
>>> - apply fixed0/1 to it (it covers #3, and rectifies the previous step);
>>>
>>> - add configurable bits (it covers the other part of type #1);
>>>
>>> - fix the ones in vmm_fixup;
>>>
>>> (Calculated type is ignored since it's determined at runtime).
>>>
>>> Co-developed-by: Chenyi Qiang <chenyi.qiang@xxxxxxxxx>
>>> Signed-off-by: Chenyi Qiang <chenyi.qiang@xxxxxxxxx>
>>> Signed-off-by: Xiaoyao Li <xiaoyao.li@xxxxxxxxx>
>>> ---
>>>   target/i386/cpu.h     |  16 +++
>>>   target/i386/kvm/kvm.c |   4 +
>>>   target/i386/kvm/tdx.c | 263
>++++++++++++++++++++++++++++++++++++++++++
>>>   target/i386/kvm/tdx.h |   3 +
>>>   4 files changed, 286 insertions(+)
>>>
>>> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
>>> index 952174bb6f52..7bd604f802a1 100644
>>> --- a/target/i386/cpu.h
>>> +++ b/target/i386/cpu.h
>>> @@ -787,6 +787,8 @@ uint64_t
>>> x86_cpu_get_supported_feature_word(FeatureWord w,
>>>   /* Support RDFSBASE/RDGSBASE/WRFSBASE/WRGSBASE */
>>>   #define CPUID_7_0_EBX_FSGSBASE          (1U << 0)
>>> +/* Support for TSC adjustment MSR 0x3B */
>>> +#define CPUID_7_0_EBX_TSC_ADJUST        (1U << 1)
>>>   /* Support SGX */
>>>   #define CPUID_7_0_EBX_SGX               (1U << 2)
>>>   /* 1st Group of Advanced Bit Manipulation Extensions */
>>> @@ -805,8 +807,12 @@ uint64_t
>>> x86_cpu_get_supported_feature_word(FeatureWord w,
>>>   #define CPUID_7_0_EBX_INVPCID           (1U << 10)
>>>   /* Restricted Transactional Memory */
>>>   #define CPUID_7_0_EBX_RTM               (1U << 11)
>>> +/* Cache QoS Monitoring */
>>> +#define CPUID_7_0_EBX_PQM               (1U << 12)
>>>   /* Memory Protection Extension */
>>>   #define CPUID_7_0_EBX_MPX               (1U << 14)
>>> +/* Resource Director Technology Allocation */
>>> +#define CPUID_7_0_EBX_RDT_A             (1U << 15)
>>>   /* AVX-512 Foundation */
>>>   #define CPUID_7_0_EBX_AVX512F           (1U << 16)
>>>   /* AVX-512 Doubleword & Quadword Instruction */
>>> @@ -862,10 +868,16 @@ uint64_t
>>> x86_cpu_get_supported_feature_word(FeatureWord w,
>>>   #define CPUID_7_0_ECX_AVX512VNNI        (1U << 11)
>>>   /* Support for VPOPCNT[B,W] and VPSHUFBITQMB */
>>>   #define CPUID_7_0_ECX_AVX512BITALG      (1U << 12)
>>> +/* Intel Total Memory Encryption */
>>> +#define CPUID_7_0_ECX_TME               (1U << 13)
>>>   /* POPCNT for vectors of DW/QW */
>>>   #define CPUID_7_0_ECX_AVX512_VPOPCNTDQ  (1U << 14)
>>> +/* Placeholder for bit 15 */
>>> +#define CPUID_7_0_ECX_FZM               (1U << 15)
>>>   /* 5-level Page Tables */
>>>   #define CPUID_7_0_ECX_LA57              (1U << 16)
>>> +/* MAWAU for MPX */
>>> +#define CPUID_7_0_ECX_MAWAU             (31U << 17)
>>>   /* Read Processor ID */
>>>   #define CPUID_7_0_ECX_RDPID             (1U << 22)
>>>   /* Bus Lock Debug Exception */
>>> @@ -876,6 +888,8 @@ uint64_t
>>> x86_cpu_get_supported_feature_word(FeatureWord w,
>>>   #define CPUID_7_0_ECX_MOVDIRI           (1U << 27)
>>>   /* Move 64 Bytes as Direct Store Instruction */
>>>   #define CPUID_7_0_ECX_MOVDIR64B         (1U << 28)
>>> +/* ENQCMD and ENQCMDS instructions */
>>> +#define CPUID_7_0_ECX_ENQCMD            (1U << 29)
>>>   /* Support SGX Launch Control */
>>>   #define CPUID_7_0_ECX_SGX_LC            (1U << 30)
>>>   /* Protection Keys for Supervisor-mode Pages */
>>> @@ -893,6 +907,8 @@ uint64_t
>>> x86_cpu_get_supported_feature_word(FeatureWord w,
>>>   #define CPUID_7_0_EDX_SERIALIZE         (1U << 14)
>>>   /* TSX Suspend Load Address Tracking instruction */
>>>   #define CPUID_7_0_EDX_TSX_LDTRK         (1U << 16)
>>> +/* PCONFIG instruction */
>>> +#define CPUID_7_0_EDX_PCONFIG           (1U << 18)
>>>   /* Architectural LBRs */
>>>   #define CPUID_7_0_EDX_ARCH_LBR          (1U << 19)
>>>   /* AMX_BF16 instruction */
>>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>>> index 0e68e80f4291..389b631c03dd 100644
>>> --- a/target/i386/kvm/kvm.c
>>> +++ b/target/i386/kvm/kvm.c
>>> @@ -520,6 +520,10 @@ uint32_t
>kvm_arch_get_supported_cpuid(KVMState
>>> *s, uint32_t function,
>>>           ret |= 1U << KVM_HINTS_REALTIME;
>>>       }
>>> +    if (is_tdx_vm()) {
>>> +        tdx_get_supported_cpuid(function, index, reg, &ret);
>>> +    }
>>> +
>>>       return ret;
>>>   }
>>> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
>>> index 756058f2ed4a..85d96140b450 100644
>>> --- a/target/i386/kvm/tdx.c
>>> +++ b/target/i386/kvm/tdx.c
>>> @@ -15,11 +15,129 @@
>>>   #include "qemu/error-report.h"
>>>   #include "qapi/error.h"
>>>   #include "qom/object_interfaces.h"
>>> +#include "standard-headers/asm-x86/kvm_para.h"
>>>   #include "sysemu/kvm.h"
>>> +#include "sysemu/sysemu.h"
>>>   #include "hw/i386/x86.h"
>>>   #include "kvm_i386.h"
>>>   #include "tdx.h"
>>> +#include "../cpu-internal.h"
>>> +
>>> +#define TDX_SUPPORTED_KVM_FEATURES  ((1U <<
>KVM_FEATURE_NOP_IO_DELAY)
>>> | \
>>> +                                     (1U << KVM_FEATURE_PV_UNHALT) | \
>>> +                                     (1U << KVM_FEATURE_PV_TLB_FLUSH)
>>> | \
>>> +                                     (1U << KVM_FEATURE_PV_SEND_IPI) | \
>>> +                                     (1U << KVM_FEATURE_POLL_CONTROL)
>>> | \
>>> +                                     (1U <<
>>> KVM_FEATURE_PV_SCHED_YIELD) | \
>>> +                                     (1U <<
>>> KVM_FEATURE_MSI_EXT_DEST_ID))
>>> +
>>> +typedef struct KvmTdxCpuidLookup {
>>> +    uint32_t tdx_fixed0;
>>> +    uint32_t tdx_fixed1;
>>> +
>>> +    /*
>>> +     * The CPUID bits that are configurable from the view of TDX module
>>> +     * but require VMM's support when wanting to enable them.
>>> +     *
>>> +     * For those bits, they cannot be enabled if VMM (KVM/QEMU)
>>> doesn't support
>>> +     * them.
>>> +     */
>>> +    uint32_t depends_on_vmm_cap;
>>> +
>>> +    bool inducing_ve;
>>> +    /*
>>> +     * The maximum supported feature set for given inducing-#VE leaf.
>>> +     * It's valid only when .inducing_ve is true.
>>> +     */
>>> +    uint32_t supported_value_on_ve;
>>> +} KvmTdxCpuidLookup;
>>> +
>>> + /*
>>> +  * QEMU maintained TDX CPUID lookup tables, which reflects how
>>> CPUIDs are
>>> +  * virtualized for guest TDs based on "CPUID virtualization" of TDX
>>> spec.
>>> +  *
>>> +  * Note:
>>> +  *
>>> +  * This table will be updated runtime by tdx_caps reported by KVM.
>>> +  *
>>> +  */
>>> +static KvmTdxCpuidLookup tdx_cpuid_lookup[FEATURE_WORDS] = {
>>> +    [FEAT_1_EDX] = {
>>> +        .tdx_fixed0 =
>>> +            BIT(10) /* Reserved */ | BIT(20) /* Reserved */ |
>>> CPUID_IA64,
>>> +        .tdx_fixed1 =
>>> +            CPUID_MSR | CPUID_PAE | CPUID_MCE | CPUID_APIC |
>>> +            CPUID_MTRR | CPUID_MCA | CPUID_CLFLUSH | CPUID_DTS,
>>> +        .depends_on_vmm_cap =
>>> +            CPUID_ACPI | CPUID_PBE,
>>> +    },
>>> +    [FEAT_1_ECX] = {
>>> +        .tdx_fixed0 =
>>> +            CPUID_EXT_VMX | CPUID_EXT_SMX | BIT(16) /* Reserved */,
>>> +        .tdx_fixed1 =
>>> +            CPUID_EXT_CX16 | CPUID_EXT_PDCM | CPUID_EXT_X2APIC |
>>> +            CPUID_EXT_AES | CPUID_EXT_XSAVE | CPUID_EXT_RDRAND |
>>> +            CPUID_EXT_HYPERVISOR,
>>> +        .depends_on_vmm_cap =
>>> +            CPUID_EXT_EST | CPUID_EXT_TM2 | CPUID_EXT_XTPR |
>>> CPUID_EXT_DCA,
>>> +    },
>>> +    [FEAT_8000_0001_EDX] = {
>>> +        .tdx_fixed1 =
>>> +            CPUID_EXT2_NX | CPUID_EXT2_PDPE1GB | CPUID_EXT2_RDTSCP
>|
>>> +            CPUID_EXT2_LM,
>>> +    },
>>> +    [FEAT_7_0_EBX] = {
>>> +        .tdx_fixed0 =
>>> +            CPUID_7_0_EBX_TSC_ADJUST | CPUID_7_0_EBX_SGX |
>>> CPUID_7_0_EBX_MPX,
>>> +        .tdx_fixed1 =
>>> +            CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_RTM |
>>> +            CPUID_7_0_EBX_RDSEED | CPUID_7_0_EBX_SMAP |
>>> +            CPUID_7_0_EBX_CLFLUSHOPT | CPUID_7_0_EBX_CLWB |
>>> +            CPUID_7_0_EBX_SHA_NI,
>>> +        .depends_on_vmm_cap =
>>> +            CPUID_7_0_EBX_PQM | CPUID_7_0_EBX_RDT_A,
>>> +    },
>>> +    [FEAT_7_0_ECX] = {
>>> +        .tdx_fixed0 =
>>> +            CPUID_7_0_ECX_FZM | CPUID_7_0_ECX_MAWAU |
>>> +            CPUID_7_0_ECX_ENQCMD | CPUID_7_0_ECX_SGX_LC,
>>> +        .tdx_fixed1 =
>>> +            CPUID_7_0_ECX_MOVDIR64B |
>CPUID_7_0_ECX_BUS_LOCK_DETECT,
>>> +        .depends_on_vmm_cap =
>>> +            CPUID_7_0_ECX_TME,
>>> +    },
>>> +    [FEAT_7_0_EDX] = {
>>> +        .tdx_fixed1 =
>>> +            CPUID_7_0_EDX_SPEC_CTRL |
>CPUID_7_0_EDX_ARCH_CAPABILITIES |
>>> +            CPUID_7_0_EDX_CORE_CAPABILITY |
>>> CPUID_7_0_EDX_SPEC_CTRL_SSBD,
>>> +        .depends_on_vmm_cap =
>>> +            CPUID_7_0_EDX_PCONFIG,
>>> +    },
>>> +    [FEAT_8000_0008_EBX] = {
>>> +        .tdx_fixed0 =
>>> +            ~CPUID_8000_0008_EBX_WBNOINVD,
>>> +        .tdx_fixed1 =
>>> +            CPUID_8000_0008_EBX_WBNOINVD,
>>> +    },
>>> +    [FEAT_XSAVE] = {
>>> +        .tdx_fixed1 =
>>> +            CPUID_XSAVE_XSAVEOPT | CPUID_XSAVE_XSAVEC |
>>> +            CPUID_XSAVE_XSAVES,
>>> +    },
>>> +    [FEAT_6_EAX] = {
>>> +        .inducing_ve = true,
>>> +        .supported_value_on_ve = CPUID_6_EAX_ARAT,
>>> +    },
>>> +    [FEAT_8000_0007_EDX] = {
>>> +        .inducing_ve = true,
>>> +        .supported_value_on_ve = -1U,
>>> +    },
>>> +    [FEAT_KVM] = {
>>> +        .inducing_ve = true,
>>> +        .supported_value_on_ve = TDX_SUPPORTED_KVM_FEATURES,
>>> +    },
>>> +};
>>>   static TdxGuest *tdx_guest;
>>> @@ -31,6 +149,151 @@ bool is_tdx_vm(void)
>>>       return !!tdx_guest;
>>>   }
>>> +static inline uint32_t host_cpuid_reg(uint32_t function,
>>> +                                      uint32_t index, int reg)
>>> +{
>>> +    uint32_t eax, ebx, ecx, edx;
>>> +    uint32_t ret = 0;
>>> +
>>> +    host_cpuid(function, index, &eax, &ebx, &ecx, &edx);
>>> +
>>> +    switch (reg) {
>>> +    case R_EAX:
>>> +        ret = eax;
>>> +        break;
>>> +    case R_EBX:
>>> +        ret = ebx;
>>> +        break;
>>> +    case R_ECX:
>>> +        ret = ecx;
>>> +        break;
>>> +    case R_EDX:
>>> +        ret = edx;
>>> +        break;
>>> +    }
>>> +    return ret;
>>> +}
>>> +
>>> +/*
>>> + * get the configurable cpuid bits (can be set to 0 or 1) reported by
>>> TDX module
>>> + * from tdx_caps.
>>> + */
>>> +static inline uint32_t tdx_cap_cpuid_config(uint32_t function,
>>> +                                            uint32_t index, int reg)
>>> +{
>>> +    struct kvm_tdx_cpuid_config *cpuid_c;
>>> +    int ret = 0;
>>> +    int i;
>>> +
>>> +    if (tdx_caps->nr_cpuid_configs <= 0) {
>>> +        return ret;
>>> +    }
>>> +
>>> +    for (i = 0; i < tdx_caps->nr_cpuid_configs; i++) {
>>> +        cpuid_c = &tdx_caps->cpuid_configs[i];
>>> +        /* 0xffffffff in sub_leaf means the leaf doesn't require a
>>> sublesf */
>>> +        if (cpuid_c->leaf == function &&
>>> +            (cpuid_c->sub_leaf == 0xffffffff || cpuid_c->sub_leaf ==
>>> index)) {
>>> +            switch (reg) {
>>> +            case R_EAX:
>>> +                ret = cpuid_c->eax;
>>> +                break;
>>> +            case R_EBX:
>>> +                ret = cpuid_c->ebx;
>>> +                break;
>>> +            case R_ECX:
>>> +                ret = cpuid_c->ecx;
>>> +                break;
>>> +            case R_EDX:
>>> +                ret = cpuid_c->edx;
>>> +                break;
>>> +            default:
>>> +                return 0;
>>> +            }
>>> +        }
>>> +    }
>>> +    return ret;
>>> +}
>>> +
>>> +static FeatureWord get_cpuid_featureword_index(uint32_t function,
>>> +                                               uint32_t index, int reg)
>>> +{
>>> +    FeatureWord w;
>>> +
>>> +    for (w = 0; w < FEATURE_WORDS; w++) {
>>> +        FeatureWordInfo *f = &feature_word_info[w];
>>> +
>>> +        if (f->type == MSR_FEATURE_WORD || f->cpuid.eax != function ||
>>> +            f->cpuid.reg != reg ||
>>> +            (f->cpuid.needs_ecx && f->cpuid.ecx != index)) {
>>> +            continue;
>>> +        }
>>> +
>>> +        return w;
>>> +    }
>>> +
>>> +    return w;
>>> +}
>>> +
>>> +/*
>>> + * TDX supported CPUID varies from what KVM reports. Adjust the
>>> result by
>>> + * applying the TDX restrictions.
>>> + */
>>> +void tdx_get_supported_cpuid(uint32_t function, uint32_t index, int
>reg,
>>> +                             uint32_t *ret)
>>> +{
>>> +    /*
>>> +     * it's KVMM + QEMU 's capabilities of what CPUID bits is
>>> supported or
>>> +     * can be emulated as supported.
>>> +     */
>>> +    uint32_t vmm_cap = *ret;
>>> +    FeatureWord w;
>>> +
>>> +    /* Only handle features leaves that recognized by
>>> feature_word_info[] */
>>> +    w = get_cpuid_featureword_index(function, index, reg);
>>> +    if (w == FEATURE_WORDS) {
>>> +        return;
>>> +    }
>>> +
>>> +    if (tdx_cpuid_lookup[w].inducing_ve) {
>>> +        *ret &= tdx_cpuid_lookup[w].supported_value_on_ve;
>>> +        return;
>>> +    }
>>> +
>>> +    /*
>>> +     * Include all the native bits as first step. It covers types
>>> +     * - As configured (if native)
>>> +     * - Native
>>> +     * - XFAM related and Attributes realted
>> s/realted/related
>>> +     *
>>> +     * It also has side effect to enable unsupported bits, e.g., the
>>> +     * bits of "fixed0" type while present natively. It's safe because
>>> +     * the unsupported bits will be masked off by .fixed0 later.
>>> +     */
>>> +    *ret |= host_cpuid_reg(function, index, reg);
>>
>> Looks KVM capabilities are merged with native bits, is this intentional?
>
>yes, if we change the order, it would be more clear for you I guess.
>
>	host_cpuid_reg() | kvm_capabilities
>
>The base is host's native value, while any bit that absent from native
>but KVM can emulate is also added to base.

Imagine there is a 'type native' bit that's absent from native but KVM emulated,
With above code we pass 1 to tdx module but it wants native 0, is it an issue?

Thanks
Zhenzhong




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux