On 9/12/24 04:46, Xiaoyao Li wrote: > On 12/6/2024 10:40 PM, Adrian Hunter wrote: >> On 6/12/24 05:37, Xiaoyao Li wrote: >>> On 12/6/2024 1:31 AM, Adrian Hunter wrote: >>>> On 4/12/24 17:33, Xiaoyao Li wrote: >>>>> On 12/4/2024 7:55 PM, Adrian Hunter wrote: >>>>>> On 4/12/24 13:13, Chao Gao wrote: >>>>>>> On Wed, Dec 04, 2024 at 08:57:23AM +0200, Adrian Hunter wrote: >>>>>>>> On 4/12/24 08:37, Chao Gao wrote: >>>>>>>>> On Wed, Dec 04, 2024 at 08:18:32AM +0200, Adrian Hunter wrote: >>>>>>>>>> On 4/12/24 03:25, Chao Gao wrote: >>>>>>>>>>>> +#define TDX_FEATURE_TSX (__feature_bit(X86_FEATURE_HLE) | __feature_bit(X86_FEATURE_RTM)) >>>>>>>>>>>> + >>>>>>>>>>>> +static bool has_tsx(const struct kvm_cpuid_entry2 *entry) >>>>>>>>>>>> +{ >>>>>>>>>>>> + return entry->function == 7 && entry->index == 0 && >>>>>>>>>>>> + (entry->ebx & TDX_FEATURE_TSX); >>>>>>>>>>>> +} >>>>>>>>>>>> + >>>>>>>>>>>> +static void clear_tsx(struct kvm_cpuid_entry2 *entry) >>>>>>>>>>>> +{ >>>>>>>>>>>> + entry->ebx &= ~TDX_FEATURE_TSX; >>>>>>>>>>>> +} >>>>>>>>>>>> + >>>>>>>>>>>> +static bool has_waitpkg(const struct kvm_cpuid_entry2 *entry) >>>>>>>>>>>> +{ >>>>>>>>>>>> + return entry->function == 7 && entry->index == 0 && >>>>>>>>>>>> + (entry->ecx & __feature_bit(X86_FEATURE_WAITPKG)); >>>>>>>>>>>> +} >>>>>>>>>>>> + >>>>>>>>>>>> +static void clear_waitpkg(struct kvm_cpuid_entry2 *entry) >>>>>>>>>>>> +{ >>>>>>>>>>>> + entry->ecx &= ~__feature_bit(X86_FEATURE_WAITPKG); >>>>>>>>>>>> +} >>>>>>>>>>>> + >>>>>>>>>>>> +static void tdx_clear_unsupported_cpuid(struct kvm_cpuid_entry2 *entry) >>>>>>>>>>>> +{ >>>>>>>>>>>> + if (has_tsx(entry)) >>>>>>>>>>>> + clear_tsx(entry); >>>>>>>>>>>> + >>>>>>>>>>>> + if (has_waitpkg(entry)) >>>>>>>>>>>> + clear_waitpkg(entry); >>>>>>>>>>>> +} >>>>>>>>>>>> + >>>>>>>>>>>> +static bool tdx_unsupported_cpuid(const struct kvm_cpuid_entry2 *entry) >>>>>>>>>>>> +{ >>>>>>>>>>>> + return has_tsx(entry) || has_waitpkg(entry); >>>>>>>>>>>> +} >>>>>>>>>>> >>>>>>>>>>> No need to check TSX/WAITPKG explicitly because setup_tdparams_cpuids() already >>>>>>>>>>> ensures that unconfigurable bits are not set by userspace. >>>>>>>>>> >>>>>>>>>> Aren't they configurable? >>>>>>>>> >>>>>>>>> They are cleared from the configurable bitmap by tdx_clear_unsupported_cpuid(), >>>>>>>>> so they are not configurable from a userspace perspective. Did I miss anything? >>>>>>>>> KVM should check user inputs against its adjusted configurable bitmap, right? >>>>>>>> >>>>>>>> Maybe I misunderstand but we rely on the TDX module to reject >>>>>>>> invalid configuration. We don't check exactly what is configurable >>>>>>>> for the TDX Module. >>>>>>> >>>>>>> Ok, this is what I missed. I thought KVM validated user input and masked >>>>>>> out all unsupported features. sorry for this. >>>>>>> >>>>>>>> >>>>>>>> TSX and WAITPKG are not invalid for the TDX Module, but KVM >>>>>>>> must either support them by restoring their MSRs, or disallow >>>>>>>> them. This patch disallows them for now. >>>>>>> >>>>>>> Yes. I agree. what if a new feature (supported by a future TDX module) also >>>>>>> needs KVM to restore some MSRs? current KVM will allow it to be exposed (since >>>>>>> only TSX/WAITPKG are checked); then some MSRs may get corrupted. I may think >>>>>>> this is not a good design. Current KVM should work with future TDX modules. >>>>>> >>>>>> With respect to CPUID, I gather this kind of thing has been >>>>>> discussed, such as here: >>>>>> >>>>>> https://lore.kernel.org/all/ZhVsHVqaff7AKagu@xxxxxxxxxx/ >>>>>> >>>>>> and Rick and Xiaoyao are working on something. >>>>>> >>>>>> In general, I would expect a new TDX Module would advertise support for >>>>>> new features, but KVM would have to opt in to use them. >>>>>> >>>>> >>>>> There were discussion[1] on whether KVM to gatekeep the configurable/supported CPUIDs for TDX. I stand by Sean that KVM needs to do so. >>>>> >>>>> Regarding KVM opt in the new feature, KVM gatekeeps the CPUID bit that can be set by userspace is exactly the behavior of opt-in. i.e., for a given KVM, it only allows a CPUID set {S} to be configured by userspace, if new TDX module supports new feature X, it needs KVM to opt-in X by adding X to {S} so that X is allowed to be configured by userspace. >>>>> >>>>> Besides, I find current interface between KVM and userspace lacks the ability to tell userspace what bits are not supported by KVM. KVM_TDX_CAPABILITIES.cpuid doesn't work because it represents the configurable CPUIDs, not supported CPUIDs (I think we might rename it to configurable_cpuid to better reflect its meaning). So userspace has to hardcode that TSX and WAITPKG is not support itself. >>>> >>>> I don't follow why hardcoding would be necessary. >>>> >>>> If the leaf is represented in KVM_TDX_CAPABILITIES.cpuid, and >>>> the bits are 0 there, why would userspace try to set them to 1? >>> >>> Userspace doesn't set the bit to 1 in kvm_tdx_init_vm.cpuid, doesn't mean userspace wants the bit to be 0. >>> >>> Note, KVM_TDX_CAPABILITIES.cpuid reports the configurable bits. The value 0 of a bit in KVM_TDX_CAPABILITIES.cpuid means the bit is not configurable, not means the bit is unsupported. >> >> For features configurable by CPUID like TSX and WAITPKG, >> a value of 0 does mean unsupported, because the value >> has to be 1 to enable the feature. >> >> From the TDX Module Base spec: >> >> 11.18. Transactional Synchronization Extensions (TSX) >> The host VMM can enable TSX for a TD by configuring the following CPUID bits as enabled in the TD_PARAMS input to >> TDH.MNG.INIT: >> - CPUID(7,0).EBX[4] (HLE) >> - CPUID(7,0).EBX[11] (RTM) >> etc >> >> 11.19.4. WAITPKG: TPAUSE, UMONITOR and UMWAIT Instructions >> The host VMM may allow guest TDs to use the TPAUSE, UMONITOR and UMWAIT instructions, if the CPU supports them, >> by configuring the virtual value of CPUID(7,0).ECX[5] (WAITPKG) to 1 using the CPUID configuration table which is part >> the TD_PARAMS input to TDH.MNG.INIT. Enabling CPUID(7,0).ECX[5] also enables TD access to IA32_UMWAIT_CONTROL >> (MSR 0xE1). >> If not allowed, then TD execution of TPAUSE, UMONITOR or UMWAIT results in a #UD, and access to >> IA32_UMWAIT_CONTROL results in a #GP(0). >> >>> For kvm_tdx_init_vm.cpuid, >>> - if the corresponding bit is reported as 1 in KVM_TDX_CAPABILITIES.cpuid, then a value 0 in kvm_tdx_init_vm.cpuid means userspace wants to configure it as 0. >>> - if the corresponding bit is reported as 0 in KVM_TDX_CAPABILITIES.cpuid, then userspace has to pass a value 0 in kvm_tdx_init_vm.cpuid. But it doesn't mean the value of the bit will be 0. >>> >>> e.g., X2APIC bit is 0 in KVM_TDX_CAPABILITIES.cpuid, and it's also 0 in kvm_tdx_init_vm.cpuid, but TD guest sees a value of 1. In the view of QEMU, it maintains the bit of X2APIC as 1, and QEMU filters X2APIC bit when calling KVM_TDX_INIT_VM because X2APIC is not configurable. >>> >>> So when it comes to TSX and WAITPKG, QEMU also needs an interface to be informed that they are unsupported. Without the interface of fixed0 bits reported by KVM, QEMU needs to hardcode itself like [1]. The problem of hardcode is that it will conflict when future KVM allows them to be configurable. >> >> So TSX and WAITPKG support should be based on KVM_TDX_CAPABILITIES.cpuid, and not hardcoded. > > This requires Userspace to have the knowledge that "TSX and WAITPKG is supposed to be configurable and reported as 1 in KVM_TDX_CAPABILITIES.cpuid in normal case". And if userspace gets a value 0 of TSX and Waitpkg in KVM_TDX_CAPABILITIES.cpuid, it means either KVM doesn't support/allow it or the underlying TDX module doesn't. > > I think it's an implicit form of hardcode. From the perspective of userspace, I would like to avoid all the special handling. It is consistent with what is done for TD attributes and XFAM. Presumably feature names must be mapped to capability bits and configuration bits, and that information has to be represented somewhere. > >>> >>> In the future, if we have interface from KVM to report the fixed0 and fixed1 bit (on top of the proposal [2]), userspace can drop the hardcoded one it maintains. At that time, KVM can ensure no conflict by removing the bits from fixed0/1 array when allowing them to be configurable. >>> >>> [1] https://lore.kernel.org/qemu-devel/20241105062408.3533704-49-xiaoyao.li@xxxxxxxxx/ >>> [2] https://lore.kernel.org/all/43b26df1-4c27-41ff-a482-e258f872cc31@xxxxxxxxx/ >>> >>>>> >>>>> [1] https://lore.kernel.org/all/ZuM12EFbOXmpHHVQ@xxxxxxxxxx/ >>>>> >>>> >>> >> >