On Fri, Oct 25, 2024, Xin Li wrote:
> On 10/24/2024 12:49 AM, Chao Gao wrote:
> > On Mon, Sep 30, 2024 at 10:01:04PM -0700, Xin Li (Intel) wrote:
> > > From: Xin Li <xin3.li@xxxxxxxxx>
> > >
> > > Set VMX CPU capabilities before initializing nested instead of after,
> > > as it needs to check VMX CPU capabilities to setup the VMX basic MSR
> > > for nested.
> >
> > Which VMX CPU capabilities are needed? after reading patch 25, I still
> > don't get that.

Heh, I had the same question. I was worried this was fixing a bug.

> Sigh, in v2 I had 'if (kvm_cpu_cap_has(X86_FEATURE_FRED))' in
> nested_vmx_setup_basic(), which is changed to 'if (cpu_has_vmx_fred())'
> in v3. So the reason for the change is gone. But I think logically
> the change is still needed; nested setup should be after VMX setup.

Hmm, no, I don't think we want to allow nested_vmx_setup_ctls_msrs() to consume
any "output" from vmx_set_cpu_caps(). vmx_set_cpu_caps() is called only on the
CPU that loads kvm-intel.ko, whereas nested_vmx_setup_ctls_msrs() is called on
all CPUs to check for consistency between CPUs.

And thinking more about the relevant flows, there's a flaw with kvm_cpu_caps and
vendor module reload. KVM zeroes kvm_cpu_caps during init, but not until
kvm_set_cpu_caps() is called, i.e. quite some time after KVM has started doing
setup. If KVM had a bug where it checked a feature before kvm_set_cpu_caps(),
the bug could potentially go unnoticed until just the "right" combination of
hardware, module params, and/or Kconfig exposed semi-uninitialized data, i.e.
caps left over from the previous load of the vendor module.

I'll post the below (assuming it actually works) to guard against that.
Ideally, kvm_cpu_cap_get() would WARN if it's used before caps are finalized,
but I don't think the extra protection would be worth the increase in code
footprint.
---
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 97a90689a9dc..8fd48119bd41 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -817,7 +817,8 @@ do {						\
 
 void kvm_set_cpu_caps(void)
 {
-	memset(kvm_cpu_caps, 0, sizeof(kvm_cpu_caps));
+	WARN_ON_ONCE(!bitmap_empty((void *)kvm_cpu_caps,
+				   sizeof(kvm_cpu_caps) * BITS_PER_BYTE));
 
 	BUILD_BUG_ON(sizeof(kvm_cpu_caps) - (NKVMCAPINTS * sizeof(*kvm_cpu_caps)) >
 		     sizeof(boot_cpu_data.x86_capability));
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f5685f153e08..075a07412893 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9737,6 +9737,7 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 	}
 
 	memset(&kvm_caps, 0, sizeof(kvm_caps));
+	memset(kvm_cpu_caps, 0, sizeof(kvm_cpu_caps));
 
 	x86_emulator_cache = kvm_alloc_emulator_cache();
 	if (!x86_emulator_cache) {
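To make the reload hazard concrete, a minimal userspace sketch in C, not kernel code. It assumes, per the reasoning above, that kvm_cpu_caps lives in kvm.ko and so survives a reload of the vendor module; caps[], NCAPWORDS, vendor_init(), set_cpu_caps(), and simulate_load() are all hypothetical stand-ins, and the sketch ignores locking and per-CPU details. The "patched" flag switches between zeroing in set_cpu_caps() (old behavior) and zeroing at vendor init plus a WARN-style check (the diff above).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Userspace model only.  caps[] stands in for kvm_cpu_caps, which is assumed
 * to persist across vendor module reloads.  All names here are hypothetical.
 */
#define NCAPWORDS 4

static unsigned int caps[NCAPWORDS];
static int warn_count;

static bool caps_empty(void)
{
	for (int i = 0; i < NCAPWORDS; i++)
		if (caps[i])
			return false;
	return true;
}

/* Models kvm_x86_vendor_init(); the patch adds the memset here. */
static void vendor_init(bool patched)
{
	if (patched)
		memset(caps, 0, sizeof(caps));
}

/* Models kvm_set_cpu_caps() before (zeroes) and after (warns) the patch. */
static void set_cpu_caps(bool patched, unsigned int word0)
{
	if (!patched) {
		memset(caps, 0, sizeof(caps));
	} else if (!caps_empty()) {
		/* Stand-in for WARN_ON_ONCE(!bitmap_empty(...)); counts every hit. */
		warn_count++;
		fprintf(stderr, "WARNING: caps not zeroed before set_cpu_caps()\n");
	}
	caps[0] = word0;
}

/*
 * One simulated vendor module load.  Returns true if a (buggy) feature check
 * that runs before set_cpu_caps() saw bit 0 set, i.e. consumed data left over
 * from a previous load.
 */
static bool simulate_load(bool patched, unsigned int hw_word0)
{
	bool seen;

	vendor_init(patched);
	seen = (caps[0] & 1u) != 0;	/* the premature check */
	set_cpu_caps(patched, hw_word0);
	assert(caps[0] == hw_word0);	/* caps installed as expected */
	return seen;
}
```

With patched == false, the first load sees a zeroed array but a second load lets the premature check see bit 0 from the first load. With patched == true the check sees zeroed caps, and calling set_cpu_caps() again without an intervening vendor_init() trips the warning.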