On Wed, Jul 21, 2021, Hu, Robert wrote: > > > Queued, thanks. Without having checked the kvm-unit-tests sources > > > very thoroughly, this might be a configuration issue in > > > kvm-unit-tests; in theory "-cpu host" (unlike "-cpu > > > host,migratable=no") should not enable TSC scaling. > > > > As noted in the code comments, KVM allows VMREAD/VMWRITE to all defined > > fields, whether or not the field should actually exist for the vCPU model doesn't > > enter into the equation. That's technically wrong as there are a number of > > fields that the SDM explicitly states exist iff a certain feature is supported. > > It's right that a number of fields' existence depends on some certain feature. > Meanwhile, "IA32_VMX_VMCS_ENUM indicates to software the highest index > value used in the encoding of any field *supported* by the processor", rather than > *existed*. Yes. > So my understanding is no matter what VMCS exec control field's value is set, > value of IA32_VMX_VMCS_ENUM shall not be affected, as it reports the physical > CPU's capability rather than runtime VMCS configuration. Yes. > Back to nested case, L1's VMCS configuration lays the "physical" capability > for L2, right? Yes. > nested_vmx_msrs or at least nested_vmx_msrs.vmcs_enum shall be put to vcpu > scope, rather than current kvm global. > > Current nested_vmx_calc_vmcs_enum_msr() is invoked at early stage, before > vcpu features are settled. I think should be moved to later stage as well. Just moving the helper (or adding another call) wouldn't fix the underlying problem that KVM doesn't correctly model the interplay between VMX features and VMCS fields. It's arguably less wrong than letting userspace stuff an incorrect value, but it's not 100% correct and ignoring/overriding userspace is sketchy at best. As suggested below, the full fix is to fail VMREAD/VMWRITE to fields that shouldn't exist given the vCPU model. > > To fix that we'd need to add a "feature flag" to vmcs_field_to_offset_table > > that is checked against the vCPU model, though updating the MSR would > > probably fall onto userspace's shoulders? > > > > And FWIW, this is the QEMU code: > > > > #define VMCS12_MAX_FIELD_INDEX (0x17) > > > > static void kvm_msr_entry_add_vmx(X86CPU *cpu, FeatureWordArray f) > > { > > ... > > > > /* > > * Just to be safe, write these with constant values. The CRn_FIXED1 > > * MSRs are generated by KVM based on the vCPU's CPUID. > > */ > > kvm_msr_entry_add(cpu, MSR_IA32_VMX_CR0_FIXED0, > > CR0_PE_MASK | CR0_PG_MASK | CR0_NE_MASK); > > kvm_msr_entry_add(cpu, MSR_IA32_VMX_CR4_FIXED0, > > CR4_VMXE_MASK); > > kvm_msr_entry_add(cpu, MSR_IA32_VMX_VMCS_ENUM, > > VMCS12_MAX_FIELD_INDEX << 1); > > } >