On Mon, Aug 22, 2022, Vitaly Kuznetsov wrote:
> Sean Christopherson <seanjc@xxxxxxxxxx> writes:
> 
> > On Mon, Aug 22, 2022, Vitaly Kuznetsov wrote:
> >> So I reached out to Microsoft and their answer was that for all these new
> >> eVMCS fields (including *PerfGlobalCtrl) observing architectural VMX
> >> MSRs should be enough. *PerfGlobalCtrl case is special because of Win11
> >> bug (if we expose the feature in VMX feature MSRs but don't set
> >> CPUID.0x4000000A.EBX BIT(0) it just doesn't boot).
> > 
> > I.e. TSC_SCALING shouldn't be gated on the flag? If so, then the 2-D array approach
> > is overkill since (a) the CPUID flag only controls PERF_GLOBAL_CTRL and (b) we aren't
> > expecting any more flags in the future.
> > 
> 
> Unfortunately, we have to gate the presence of these new features on
> something, otherwise VMM has no way to specify which particular eVMCS
> "revision" it wants (TL;DR: we will break migration).
> 
> My initial implementation was inventing 'eVMCS revision' concept:
> https://lore.kernel.org/kvm/20220629150625.238286-7-vkuznets@xxxxxxxxxx/
> 
> which is needed if we don't gate all these new fields on CPUID.0x4000000A.EBX BIT(0).
> 
> Going forward, we will still (likely) need something when new fields show up.

My comments from that thread still apply. Adding "revisions" or feature flags
isn't maintainable, e.g. at best KVM will end up with a ridiculous number of flags.

Looking at QEMU, which I strongly suspect is the only VMM that enables
KVM_CAP_HYPERV_ENLIGHTENED_VMCS, it does the sane thing of enabling the capability
before grabbing the VMX MSRs.

So, why not simply apply filtering for host accesses as well? E.g.

	/*
	 * New Enlightened VMCS fields always lag behind their hardware
	 * counterparts, filter out fields that are not yet defined.
	 */
	if (vmx->nested.enlightened_vmcs_enabled)
		nested_evmcs_filter_control_msr(vcpu, msr_info);

and then the eVMCS code can end up being:

static bool evmcs_has_perf_global_ctrl(struct kvm_vcpu *vcpu)
{
	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);

	/*
	 * PERF_GLOBAL_CTRL is filtered only for guest accesses, and all guest
	 * accesses should be gated on Hyper-V being enabled and initialized.
	 */
	if (WARN_ON_ONCE(!hv_vcpu))
		return false;

	return hv_vcpu->cpuid_cache.nested_ebx &
	       HV_X64_NESTED_EVMCS1_PERF_GLOBAL_CTRL;
}

static u32 evmcs_get_unsupported_ctls(struct kvm_vcpu *vcpu, u32 msr_index,
				      bool host_initiated)
{
	u32 unsupported_ctrls;

	switch (msr_index) {
	case MSR_IA32_VMX_EXIT_CTLS:
	case MSR_IA32_VMX_TRUE_EXIT_CTLS:
		unsupported_ctrls = EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
		if (!host_initiated && !evmcs_has_perf_global_ctrl(vcpu))
			unsupported_ctrls |= VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
		return unsupported_ctrls;
	case MSR_IA32_VMX_ENTRY_CTLS:
	case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
		unsupported_ctrls = EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
		if (!host_initiated && !evmcs_has_perf_global_ctrl(vcpu))
			unsupported_ctrls |= VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
		return unsupported_ctrls;
	case MSR_IA32_VMX_PROCBASED_CTLS2:
		return EVMCS1_UNSUPPORTED_2NDEXEC;
	case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
	case MSR_IA32_VMX_PINBASED_CTLS:
		return EVMCS1_UNSUPPORTED_PINCTRL;
	case MSR_IA32_VMX_VMFUNC:
		return EVMCS1_UNSUPPORTED_VMFUNC;
	default:
		KVM_BUG_ON(1, vcpu->kvm);
		return 0;
	}
}

void nested_evmcs_filter_control_msr(struct kvm_vcpu *vcpu,
				     struct msr_data *msr_info)
{
	u64 unsupported_ctrls;

	if (!msr_info->host_initiated && !vcpu->arch.hyperv_enabled)
		return;

	unsupported_ctrls = evmcs_get_unsupported_ctls(vcpu, msr_info->index,
						       msr_info->host_initiated);
	if (msr_info->index == MSR_IA32_VMX_VMFUNC)
		msr_info->data &= ~unsupported_ctrls;
	else
		msr_info->data &= ~(unsupported_ctrls << 32);
}

static bool nested_evmcs_is_valid_controls(struct kvm_vcpu *vcpu, u32 msr_index,
					   u32 val)
{
	return !(val & evmcs_get_unsupported_ctls(vcpu, msr_index, false));
}
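
For context, with that change the MSR_IA32_VMX_* case in vmx_get_msr() would end
up looking something like the below. This is just a rough sketch, not a literal
diff, so the surrounding lines may not match the actual code exactly:

	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
		if (!nested_vmx_allowed(vcpu))
			return 1;
		if (vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
				    &msr_info->data))
			return 1;
		/*
		 * New Enlightened VMCS fields always lag behind their hardware
		 * counterparts, filter out fields that are not yet defined.
		 */
		if (vmx->nested.enlightened_vmcs_enabled)
			nested_evmcs_filter_control_msr(vcpu, msr_info);
		break;

I.e. the host_initiated check moves from the caller into
nested_evmcs_filter_control_msr() itself, so host reads get the base eVMCS
filtering too while only guest accesses are additionally gated on Hyper-V being
enabled and the PERF_GLOBAL_CTRL CPUID bit.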