On Mon, Aug 22, 2022, Vitaly Kuznetsov wrote:
> Sean Christopherson <seanjc@xxxxxxxxxx> writes:
>
> > On Thu, Aug 18, 2022, Vitaly Kuznetsov wrote:
> >> Sean Christopherson <seanjc@xxxxxxxxxx> writes:
> >>
> >> > On Tue, Aug 02, 2022, Vitaly Kuznetsov wrote:
> >> >> + * Note: HV_X64_NESTED_EVMCS1_2022_UPDATE is not currently documented in any
> >> >> + * published TLFS version. When the bit is set, nested hypervisor can use
> >> >> + * 'updated' eVMCSv1 specification (perf_global_ctrl, s_cet, ssp, lbr_ctl,
> >> >> + * encls_exiting_bitmap, tsc_multiplier fields which were missing in 2016
> >> >> + * specification).
> >> >> + */
> >> >> +#define HV_X64_NESTED_EVMCS1_2022_UPDATE BIT(0)
> >> >
> >> > This bit is now defined[*], but the docs says it's only for perf_global_ctrl. Are
> >> > we expecting an update to the TLFS?
> >> >
> >> >	Indicates support for the GuestPerfGlobalCtrl and HostPerfGlobalCtrl fields
> >> >	in the enlightened VMCS.
> >> >
> >> > [*] https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/feature-discovery#hypervisor-nested-virtualization-features---0x4000000a
> >> >
> >>
> >> Oh well, better this than nothing. I'll ping the people who told me
> >> about this bit that their description is incomplete.
> >
> > Not that it changes anything, but I'd rather have no documentation. I'd much rather
> > KVM say "this is the undocumented behavior" than "the document behavior is wrong".
> >
>
> So I reached out to Microsoft and their answer was that for all these new
> eVMCS fields (including *PerfGlobalCtrl) observing architectural VMX
> MSRs should be enough. *PerfGlobalCtrl case is special because of Win11
> bug (if we expose the feature in VMX feature MSRs but don't set
> CPUID.0x4000000A.EBX BIT(0) it just doesn't boot).

I.e. TSC_SCALING shouldn't be gated on the flag? If so, then the 2-D array approach
is overkill since (a) the CPUID flag only controls PERF_GLOBAL_CTRL and (b) we aren't
expecting any more flags in the future.

What about this for an implementation?

static bool evmcs_has_perf_global_ctrl(struct kvm_vcpu *vcpu)
{
	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);

	/*
	 * Filtering VMX controls for eVMCS compatibility should only be done
	 * for guest accesses, and all such accesses should be gated on Hyper-V
	 * being enabled and initialized.
	 */
	if (WARN_ON_ONCE(!hv_vcpu))
		return false;

	return hv_vcpu->cpuid_cache.nested_ebx & HV_X64_NESTED_EVMCS1_PERF_GLOBAL_CTRL;
}

static u32 evmcs_get_unsupported_ctls(struct kvm_vcpu *vcpu, u32 msr_index)
{
	u32 unsupported_ctrls;

	switch (msr_index) {
	case MSR_IA32_VMX_EXIT_CTLS:
	case MSR_IA32_VMX_TRUE_EXIT_CTLS:
		unsupported_ctrls = EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
		if (!evmcs_has_perf_global_ctrl(vcpu))
			unsupported_ctrls |= VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
		return unsupported_ctrls;
	case MSR_IA32_VMX_ENTRY_CTLS:
	case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
		unsupported_ctrls = EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
		if (!evmcs_has_perf_global_ctrl(vcpu))
			unsupported_ctrls |= VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
		return unsupported_ctrls;
	case MSR_IA32_VMX_PROCBASED_CTLS2:
		return EVMCS1_UNSUPPORTED_2NDEXEC;
	case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
	case MSR_IA32_VMX_PINBASED_CTLS:
		return EVMCS1_UNSUPPORTED_PINCTRL;
	case MSR_IA32_VMX_VMFUNC:
		return EVMCS1_UNSUPPORTED_VMFUNC;
	default:
		KVM_BUG_ON(1, vcpu->kvm);
		return 0;
	}
}

void nested_evmcs_filter_control_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
{
	u64 unsupported_ctrls = evmcs_get_unsupported_ctls(vcpu, msr_index);

	if (msr_index == MSR_IA32_VMX_VMFUNC)
		*pdata &= ~unsupported_ctrls;
	else
		*pdata &= ~(unsupported_ctrls << 32);
}

> What I'm still concerned about is future proofing KVM for new
> features. When something is getting added to KVM for which no eVMCS
> field is currently defined, both Hyper-V-on-KVM and KVM-on-Hyper-V cases
> should be taken care of. It would probably be better to reverse our
> filtering, explicitly listing features supported in eVMCS.
> The lists are going to be fairly long but at least we won't have to take care of any
> new architectural feature added to KVM.

Having the filtering be opt-in crossed my mind as well. Reversing the filtering can be
done after this series though, correct?
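
For reference, a minimal userspace sketch of what the opt-in (reversed) filtering could look like. This is not a proposed patch: every EX_* name and bit value below is a placeholder invented for illustration, not a real VMX or eVMCS definition. The only detail carried over from the code above is that, for the non-VMFUNC control MSRs, the filtered bits live in the high 32 bits of the MSR value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder control bits; names and values are illustrative assumptions. */
#define EX_CTRL_SAVE_DEBUG	(1u << 2)
#define EX_CTRL_LOAD_PERF	(1u << 12)
#define EX_CTRL_FUTURE_FEATURE	(1u << 30)	/* has no eVMCS field */

/* Opt-in list: only controls that are backed by an eVMCS field. */
#define EX_EVMCS_SUPPORTED_EXIT_CTRL	(EX_CTRL_SAVE_DEBUG | EX_CTRL_LOAD_PERF)

/* Hypothetical helper: build the supported mask for one control MSR. */
static uint32_t ex_supported_exit_ctls(bool has_perf_global_ctrl)
{
	uint32_t supported = EX_EVMCS_SUPPORTED_EXIT_CTRL;

	/* Controls gated on the CPUID flag are dropped when it is clear. */
	if (!has_perf_global_ctrl)
		supported &= ~EX_CTRL_LOAD_PERF;

	return supported;
}

/* Keep only supported bits in the high half; the low half is untouched. */
static uint64_t ex_filter_control_msr(uint64_t msr_val, uint32_t supported)
{
	uint32_t low = (uint32_t)msr_val;
	uint32_t high = (uint32_t)(msr_val >> 32);

	high &= supported;

	return ((uint64_t)high << 32) | low;
}
```

With a list like this, a control that is added later without a matching eVMCS field (EX_CTRL_FUTURE_FEATURE in the sketch) is filtered out by default, rather than needing to be added to an "unsupported" mask.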