Enable support for IA32_APERF/IA32_MPERF performance monitoring in KVM
guests. These MSRs allow guests to measure their effective CPU frequency
by comparing actual CPU cycles (APERF) against reference cycles (MPERF).

Only expose X86_FEATURE_APERFMPERF to guests when the host has both
CONSTANT_TSC and NONSTOP_TSC. These features ensure the TSC frequency
remains stable across C-states and P-states, which is necessary for
"background" MPERF accounting.

Guest TSC scaling via KVM_SET_TSC_KHZ is not supported:
- On Intel, IA32_MPERF ticks at host rate regardless of guest TSC
  scaling, making passthrough impossible without intercepting reads
- On AMD, guest TSC scaling does affect IA32_MPERF reads, but handling
  it would significantly complicate cycle accounting

Record host support in kvm_cpu_caps[], advertise the feature to
userspace via CPUID.06H:ECX, and enable the governed feature when
supported by both host and guest CPUID.

Signed-off-by: Mingwei Zhang <mizhang@xxxxxxxxxx>
Co-developed-by: Jim Mattson <jmattson@xxxxxxxxxx>
Signed-off-by: Jim Mattson <jmattson@xxxxxxxxxx>
---
 arch/x86/kvm/cpuid.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 41786b834b163..309fa7fef6b7b 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -399,6 +399,10 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	kvm_hv_set_cpuid(vcpu, kvm_cpuid_has_hyperv(vcpu->arch.cpuid_entries,
 						    vcpu->arch.cpuid_nent));
 
+	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
+	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
+		kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_APERFMPERF);
+
 	/* Invoke the vendor callback only after the above state is updated. */
 	kvm_x86_call(vcpu_after_set_cpuid)(vcpu);
 
@@ -697,6 +701,12 @@ void kvm_set_cpu_caps(void)
 	if (boot_cpu_has(X86_FEATURE_AMD_SSBD))
 		kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL_SSBD);
 
+	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
+	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
+		kvm_cpu_cap_init_kvm_defined(CPUID_6_ECX, F(APERFMPERF));
+	else
+		kvm_cpu_cap_init_kvm_defined(CPUID_6_ECX, 0);
+
 	kvm_cpu_cap_mask(CPUID_7_1_EAX,
 		F(AVX_VNNI) | F(AVX512_BF16) | F(CMPCCXADD) |
 		F(FZRM) | F(FSRS) | F(FSRC) |
@@ -993,7 +1003,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 	case 6: /* Thermal management */
 		entry->eax = 0x4; /* allow ARAT */
 		entry->ebx = 0;
-		entry->ecx = 0;
+		cpuid_entry_override(entry, CPUID_6_ECX);
 		entry->edx = 0;
 		break;
 	/* function 7 has additional index. */
-- 
2.47.0.371.ga323438b13-goog
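
For readers unfamiliar with how a guest consumes these MSRs, the
calculation described in the changelog (actual cycles compared against
reference cycles) can be sketched roughly as below. This is not part of
the patch: it is a plain C illustration, and the names
aperfmperf_sample and effective_khz are hypothetical. It assumes the
caller has already sampled both counters twice (for example via rdmsr
in a guest kernel) and knows the base frequency in kHz at which MPERF
ticks.

```c
#include <stdint.h>

/* Hypothetical container for one paired reading of the two counters. */
struct aperfmperf_sample {
	uint64_t aperf;	/* IA32_APERF: actual cycles */
	uint64_t mperf;	/* IA32_MPERF: reference cycles */
};

/*
 * Estimate the effective frequency over the interval between two
 * samples as base_khz * delta(APERF) / delta(MPERF).
 *
 * Unsigned subtraction gives the correct delta even if a counter
 * wrapped once between the samples. Returns 0 when MPERF did not
 * advance, since no ratio can be formed.
 *
 * Assumption: the interval is short enough that
 * base_khz * delta(APERF) fits in 64 bits.
 */
static uint64_t effective_khz(uint64_t base_khz,
			      struct aperfmperf_sample start,
			      struct aperfmperf_sample end)
{
	uint64_t d_aperf = end.aperf - start.aperf;
	uint64_t d_mperf = end.mperf - start.mperf;

	if (d_mperf == 0)
		return 0;

	return base_khz * d_aperf / d_mperf;
}
```

With a 2,000,000 kHz base and deltas of 1500 APERF cycles against 1000
MPERF cycles, the sketch yields 3,000,000 kHz. The result is only
meaningful if MPERF ticks at a rate the guest can relate to base_khz,
which is the reason the changelog gives for requiring CONSTANT_TSC and
NONSTOP_TSC and for not supporting guest TSC scaling.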