On 1/15/2025 8:17 AM, Mingwei Zhang wrote: > On Wed, Nov 20, 2024, Mi, Dapeng wrote: >> On 11/19/2024 10:30 PM, Sean Christopherson wrote: >>> As per my feedback in the initial RFC[*]: >>> >>> 2. The module param absolutely must not be exposed to userspace until all patches >>> are in place. The easiest way to do that without creating dependency hell is >>> to simply not create the module param. >>> >>> [*] https://lore.kernel.org/all/ZhhQBHQ6V7Zcb8Ve@xxxxxxxxxx >> Sure. It looks we missed this comment. Would address it. >> > Dapeng, just synced with Sean offline. His point is that we still need > kernel parameter but introduce that in the last part of the series so > that any bisect won't trigger the new PMU logic in the middle of the > series. But I think you are right to create a global config and make it > false. Ok, I would add a patch to expose the kernel parameter and put it at the last. Thanks. > >>> On Thu, Aug 01, 2024, Mingwei Zhang wrote: >>>> Introduce enable_passthrough_pmu as a RO KVM kernel module parameter. This >>>> variable is true only when the following conditions satisfies: >>>> - set to true when module loaded. >>>> - enable_pmu is true. >>>> - is running on Intel CPU. >>>> - supports PerfMon v4. >>>> - host PMU supports passthrough mode. >>>> >>>> The value is always read-only because passthrough PMU currently does not >>>> support features like LBR and PEBS, while emualted PMU does. This will end >>>> up with two different values for kvm_cap.supported_perf_cap, which is >>>> initialized at module load time. Maintaining two different perf >>>> capabilities will add complexity. Further, there is not enough motivation >>>> to support running two types of PMU implementations at the same time, >>>> although it is possible/feasible in reality. >>>> >>>> Finally, always propagate enable_passthrough_pmu and perf_capabilities into >>>> kvm->arch for each KVM instance. >>>> >>>> Co-developed-by: Xiong Zhang <xiong.y.zhang@xxxxxxxxxxxxxxx> >>>> Signed-off-by: Xiong Zhang <xiong.y.zhang@xxxxxxxxxxxxxxx> >>>> Signed-off-by: Mingwei Zhang <mizhang@xxxxxxxxxx> >>>> Signed-off-by: Dapeng Mi <dapeng1.mi@xxxxxxxxxxxxxxx> >>>> Tested-by: Yongwei Ma <yongwei.ma@xxxxxxxxx> >>>> --- >>>> arch/x86/include/asm/kvm_host.h | 1 + >>>> arch/x86/kvm/pmu.h | 14 ++++++++++++++ >>>> arch/x86/kvm/vmx/vmx.c | 7 +++++-- >>>> arch/x86/kvm/x86.c | 8 ++++++++ >>>> arch/x86/kvm/x86.h | 1 + >>>> 5 files changed, 29 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h >>>> index f8ca74e7678f..a15c783f20b9 100644 >>>> --- a/arch/x86/include/asm/kvm_host.h >>>> +++ b/arch/x86/include/asm/kvm_host.h >>>> @@ -1406,6 +1406,7 @@ struct kvm_arch { >>>> >>>> bool bus_lock_detection_enabled; >>>> bool enable_pmu; >>>> + bool enable_passthrough_pmu; >>> Again, as I suggested/requested in the initial RFC[*], drop the per-VM flag as well >>> as kvm_pmu.passthrough. There is zero reason to cache the module param. KVM >>> should always query kvm->arch.enable_pmu prior to checking if the mediated PMU >>> is enabled, so I doubt we even need a helper to check both. >>> >>> [*] https://lore.kernel.org/all/ZhhOEDAl6k-NzOkM@xxxxxxxxxx >> Sure. >> >> >>>> >>>> u32 notify_window; >>>> u32 notify_vmexit_flags; >>>> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h >>>> index 4d52b0b539ba..cf93be5e7359 100644 >>>> --- a/arch/x86/kvm/pmu.h >>>> +++ b/arch/x86/kvm/pmu.h >>>> @@ -208,6 +208,20 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops) >>>> enable_pmu = false; >>>> } >>>> >>>> + /* Pass-through vPMU is only supported in Intel CPUs. */ >>>> + if (!is_intel) >>>> + enable_passthrough_pmu = false; >>>> + >>>> + /* >>>> + * Pass-through vPMU requires at least PerfMon version 4 because the >>>> + * implementation requires the usage of MSR_CORE_PERF_GLOBAL_STATUS_SET >>>> + * for counter emulation as well as PMU context switch. In addition, it >>>> + * requires host PMU support on passthrough mode. Disable pass-through >>>> + * vPMU if any condition fails. >>>> + */ >>>> + if (!enable_pmu || kvm_pmu_cap.version < 4 || !kvm_pmu_cap.passthrough) >>> As is quite obvious by the end of the series, the v4 requirement is specific to >>> Intel. >>> >>> if (!enable_pmu || !kvm_pmu_cap.passthrough || >>> (is_intel && kvm_pmu_cap.version < 4) || >>> (is_amd && kvm_pmu_cap.version < 2)) >>> enable_passthrough_pmu = false; >>> >>> Furthermore, there is zero reason to explicitly and manually check the vendor, >>> kvm_init_pmu_capability() takes kvm_pmu_ops. Adding a callback is somewhat >>> undesirable as it would lead to duplicate code, but we can still provide separation >>> of concerns by adding const variables to kvm_pmu_ops, a la MAX_NR_GP_COUNTERS. >>> >>> E.g. >>> >>> if (enable_pmu) { >>> perf_get_x86_pmu_capability(&kvm_pmu_cap); >>> >>> /* >>> * WARN if perf did NOT disable hardware PMU if the number of >>> * architecturally required GP counters aren't present, i.e. if >>> * there are a non-zero number of counters, but fewer than what >>> * is architecturally required. >>> */ >>> if (!kvm_pmu_cap.num_counters_gp || >>> WARN_ON_ONCE(kvm_pmu_cap.num_counters_gp < min_nr_gp_ctrs)) >>> enable_pmu = false; >>> else if (pmu_ops->MIN_PMU_VERSION > kvm_pmu_cap.version) >>> enable_pmu = false; >>> } >>> >>> if (!enable_pmu || !kvm_pmu_cap.passthrough || >>> pmu_ops->MIN_MEDIATED_PMU_VERSION > kvm_pmu_cap.version) >>> enable_mediated_pmu = false; >> Sure. would do. >> >> >>>> + enable_passthrough_pmu = false; >>>> + >>>> if (!enable_pmu) { >>>> memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap)); >>>> return; >>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c >>>> index ad465881b043..2ad122995f11 100644 >>>> --- a/arch/x86/kvm/vmx/vmx.c >>>> +++ b/arch/x86/kvm/vmx/vmx.c >>>> @@ -146,6 +146,8 @@ module_param_named(preemption_timer, enable_preemption_timer, bool, S_IRUGO); >>>> extern bool __read_mostly allow_smaller_maxphyaddr; >>>> module_param(allow_smaller_maxphyaddr, bool, S_IRUGO); >>>> >>>> +module_param(enable_passthrough_pmu, bool, 0444); >>> Hmm, we either need to put this param in kvm.ko, or move enable_pmu to vendor >>> modules (or duplicate it there if we need to for backwards compatibility?). >>> >>> There are advantages to putting params in vendor modules, when it's safe to do so, >>> e.g. it allows toggling the param when (re)loading a vendor module, so I think I'm >>> supportive of having the param live in vendor code. I just don't want to split >>> the two PMU knobs. >> Since enable_passthrough_pmu has already been in vendor modules, we'd >> better duplicate enable_pmu module parameter in vendor modules as the 1st step. >> >> >>>> #define KVM_VM_CR0_ALWAYS_OFF (X86_CR0_NW | X86_CR0_CD) >>>> #define KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST X86_CR0_NE >>>> #define KVM_VM_CR0_ALWAYS_ON \ >>>> @@ -7924,7 +7926,8 @@ static __init u64 vmx_get_perf_capabilities(void) >>>> if (boot_cpu_has(X86_FEATURE_PDCM)) >>>> rdmsrl(MSR_IA32_PERF_CAPABILITIES, host_perf_cap); >>>> >>>> - if (!cpu_feature_enabled(X86_FEATURE_ARCH_LBR)) { >>>> + if (!cpu_feature_enabled(X86_FEATURE_ARCH_LBR) && >>>> + !enable_passthrough_pmu) { >>>> x86_perf_get_lbr(&vmx_lbr_caps); >>>> >>>> /* >>>> @@ -7938,7 +7941,7 @@ static __init u64 vmx_get_perf_capabilities(void) >>>> perf_cap |= host_perf_cap & PMU_CAP_LBR_FMT; >>>> } >>>> >>>> - if (vmx_pebs_supported()) { >>>> + if (vmx_pebs_supported() && !enable_passthrough_pmu) { >>> Checking enable_mediated_pmu belongs in vmx_pebs_supported(), not in here, >>> otherwise KVM will incorrectly advertise support to userspace: >>> >>> if (vmx_pebs_supported()) { >>> kvm_cpu_cap_check_and_set(X86_FEATURE_DS); >>> kvm_cpu_cap_check_and_set(X86_FEATURE_DTES64); >>> } >> Sure. Thanks for pointing this. >> >> >>>> perf_cap |= host_perf_cap & PERF_CAP_PEBS_MASK; >>>> /* >>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >>>> index f1d589c07068..0c40f551130e 100644 >>>> --- a/arch/x86/kvm/x86.c >>>> +++ b/arch/x86/kvm/x86.c >>>> @@ -187,6 +187,10 @@ bool __read_mostly enable_pmu = true; >>>> EXPORT_SYMBOL_GPL(enable_pmu); >>>> module_param(enable_pmu, bool, 0444); >>>> >>>> +/* Enable/disable mediated passthrough PMU virtualization */ >>>> +bool __read_mostly enable_passthrough_pmu; >>>> +EXPORT_SYMBOL_GPL(enable_passthrough_pmu); >>>> + >>>> bool __read_mostly eager_page_split = true; >>>> module_param(eager_page_split, bool, 0644); >>>> >>>> @@ -6682,6 +6686,9 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, >>>> mutex_lock(&kvm->lock); >>>> if (!kvm->created_vcpus) { >>>> kvm->arch.enable_pmu = !(cap->args[0] & KVM_PMU_CAP_DISABLE); >>>> + /* Disable passthrough PMU if enable_pmu is false. */ >>>> + if (!kvm->arch.enable_pmu) >>>> + kvm->arch.enable_passthrough_pmu = false; >>> And this code obviously goes away if the per-VM snapshot is removed.