Re: [RFC PATCH v3 18/58] KVM: x86/pmu: Introduce enable_passthrough_pmu module parameter

Sean Christopherson <seanjc@xxxxxxxxxx> · Tue, 19 Nov 2024 06:30:23 -0800

As per my feedback in the initial RFC[*]:

 2. The module param absolutely must not be exposed to userspace until all patches
    are in place.  The easiest way to do that without creating dependency hell is
    to simply not create the module param.

[*] https://lore.kernel.org/all/ZhhQBHQ6V7Zcb8Ve@xxxxxxxxxx

On Thu, Aug 01, 2024, Mingwei Zhang wrote:
> Introduce enable_passthrough_pmu as a RO KVM kernel module parameter. This
> variable is true only when the following conditions satisfies:
>  - set to true when module loaded.
>  - enable_pmu is true.
>  - is running on Intel CPU.
>  - supports PerfMon v4.
>  - host PMU supports passthrough mode.
> 
> The value is always read-only because passthrough PMU currently does not
> support features like LBR and PEBS, while emualted PMU does. This will end
> up with two different values for kvm_cap.supported_perf_cap, which is
> initialized at module load time. Maintaining two different perf
> capabilities will add complexity. Further, there is not enough motivation
> to support running two types of PMU implementations at the same time,
> although it is possible/feasible in reality.
> 
> Finally, always propagate enable_passthrough_pmu and perf_capabilities into
> kvm->arch for each KVM instance.
> 
> Co-developed-by: Xiong Zhang <xiong.y.zhang@xxxxxxxxxxxxxxx>
> Signed-off-by: Xiong Zhang <xiong.y.zhang@xxxxxxxxxxxxxxx>
> Signed-off-by: Mingwei Zhang <mizhang@xxxxxxxxxx>
> Signed-off-by: Dapeng Mi <dapeng1.mi@xxxxxxxxxxxxxxx>
> Tested-by: Yongwei Ma <yongwei.ma@xxxxxxxxx>
> ---
>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/pmu.h              | 14 ++++++++++++++
>  arch/x86/kvm/vmx/vmx.c          |  7 +++++--
>  arch/x86/kvm/x86.c              |  8 ++++++++
>  arch/x86/kvm/x86.h              |  1 +
>  5 files changed, 29 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f8ca74e7678f..a15c783f20b9 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1406,6 +1406,7 @@ struct kvm_arch {
>  
>  	bool bus_lock_detection_enabled;
>  	bool enable_pmu;
> +	bool enable_passthrough_pmu;

Again, as I suggested/requested in the initial RFC[*], drop the per-VM flag as well
as kvm_pmu.passthrough.  There is zero reason to cache the module param.  KVM
should always query kvm->arch.enable_pmu prior to checking if the mediated PMU
is enabled, so I doubt we even need a helper to check both.

[*] https://lore.kernel.org/all/ZhhOEDAl6k-NzOkM@xxxxxxxxxx

>  
>  	u32 notify_window;
>  	u32 notify_vmexit_flags;
> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> index 4d52b0b539ba..cf93be5e7359 100644
> --- a/arch/x86/kvm/pmu.h
> +++ b/arch/x86/kvm/pmu.h
> @@ -208,6 +208,20 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
>  			enable_pmu = false;
>  	}
>  
> +	/* Pass-through vPMU is only supported in Intel CPUs. */
> +	if (!is_intel)
> +		enable_passthrough_pmu = false;
> +
> +	/*
> +	 * Pass-through vPMU requires at least PerfMon version 4 because the
> +	 * implementation requires the usage of MSR_CORE_PERF_GLOBAL_STATUS_SET
> +	 * for counter emulation as well as PMU context switch.  In addition, it
> +	 * requires host PMU support on passthrough mode. Disable pass-through
> +	 * vPMU if any condition fails.
> +	 */
> +	if (!enable_pmu || kvm_pmu_cap.version < 4 || !kvm_pmu_cap.passthrough)

As is quite obvious by the end of the series, the v4 requirement is specific to
Intel.

	if (!enable_pmu || !kvm_pmu_cap.passthrough ||
	    (is_intel && kvm_pmu_cap.version < 4) ||
	    (is_amd && kvm_pmu_cap.version < 2))
		enable_passthrough_pmu = false;

Furthermore, there is zero reason to explicitly and manually check the vendor,
kvm_init_pmu_capability() takes kvm_pmu_ops.  Adding a callback is somewhat
undesirable as it would lead to duplicate code, but we can still provide separation
of concerns by adding const variables to kvm_pmu_ops, a la MAX_NR_GP_COUNTERS.

E.g.

	if (enable_pmu) {
		perf_get_x86_pmu_capability(&kvm_pmu_cap);

		/*
		 * WARN if perf did NOT disable hardware PMU if the number of
		 * architecturally required GP counters aren't present, i.e. if
		 * there are a non-zero number of counters, but fewer than what
		 * is architecturally required.
		 */
		if (!kvm_pmu_cap.num_counters_gp ||
		    WARN_ON_ONCE(kvm_pmu_cap.num_counters_gp < min_nr_gp_ctrs))
			enable_pmu = false;
		else if (pmu_ops->MIN_PMU_VERSION > kvm_pmu_cap.version)
			enable_pmu = false;
	}

	if (!enable_pmu || !kvm_pmu_cap.passthrough ||
	    pmu_ops->MIN_MEDIATED_PMU_VERSION > kvm_pmu_cap.version)
		enable_mediated_pmu = false;

> +		enable_passthrough_pmu = false;
> +
>  	if (!enable_pmu) {
>  		memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
>  		return;
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index ad465881b043..2ad122995f11 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -146,6 +146,8 @@ module_param_named(preemption_timer, enable_preemption_timer, bool, S_IRUGO);
>  extern bool __read_mostly allow_smaller_maxphyaddr;
>  module_param(allow_smaller_maxphyaddr, bool, S_IRUGO);
>  
> +module_param(enable_passthrough_pmu, bool, 0444);

Hmm, we either need to put this param in kvm.ko, or move enable_pmu to vendor
modules (or duplicate it there if we need to for backwards compatibility?).

There are advantages to putting params in vendor modules, when it's safe to do so,
e.g. it allows toggling the param when (re)loading a vendor module, so I think I'm
supportive of having the param live in vendor code.  I just don't want to split
the two PMU knobs.

>  #define KVM_VM_CR0_ALWAYS_OFF (X86_CR0_NW | X86_CR0_CD)
>  #define KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST X86_CR0_NE
>  #define KVM_VM_CR0_ALWAYS_ON				\
> @@ -7924,7 +7926,8 @@ static __init u64 vmx_get_perf_capabilities(void)
>  	if (boot_cpu_has(X86_FEATURE_PDCM))
>  		rdmsrl(MSR_IA32_PERF_CAPABILITIES, host_perf_cap);
>  
> -	if (!cpu_feature_enabled(X86_FEATURE_ARCH_LBR)) {
> +	if (!cpu_feature_enabled(X86_FEATURE_ARCH_LBR) &&
> +	    !enable_passthrough_pmu) {
>  		x86_perf_get_lbr(&vmx_lbr_caps);
>  
>  		/*
> @@ -7938,7 +7941,7 @@ static __init u64 vmx_get_perf_capabilities(void)
>  			perf_cap |= host_perf_cap & PMU_CAP_LBR_FMT;
>  	}
>  
> -	if (vmx_pebs_supported()) {
> +	if (vmx_pebs_supported() && !enable_passthrough_pmu) {

Checking enable_mediated_pmu belongs in vmx_pebs_supported(), not in here,
otherwise KVM will incorrectly advertise support to userspace:

	if (vmx_pebs_supported()) {
		kvm_cpu_cap_check_and_set(X86_FEATURE_DS);
		kvm_cpu_cap_check_and_set(X86_FEATURE_DTES64);
	}

>  		perf_cap |= host_perf_cap & PERF_CAP_PEBS_MASK;
>  		/*
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f1d589c07068..0c40f551130e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -187,6 +187,10 @@ bool __read_mostly enable_pmu = true;
>  EXPORT_SYMBOL_GPL(enable_pmu);
>  module_param(enable_pmu, bool, 0444);
>  
> +/* Enable/disable mediated passthrough PMU virtualization */
> +bool __read_mostly enable_passthrough_pmu;
> +EXPORT_SYMBOL_GPL(enable_passthrough_pmu);
> +
>  bool __read_mostly eager_page_split = true;
>  module_param(eager_page_split, bool, 0644);
>  
> @@ -6682,6 +6686,9 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  		mutex_lock(&kvm->lock);
>  		if (!kvm->created_vcpus) {
>  			kvm->arch.enable_pmu = !(cap->args[0] & KVM_PMU_CAP_DISABLE);
> +			/* Disable passthrough PMU if enable_pmu is false. */
> +			if (!kvm->arch.enable_pmu)
> +				kvm->arch.enable_passthrough_pmu = false;

And this code obviously goes away if the per-VM snapshot is removed.