On Wed, Feb 09, 2022, David Dunn wrote: > KVM_CAP_PMU_DISABLE is used to disable PMU virtualization on individual > x86 VMs. PMU configuration must be done prior to creating VCPUs. Please use imperative mood to state what the patch is doing/adding, as opposed to describing the ABI. E.g. Add a new capability, KVM_CAP_PMU_CAPABILITY, that takes a bitmask of settings/features to allow userspace to configure PMU virtualization on a per-VM basis. For now, support a single flag, KVM_CAP_PMU_DISABLE, to allow disabling PMU virtualization for a VM even when KVM is configured with enable_pmu=true a module level. To keep KVM simple, disallow changing VM's PMU configuration after vCPUs have been created. > To enable future extension, KVM_CAP_PMU_CAPABILITY reports available > settings via bitmask when queried via check_extension. This can be omitted, the motivation is self-explanatory and the pattern is well established. The desire for future expensions can be alluded to by describing KVM_CAP_PMU_DISABLE as the first/initial flag or whatever. > For VMs that have PMU virtualization disabled, usermode will need to > clear CPUID leaf 0xA to notify guests. Eh, I'd just omit this, it won't age well if there's more that userspace "needs" to do, and strictly speaking KVM doesn't care if userspace presents a bogus vCPU model to the guest. > Signed-off-by: David Dunn <daviddunn@xxxxxxxxxx> > --- > Documentation/virt/kvm/api.rst | 22 ++++++++++++++++++++++ > arch/x86/include/asm/kvm_host.h | 1 + > arch/x86/kvm/svm/pmu.c | 2 +- > arch/x86/kvm/vmx/pmu_intel.c | 2 +- > arch/x86/kvm/x86.c | 12 ++++++++++++ > include/uapi/linux/kvm.h | 4 ++++ > tools/include/uapi/linux/kvm.h | 4 ++++ > 7 files changed, 45 insertions(+), 2 deletions(-) > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst > index a4267104db50..df836965b347 100644 > --- a/Documentation/virt/kvm/api.rst > +++ b/Documentation/virt/kvm/api.rst > @@ -7561,3 +7561,25 @@ The argument to KVM_ENABLE_CAP is also a bitmask, and must be a subset > of the result of KVM_CHECK_EXTENSION. KVM will forward to userspace > the hypercalls whose corresponding bit is in the argument, and return > ENOSYS for the others. > + > +8.35 KVM_CAP_PMU_CAPABILITY > +--------------------------- > + > +:Capability KVM_CAP_PMU_CAPABILITY > +:Architectures: x86 > +:Type: vm > +:Parameters: arg[0] is bitmask of PMU virtualization capabilities. > +:Returns 0 on success, -EINVAL when arg[0] contains invalid bits > + > +This capability alters PMU virtualization in KVM. > + > +Calling KVM_CHECK_EXTENSION for this capability returns a bitmask of > +PMU virtualization capabilities that can be adjusted on a VM. > + > +The argument to KVM_ENABLE_CAP is also a bitmask and selects specific > +PMU virtualization capabilities to be applied to the VM. This can > +only be invoked on a VM prior to the creation of VCPUs. > + > +At this time, KVM_CAP_PMU_DISABLE is the only capability. Setting > +this capability will disable PMU virtualization for that VM. Usermode > +should adjust CPUID leaf 0xA to reflect that the PMU is disabled. > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index c371ee7e45f7..f832cd0f9b27 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -1232,6 +1232,7 @@ struct kvm_arch { > hpa_t hv_root_tdp; > spinlock_t hv_root_tdp_lock; > #endif > + bool enable_pmu; > }; > > struct kvm_vm_stat { > diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c > index 5aa45f13b16d..d4de52409335 100644 > --- a/arch/x86/kvm/svm/pmu.c > +++ b/arch/x86/kvm/svm/pmu.c > @@ -101,7 +101,7 @@ static inline struct kvm_pmc *get_gp_pmc_amd(struct kvm_pmu *pmu, u32 msr, > { > struct kvm_vcpu *vcpu = pmu_to_vcpu(pmu); > > - if (!enable_pmu) > + if (!vcpu->kvm->arch.enable_pmu) > return NULL; > > switch (msr) { > diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c > index 03fab48b149c..4e5b1eeeb77c 100644 > --- a/arch/x86/kvm/vmx/pmu_intel.c > +++ b/arch/x86/kvm/vmx/pmu_intel.c > @@ -487,7 +487,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu) > pmu->reserved_bits = 0xffffffff00200000ull; > > entry = kvm_find_cpuid_entry(vcpu, 0xa, 0); > - if (!entry || !enable_pmu) > + if (!entry || !vcpu->kvm->arch.enable_pmu) > return; > eax.full = entry->eax; > edx.full = entry->edx; > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 6f69f3e3635e..c35a6a193bf4 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -4329,6 +4329,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) > if (r < sizeof(struct kvm_xsave)) > r = sizeof(struct kvm_xsave); > break; > + case KVM_CAP_PMU_CAPABILITY: > + r = enable_pmu ? KVM_CAP_PMU_MASK : 0; > + break; > } > default: > break; > @@ -6003,6 +6006,14 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, > kvm->arch.exit_on_emulation_error = cap->args[0]; > r = 0; > break; > + case KVM_CAP_PMU_CAPABILITY: > + r = -EINVAL; > + if (!enable_pmu || kvm->created_vcpus > 0 || This needs to take kvm->lock when checking kvm->created_vcpus and setting the per-VM enable_pmu. And preferred kernel style is to omit the "> 0" when checking for a non-zero value. Despite being an int, created_vcpus can't go negative. > + cap->args[0] & ~KVM_CAP_PMU_MASK) > + break; > + kvm->arch.enable_pmu = !(cap->args[0] & KVM_CAP_PMU_DISABLE); > + r = 0; > + break; > default: > r = -EINVAL; > break; > @@ -11630,6 +11641,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) > raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags); > > kvm->arch.guest_can_read_msr_platform_info = true; > + kvm->arch.enable_pmu = enable_pmu; > > #if IS_ENABLED(CONFIG_HYPERV) > spin_lock_init(&kvm->arch.hv_root_tdp_lock); > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > index 5191b57e1562..cf6774cc18ef 100644 > --- a/include/uapi/linux/kvm.h > +++ b/include/uapi/linux/kvm.h > @@ -1134,6 +1134,7 @@ struct kvm_ppc_resize_hpt { > #define KVM_CAP_VM_GPA_BITS 207 > #define KVM_CAP_XSAVE2 208 > #define KVM_CAP_SYS_ATTRIBUTES 209 > +#define KVM_CAP_PMU_CAPABILITY 210 > > #ifdef KVM_CAP_IRQ_ROUTING > > @@ -1970,6 +1971,9 @@ struct kvm_dirty_gfn { > #define KVM_BUS_LOCK_DETECTION_OFF (1 << 0) > #define KVM_BUS_LOCK_DETECTION_EXIT (1 << 1) > > +#define KVM_CAP_PMU_DISABLE (1 << 0) > +#define KVM_CAP_PMU_MASK (KVM_CAP_PMU_DISABLE) KVM_CAP_PMU_MASK shouldn't be in the uapi header, expanding it in the future could theoretically break userspace