On Wed, Apr 17, 2024, Xiong Y Zhang wrote:
> On 4/16/2024 8:48 PM, Liang, Kan wrote:
> >> x86_perf_get_mediated_pmu() is called at vm_create(),
> >> x86_perf_put_mediated_pmu() is called at vm_destroy(), then system wide
> >> perf events without exclude_guest=1 can not be created during the whole vm
> >> life cycle (where nr_mediated_pmu_vms > 0 always), do I understand and use
> >> the interface correctly ?
> >
> > Right, but it only impacts the events of a PMU with
> > PERF_PMU_CAP_MEDIATED_VPMU. For other PMUs, an event with exclude_guest=1
> > can still be created. KVM should not touch the counters of a PMU without
> > PERF_PMU_CAP_MEDIATED_VPMU.
> >
> > BTW: I will also remove the x86 prefix, since the functions are in the
> > generic code.
> >
> > Thanks,
> > Kan
>
> After the userspace VMM calls the VCPU SET_CPUID() ioctl, KVM knows whether
> vPMU is enabled or not. If perf_get_mediated_pmu() is called at vm create,
> it is too early.

Eh, if someone wants to create _only_ VMs without vPMUs, then they should load
KVM with enable_pmu=false. I can see people complaining about not being able
to create VMs if they don't want *any* vPMU usage, but I doubt anyone has a
use case where they want a mix of PMU-enabled and PMU-disabled VMs, *and* they
are ok with VM creation failing for some VMs but not others.

> It is better to let perf_get_mediated_pmu() track per-cpu PMU state, so
> perf_get_mediated_pmu() can be called by kvm after vcpu_cpuid_set(). Note
> the userspace vmm may call SET_CPUID() on one vcpu multiple times, so a
> refcount maybe isn't suitable here.

Yeah, waiting until KVM_SET_CPUID2 would be unpleasant for both KVM and
userspace. E.g. failing KVM_SET_CPUID2 because KVM can't enable mediated PMU
support would be rather confusing for userspace.

> What's a better solution?

If doing the checks at VM creation is a sticking point for folks, then the
best approach is probably to use KVM_CAP_PMU_CAPABILITY, i.e. require
userspace to explicitly opt in to enabling mediated PMU usage. Ha! We can
even do that without additional uAPI, because KVM interprets cap->args[0]==0
as "enable vPMU".

The big problem with this is that enabling mediated PMU support by default
would break userspace. Hmm, but that's arguably the case no matter what, as a
setup that worked before would suddenly start failing if the host was
configured to use the PMU-based NMI watchdog.

E.g. this, if we're ok committing to never enabling mediated PMU by default.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 47d9f03b7778..01d9ee2114c8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6664,9 +6664,21 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 			break;
 
 		mutex_lock(&kvm->lock);
-		if (!kvm->created_vcpus) {
-			kvm->arch.enable_pmu = !(cap->args[0] & KVM_PMU_CAP_DISABLE);
-			r = 0;
+		/*
+		 * To keep PMU configuration "simple", setting vPMU support is
+		 * disallowed if vCPUs are created, or if mediated PMU support
+		 * was already enabled for the VM.
+		 */
+		if (!kvm->created_vcpus &&
+		    (!enable_mediated_pmu || !kvm->arch.enable_pmu)) {
+			if (enable_mediated_pmu &&
+			    !(cap->args[0] & KVM_PMU_CAP_DISABLE))
+				r = x86_perf_get_mediated_pmu();
+			else
+				r = 0;
+
+			if (!r)
+				kvm->arch.enable_pmu = !(cap->args[0] & KVM_PMU_CAP_DISABLE);
 		}
 		mutex_unlock(&kvm->lock);
 		break;
@@ -12563,7 +12575,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	kvm->arch.default_tsc_khz = max_tsc_khz ? : tsc_khz;
 	kvm->arch.guest_can_read_msr_platform_info = true;
-	kvm->arch.enable_pmu = enable_pmu;
+
+	/* PMU virtualization is opt-in when mediated PMU support is enabled. */
+	kvm->arch.enable_pmu = enable_pmu && !enable_mediated_pmu;
 
 #if IS_ENABLED(CONFIG_HYPERV)
 	spin_lock_init(&kvm->arch.hv_root_tdp_lock);
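FWIW, the mutual exclusion the thread is haggling over can be modeled in a few
lines of plain C. This is a toy sketch of the *intended semantics*, not the
kernel implementation: nr_mediated_pmu_vms is the counter named earlier in the
thread, while nr_include_guest_events and the event-side helper are made-up
names for illustration (real perf also has to walk existing events, handle
locking, and restrict the check to PMUs with PERF_PMU_CAP_MEDIATED_VPMU).

```c
#include <errno.h>

/* VMs currently holding the mediated vPMU. */
static int nr_mediated_pmu_vms;
/* System-wide events created with exclude_guest=0 (hypothetical counter). */
static int nr_include_guest_events;

/* Called by KVM when a VM opts in to the mediated vPMU. */
int perf_get_mediated_pmu(void)
{
	/* Fail if any !exclude_guest system-wide event already exists. */
	if (nr_include_guest_events)
		return -EBUSY;
	nr_mediated_pmu_vms++;
	return 0;
}

/* Called by KVM at VM destruction. */
void perf_put_mediated_pmu(void)
{
	nr_mediated_pmu_vms--;
}

/* Called by perf when creating a system-wide event with exclude_guest=0. */
int perf_try_create_include_guest_event(void)
{
	/* Rejected for the whole lifetime of any mediated-vPMU VM. */
	if (nr_mediated_pmu_vms)
		return -EBUSY;
	nr_include_guest_events++;
	return 0;
}
```

I.e. whichever side gets there first wins, and the loser sees -EBUSY, which is
exactly why hiding the failure inside VM creation (or KVM_SET_CPUID2) is
confusing, and surfacing it through an explicit KVM_CAP_PMU_CAPABILITY opt-in
is the least-bad place to report it.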