On 4/16/2024 8:48 PM, Liang, Kan wrote: > > > On 2024-04-16 1:34 a.m., Zhang, Xiong Y wrote: >> >> >> On 4/16/2024 12:03 AM, Liang, Kan wrote: >>> >>> >>> On 2024-04-12 4:56 p.m., Liang, Kan wrote: >>>>> What if perf had a global knob to enable/disable mediate PMU support? Then when >>>>> KVM is loaded with enable_mediated_true, call into perf to (a) check that there >>>>> are no existing !exclude_guest events (this part could be optional), and (b) set >>>>> the global knob to reject all new !exclude_guest events (for the core PMU?). >>>>> >>>>> Hmm, or probably better, do it at VM creation. That has the advantage of playing >>>>> nice with CONFIG_KVM=y (perf could reject the enabling without completely breaking >>>>> KVM), and not causing problems if KVM is auto-probed but the user doesn't actually >>>>> want to run VMs. >>>> I think it should be doable, and may simplify the perf implementation. >>>> (The check in the schedule stage should not be necessary anymore.) >>>> >>>> With this, something like NMI watchdog should fail the VM creation. The >>>> user should either disable the NMI watchdog or use a replacement. >>>> >>>> Thanks, >>>> Kan >>>>> E.g. (very roughly) >>>>> >>>>> int x86_perf_get_mediated_pmu(void) >>>>> { >>>>> if (refcount_inc_not_zero(...)) >>>>> return 0; >>>>> >>>>> if (<system wide events>) >>>>> return -EBUSY; >>>>> >>>>> <slow path with locking> >>>>> } >>>>> >>>>> void x86_perf_put_mediated_pmu(void) >>>>> { >>>>> if (!refcount_dec_and_test(...)) >>>>> return; >>>>> >>>>> <slow path with locking> >>>>> } >>> >>> >>> I think the locking should include the refcount check and system wide >>> event check as well. >>> It should be possible that two VMs are created very close. >>> The second creation may mistakenly return 0 if there is no lock. >>> >>> I plan to do something as below (not test yet). >>> >>> +/* >>> + * Currently invoked at VM creation to >>> + * - Check whether there are existing !exclude_guest system wide events >>> + * of PMU with PERF_PMU_CAP_MEDIATED_VPMU >>> + * - Set nr_mediated_pmu to prevent !exclude_guest event creation on >>> + * PMUs with PERF_PMU_CAP_MEDIATED_VPMU >>> + * >>> + * No impact for the PMU without PERF_PMU_CAP_MEDIATED_VPMU. The perf >>> + * still owns all the PMU resources. >>> + */ >>> +int x86_perf_get_mediated_pmu(void) >>> +{ >>> + int ret = 0; >>> + mutex_lock(&perf_mediated_pmu_mutex); >>> + if (refcount_inc_not_zero(&nr_mediated_pmu_vms)) >>> + goto end; >>> + >>> + if (atomic_read(&nr_include_guest_events)) { >>> + ret = -EBUSY; >>> + goto end; >>> + } >>> + refcount_inc(&nr_mediated_pmu_vms); >>> +end: >>> + mutex_unlock(&perf_mediated_pmu_mutex); >>> + return ret; >>> +} >>> +EXPORT_SYMBOL_GPL(x86_perf_get_mediated_pmu); >>> + >>> +void x86_perf_put_mediated_pmu(void) >>> +{ >>> + mutex_lock(&perf_mediated_pmu_mutex); >>> + refcount_dec(&nr_mediated_pmu_vms); >>> + mutex_unlock(&perf_mediated_pmu_mutex); >>> +} >>> +EXPORT_SYMBOL_GPL(x86_perf_put_mediated_pmu); >>> >>> >>> Thanks, >>> Kan >> x86_perf_get_mediated_pmu() is called at vm_create(), x86_perf_put_mediated_pmu() is called at vm_destroy(), then system wide perf events without exclude_guest=1 can not be created during the whole vm life cycle (where nr_mediated_pmu_vms > 0 always), do I understand and use the interface correctly ? > > Right, but it only impacts the events of PMU with the > PERF_PMU_CAP_MEDIATED_VPMU. > For other PMUs, the event with exclude_guest=1 can still be created. > KVM should not touch the counters of the PMU without > PERF_PMU_CAP_MEDIATED_VPMU. > > BTW: I will also remove the prefix x86, since the functions are in the > generic code. > > Thanks, > Kan After userspace VMM call VCPU SET_CPUID() ioctl, KVM knows whether vPMU is enabled or not. If perf_get_mediated_pmu() is called at vm create, it is too early. it is better to let perf_get_mediated_pmu() track per cpu PMU state, so perf_get_mediated_pmu() can be called by kvm after vcpu_cpuid_set(). Note user space vmm may call SET_CPUID() on one vcpu multi times, then here refcount maybe isn't suitable. what's a better solution ? thanks