On 4/16/2024 12:03 AM, Liang, Kan wrote: > > > On 2024-04-12 4:56 p.m., Liang, Kan wrote: >>> What if perf had a global knob to enable/disable mediate PMU support? Then when >>> KVM is loaded with enable_mediated_true, call into perf to (a) check that there >>> are no existing !exclude_guest events (this part could be optional), and (b) set >>> the global knob to reject all new !exclude_guest events (for the core PMU?). >>> >>> Hmm, or probably better, do it at VM creation. That has the advantage of playing >>> nice with CONFIG_KVM=y (perf could reject the enabling without completely breaking >>> KVM), and not causing problems if KVM is auto-probed but the user doesn't actually >>> want to run VMs. >> I think it should be doable, and may simplify the perf implementation. >> (The check in the schedule stage should not be necessary anymore.) >> >> With this, something like NMI watchdog should fail the VM creation. The >> user should either disable the NMI watchdog or use a replacement. >> >> Thanks, >> Kan >>> E.g. (very roughly) >>> >>> int x86_perf_get_mediated_pmu(void) >>> { >>> if (refcount_inc_not_zero(...)) >>> return 0; >>> >>> if (<system wide events>) >>> return -EBUSY; >>> >>> <slow path with locking> >>> } >>> >>> void x86_perf_put_mediated_pmu(void) >>> { >>> if (!refcount_dec_and_test(...)) >>> return; >>> >>> <slow path with locking> >>> } > > > I think the locking should include the refcount check and system wide > event check as well. > It should be possible that two VMs are created very close. > The second creation may mistakenly return 0 if there is no lock. > > I plan to do something as below (not test yet). > > +/* > + * Currently invoked at VM creation to > + * - Check whether there are existing !exclude_guest system wide events > + * of PMU with PERF_PMU_CAP_MEDIATED_VPMU > + * - Set nr_mediated_pmu to prevent !exclude_guest event creation on > + * PMUs with PERF_PMU_CAP_MEDIATED_VPMU > + * > + * No impact for the PMU without PERF_PMU_CAP_MEDIATED_VPMU. The perf > + * still owns all the PMU resources. > + */ > +int x86_perf_get_mediated_pmu(void) > +{ > + int ret = 0; > + mutex_lock(&perf_mediated_pmu_mutex); > + if (refcount_inc_not_zero(&nr_mediated_pmu_vms)) > + goto end; > + > + if (atomic_read(&nr_include_guest_events)) { > + ret = -EBUSY; > + goto end; > + } > + refcount_inc(&nr_mediated_pmu_vms); > +end: > + mutex_unlock(&perf_mediated_pmu_mutex); > + return ret; > +} > +EXPORT_SYMBOL_GPL(x86_perf_get_mediated_pmu); > + > +void x86_perf_put_mediated_pmu(void) > +{ > + mutex_lock(&perf_mediated_pmu_mutex); > + refcount_dec(&nr_mediated_pmu_vms); > + mutex_unlock(&perf_mediated_pmu_mutex); > +} > +EXPORT_SYMBOL_GPL(x86_perf_put_mediated_pmu); > > > Thanks, > Kan x86_perf_get_mediated_pmu() is called at vm_create(), x86_perf_put_mediated_pmu() is called at vm_destroy(), then system wide perf events without exclude_guest=1 can not be created during the whole vm life cycle (where nr_mediated_pmu_vms > 0 always), do I understand and use the interface correctly ? thanks