On 2024-04-16 1:34 a.m., Zhang, Xiong Y wrote: > > > On 4/16/2024 12:03 AM, Liang, Kan wrote: >> >> >> On 2024-04-12 4:56 p.m., Liang, Kan wrote: >>>> What if perf had a global knob to enable/disable mediate PMU support? Then when >>>> KVM is loaded with enable_mediated_true, call into perf to (a) check that there >>>> are no existing !exclude_guest events (this part could be optional), and (b) set >>>> the global knob to reject all new !exclude_guest events (for the core PMU?). >>>> >>>> Hmm, or probably better, do it at VM creation. That has the advantage of playing >>>> nice with CONFIG_KVM=y (perf could reject the enabling without completely breaking >>>> KVM), and not causing problems if KVM is auto-probed but the user doesn't actually >>>> want to run VMs. >>> I think it should be doable, and may simplify the perf implementation. >>> (The check in the schedule stage should not be necessary anymore.) >>> >>> With this, something like NMI watchdog should fail the VM creation. The >>> user should either disable the NMI watchdog or use a replacement. >>> >>> Thanks, >>> Kan >>>> E.g. (very roughly) >>>> >>>> int x86_perf_get_mediated_pmu(void) >>>> { >>>> if (refcount_inc_not_zero(...)) >>>> return 0; >>>> >>>> if (<system wide events>) >>>> return -EBUSY; >>>> >>>> <slow path with locking> >>>> } >>>> >>>> void x86_perf_put_mediated_pmu(void) >>>> { >>>> if (!refcount_dec_and_test(...)) >>>> return; >>>> >>>> <slow path with locking> >>>> } >> >> >> I think the locking should include the refcount check and system wide >> event check as well. >> It should be possible that two VMs are created very close. >> The second creation may mistakenly return 0 if there is no lock. >> >> I plan to do something as below (not test yet). >> >> +/* >> + * Currently invoked at VM creation to >> + * - Check whether there are existing !exclude_guest system wide events >> + * of PMU with PERF_PMU_CAP_MEDIATED_VPMU >> + * - Set nr_mediated_pmu to prevent !exclude_guest event creation on >> + * PMUs with PERF_PMU_CAP_MEDIATED_VPMU >> + * >> + * No impact for the PMU without PERF_PMU_CAP_MEDIATED_VPMU. The perf >> + * still owns all the PMU resources. >> + */ >> +int x86_perf_get_mediated_pmu(void) >> +{ >> + int ret = 0; >> + mutex_lock(&perf_mediated_pmu_mutex); >> + if (refcount_inc_not_zero(&nr_mediated_pmu_vms)) >> + goto end; >> + >> + if (atomic_read(&nr_include_guest_events)) { >> + ret = -EBUSY; >> + goto end; >> + } >> + refcount_inc(&nr_mediated_pmu_vms); >> +end: >> + mutex_unlock(&perf_mediated_pmu_mutex); >> + return ret; >> +} >> +EXPORT_SYMBOL_GPL(x86_perf_get_mediated_pmu); >> + >> +void x86_perf_put_mediated_pmu(void) >> +{ >> + mutex_lock(&perf_mediated_pmu_mutex); >> + refcount_dec(&nr_mediated_pmu_vms); >> + mutex_unlock(&perf_mediated_pmu_mutex); >> +} >> +EXPORT_SYMBOL_GPL(x86_perf_put_mediated_pmu); >> >> >> Thanks, >> Kan > x86_perf_get_mediated_pmu() is called at vm_create(), x86_perf_put_mediated_pmu() is called at vm_destroy(), then system wide perf events without exclude_guest=1 can not be created during the whole vm life cycle (where nr_mediated_pmu_vms > 0 always), do I understand and use the interface correctly ? Right, but it only impacts the events of PMU with the PERF_PMU_CAP_MEDIATED_VPMU. For other PMUs, the event with exclude_guest=1 can still be created. KVM should not touch the counters of the PMU without PERF_PMU_CAP_MEDIATED_VPMU. BTW: I will also remove the prefix x86, since the functions are in the generic code. Thanks, Kan