On 2024-10-14 1:45 p.m., Peter Zijlstra wrote:
> On Mon, Oct 14, 2024 at 12:15:11PM -0400, Liang, Kan wrote:
>>
>> On 2024-10-14 7:59 a.m., Peter Zijlstra wrote:
>>> On Mon, Sep 23, 2024 at 08:49:17PM +0200, Mingwei Zhang wrote:
>>>
>>>> The original implementation has, by design, a terrible performance
>>>> overhead, i.e., every PMU context switch at runtime requires an SRCU
>>>> lock pair and a pmu list traversal. To reduce the overhead, we put
>>>> the "passthrough" pmus at the front of the list and exit the pmu
>>>> traversal as soon as we pass the last "passthrough" pmu.
>>>
>>> What was the expensive bit? The SRCU memory barrier or the list
>>> iteration? How long is that list really?
>>
>> Both. But I don't think there is any performance data.
>>
>> The length of the list could vary on different platforms. For a modern
>> server, there could be hundreds of PMUs from uncore PMUs, CXL PMUs,
>> IOMMU PMUs, PMUs of accelerator devices, and PMUs from all kinds of
>> other devices. The number could keep increasing as more and more
>> devices support the PMU capability.
>>
>> Two methods were considered.
>> - One is to add a global variable to track the "passthrough" pmu. The
>> idea assumes that there is only one "passthrough" pmu that requires the
>> switch, and that the situation will not change in the near future.
>> Both the SRCU memory barrier and the list iteration can then be
>> avoided. This is what the patch implements.
>>
>> - The other is to always put the "passthrough" pmus at the front of the
>> list, so the unnecessary list iteration can be avoided. It does nothing
>> for the SRCU lock pair.
>
> PaulMck has patches that introduce srcu_read_lock_lite(), which would
> avoid the smp_mb() in most cases.
>
> https://lkml.kernel.org/r/20241011173931.2050422-6-paulmck@xxxxxxxxxx
>
> We can also keep a second list, just for the passthrough pmus. A bit
> like sched_cb_list.

Maybe we can do something like pmu_event_list, which has its own lock.
Then we should not need the SRCU lock or the module reference.

Thanks,
Kan
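
As a rough illustration of the dedicated-list idea discussed above (a second
list, guarded by its own lock, like sched_cb_list or pmu_event_list), a
minimal sketch might look like the following. The names passthrough_pmus,
passthrough_pmus_lock, the passthrough_entry field and the save/restore
callbacks are hypothetical placeholders, not the actual patch or existing
perf core code:

/*
 * Sketch only, not the actual patch: keep the "passthrough" PMUs on
 * their own list with its own spinlock, so the guest enter/exit path
 * can walk a short list without srcu_read_lock() or a module reference.
 */
#include <linux/list.h>
#include <linux/rculist.h>
#include <linux/spinlock.h>
#include <linux/perf_event.h>

static LIST_HEAD(passthrough_pmus);
static DEFINE_SPINLOCK(passthrough_pmus_lock);

/* Registration takes the plain spinlock rather than pmus_lock + SRCU. */
static void perf_register_passthrough_pmu(struct pmu *pmu)
{
	unsigned long flags;

	spin_lock_irqsave(&passthrough_pmus_lock, flags);
	/* passthrough_entry is a hypothetical list_head in struct pmu. */
	list_add_tail_rcu(&pmu->passthrough_entry, &passthrough_pmus);
	spin_unlock_irqrestore(&passthrough_pmus_lock, flags);
}

/*
 * The guest enter/exit path then only walks this short list under
 * rcu_read_lock(), instead of traversing the full pmus list.
 */
static void perf_switch_passthrough_pmus(bool enter_guest)
{
	struct pmu *pmu;

	rcu_read_lock();
	list_for_each_entry_rcu(pmu, &passthrough_pmus, passthrough_entry) {
		if (enter_guest)
			pmu->save_host_ctx(pmu);	/* hypothetical callback */
		else
			pmu->restore_host_ctx(pmu);	/* hypothetical callback */
	}
	rcu_read_unlock();
}

Whether registration/unregistration would want RCU plus a spinlock as above,
or simply the spinlock on both sides (as pmu_event_list does), depends on how
hot the switch path is and is left open here.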