On Mon, Oct 14, 2024 at 12:15:11PM -0400, Liang, Kan wrote:
>
>
> On 2024-10-14 7:59 a.m., Peter Zijlstra wrote:
> > On Mon, Sep 23, 2024 at 08:49:17PM +0200, Mingwei Zhang wrote:
> >
> >> The original implementation by design has a terrible performance
> >> overhead, i.e., every PMU context switch at runtime requires an SRCU
> >> lock pair and a pmu list traversal. To reduce the overhead, we put
> >> "passthrough" pmus at the front of the list and exit the pmu
> >> traversal as soon as we pass the last "passthrough" pmu.
> >
> > What was the expensive bit? The SRCU memory barrier or the list
> > iteration? How long is that list really?
>
> Both. But I don't think there is any performance data.
>
> The length of the list can vary across platforms. On a modern server
> there can be hundreds of PMUs: uncore PMUs, CXL PMUs, IOMMU PMUs,
> PMUs of accelerator devices, and PMUs from all kinds of other
> devices. The number could keep increasing as more and more devices
> gain PMU support.
>
> Two methods were considered.
>
> - One is to add a global variable to track the "passthrough" pmu.
>   The idea assumes that there is only one "passthrough" pmu that
>   requires the switch, and that this will not change in the near
>   future. Both the SRCU memory barrier and the list iteration can
>   then be avoided. It's implemented in the patch.
>
> - The other is to always put the "passthrough" pmus at the front of
>   the list, so the unnecessary list iteration can be avoided. It
>   does nothing for the SRCU lock pair.

PaulMck has patches that introduce srcu_read_lock_lite(), which would
avoid the smp_mb() in most cases:

  https://lkml.kernel.org/r/20241011173931.2050422-6-paulmck@xxxxxxxxxx

We can also keep a second list, just for the passthrough pmus. A bit
like sched_cb_list.
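
For illustration, a minimal sketch of the global-variable method
described above, under its single-passthrough-pmu assumption. This is
not the posted patch; all identifiers here (passthrough_pmu,
perf_register_passthrough_pmu, perf_passthrough_switch_one, and the
pmu->switch_guest_state callback) are made up for the sketch:

#include <linux/perf_event.h>

/* Sketch only: assumes at most one passthrough pmu ever registers. */
static struct pmu *passthrough_pmu __read_mostly;

/* Registration side; cmpxchg() rejects a second passthrough pmu. */
static int perf_register_passthrough_pmu(struct pmu *pmu)
{
	if (cmpxchg(&passthrough_pmu, NULL, pmu))
		return -EBUSY;
	return 0;
}

/* Guest entry/exit path: one dereference, no SRCU pair, no list walk. */
static void perf_passthrough_switch_one(bool enter_guest)
{
	struct pmu *pmu = READ_ONCE(passthrough_pmu);

	if (pmu)
		pmu->switch_guest_state(pmu, enter_guest);
}

The obvious cost of this design is the baked-in assumption that
exactly one pmu ever needs the switch; it falls over as soon as a
second passthrough pmu shows up.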
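
For the SRCU side, a rough sketch of what the full-list walk could
look like with the _lite reader from that series, assuming pmus_srcu
could be converted to lite-only readers (not verified here; the lite
flavor has constraints on how the srcu_struct may otherwise be used).
The is_passthrough flag and switch_guest_state() callback are again
hypothetical; pmus and pmus_srcu are the existing perf core symbols:

static void perf_switch_passthrough_pmus(bool enter_guest)
{
	struct pmu *pmu;
	int idx;

	/* _lite reader: no smp_mb() on this fast path. */
	idx = srcu_read_lock_lite(&pmus_srcu);
	list_for_each_entry_rcu(pmu, &pmus, entry,
				srcu_read_lock_held(&pmus_srcu)) {
		if (!pmu->is_passthrough)	/* hypothetical flag */
			continue;
		pmu->switch_guest_state(pmu, enter_guest);
	}
	srcu_read_unlock_lite(&pmus_srcu, idx);
}

This removes the barrier but still walks the whole list, so it only
addresses half of the complaint above.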
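
And a sketch of the second-list idea, analogous to sched_cb_list: the
switch path walks only a short dedicated list instead of every
registered pmu. The passthrough_entry member and the callback are
invented for the sketch; this keeps the SRCU lock pair as-is, but it
could be combined with the _lite reader above:

static LIST_HEAD(passthrough_pmus);

/* Registration side, e.g. under pmus_lock next to the main pmus list. */
static void perf_pmu_passthrough_add(struct pmu *pmu)
{
	list_add_tail_rcu(&pmu->passthrough_entry, &passthrough_pmus);
}

/* Switch path: walk only the (short) passthrough list. */
static void perf_switch_passthrough_list(bool enter_guest)
{
	struct pmu *pmu;
	int idx;

	idx = srcu_read_lock(&pmus_srcu);
	list_for_each_entry_rcu(pmu, &passthrough_pmus, passthrough_entry,
				srcu_read_lock_held(&pmus_srcu))
		pmu->switch_guest_state(pmu, enter_guest);
	srcu_read_unlock(&pmus_srcu, idx);
}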