On 2024-10-14 1:45 p.m., Peter Zijlstra wrote:
> On Mon, Oct 14, 2024 at 12:15:11PM -0400, Liang, Kan wrote:
>>
>> On 2024-10-14 7:59 a.m., Peter Zijlstra wrote:
>>> On Mon, Sep 23, 2024 at 08:49:17PM +0200, Mingwei Zhang wrote:
>>>
>>>> The original implementation has, by design, a terrible performance
>>>> overhead, i.e., every PMU context switch at runtime requires an SRCU
>>>> lock pair and a pmu list traversal. To reduce the overhead, we put
>>>> the "passthrough" pmus at the front of the list and exit the pmu
>>>> traversal as soon as we pass the last "passthrough" pmu.
>>>
>>> What was the expensive bit? The SRCU memory barrier or the list
>>> iteration? How long is that list really?
>>
>> Both. But I don't think there is any performance data.
>>
>> The length of the list could vary on different platforms. For a modern
>> server, there could be hundreds of PMUs from uncore PMUs, CXL PMUs,
>> IOMMU PMUs, PMUs of accelerator devices, and PMUs from all kinds of
>> other devices. The number could keep increasing as more and more
>> devices support the PMU capability.
>>
>> Two methods were considered.
>> - One is to add a global variable to track the "passthrough" pmu. The
>> idea assumes that there is only one "passthrough" pmu that requires the
>> switch, and that the situation will not change in the near future.
>> Both the SRCU memory barrier and the list iteration can then be
>> avoided. This is what the patch implements.
>>
>> - The other is to always put the "passthrough" pmus at the front of the
>> list, so the unnecessary list iteration can be avoided. It does nothing
>> for the SRCU lock pair.
>
> PaulMck has patches that introduce srcu_read_lock_lite(), which would
> avoid the smp_mb() in most cases.
>
> https://lkml.kernel.org/r/20241011173931.2050422-6-paulmck@xxxxxxxxxx
>
> We can also keep a second list, just for the passthrough pmus. A bit
> like sched_cb_list.

Maybe we can do something like pmu_event_list, which has its own lock.
Then we should not need the SRCU lock or the module reference.

Thanks,
Kan
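
As a rough illustration of the dedicated-list idea discussed above (a second
list, guarded by its own lock, like sched_cb_list or pmu_event_list), a
minimal sketch might look like the following. The names passthrough_pmus,
passthrough_pmus_lock, the passthrough_entry field and the save/restore
callbacks are hypothetical placeholders, not the actual patch or existing
perf core code:

/*
 * Sketch only, not the actual patch: keep the "passthrough" PMUs on
 * their own list with its own spinlock, so the guest enter/exit path
 * can walk a short list without srcu_read_lock() or a module reference.
 */
#include <linux/list.h>
#include <linux/rculist.h>
#include <linux/spinlock.h>
#include <linux/perf_event.h>

static LIST_HEAD(passthrough_pmus);
static DEFINE_SPINLOCK(passthrough_pmus_lock);

/* Registration takes the plain spinlock rather than pmus_lock + SRCU. */
static void perf_register_passthrough_pmu(struct pmu *pmu)
{
	unsigned long flags;

	spin_lock_irqsave(&passthrough_pmus_lock, flags);
	/* passthrough_entry is a hypothetical list_head in struct pmu. */
	list_add_tail_rcu(&pmu->passthrough_entry, &passthrough_pmus);
	spin_unlock_irqrestore(&passthrough_pmus_lock, flags);
}

/*
 * The guest enter/exit path then only walks this short list under
 * rcu_read_lock(), instead of traversing the full pmus list.
 */
static void perf_switch_passthrough_pmus(bool enter_guest)
{
	struct pmu *pmu;

	rcu_read_lock();
	list_for_each_entry_rcu(pmu, &passthrough_pmus, passthrough_entry) {
		if (enter_guest)
			pmu->save_host_ctx(pmu);	/* hypothetical callback */
		else
			pmu->restore_host_ctx(pmu);	/* hypothetical callback */
	}
	rcu_read_unlock();
}

Whether registration/unregistration would want RCU plus a spinlock as above,
or simply the spinlock on both sides (as pmu_event_list does), depends on how
hot the switch path is and is left open here.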