On Mon, Sep 23, 2024 at 08:49:17PM +0200, Mingwei Zhang wrote:
> The original implementation is by design having a terrible performance
> overhead, ie., every PMU context switch at runtime requires a SRCU
> lock pair and pmu list traversal. To reduce the overhead, we put
> "passthrough" pmus in the front of the list and quickly exit the pmu
> traversal when we just pass the last "passthrough" pmu.

What was the expensive bit? The SRCU memory barrier or the list
iteration? How long is that list really?