Re: [PATCH v3 5/7] MIPS: perf: Allocate per-core counters on demand

James Hogan <jhogan@xxxxxxxxxx> · Wed, 16 May 2018 19:05:20 +0100

On Fri, Apr 20, 2018 at 11:23:07AM +0100, Matt Redfearn wrote:
> Previously, when performance counters were per-core rather than
> per-thread, the number available was divided by 2 on detection, and the
> counters used by each thread in a core were "swizzled" to ensure
> separation. However, this solution is suboptimal since it relies on a
> couple of assumptions:
> a) Always having 2 VPEs / core (number of counters was divided by 2)
> b) Always having a number of counters implemented in the core that is
>    divisible by 2. For instance if an SoC implementation had a single
>    counter and 2 VPEs per core, then this logic would fail and no
>    performance counters would be available.
> The mechanism also does not allow one VPE in a core to use more than
> its allocation of the per-core counters to count multiple events, even
> though other VPEs may not be using them.
> 
> Fix this situation by instead allocating (and releasing) per-core
> performance counters when they are requested. This approach removes the
> above assumptions and fixes the shortcomings.
> 
> In order to do this:
> Add additional logic to mipsxx_pmu_alloc_counter() to detect if a
> sibling is using a per-core counter, and to allocate a per-core counter
> in all sibling CPUs.
> Similarly, add a mipsxx_pmu_free_counter() function to release a
> per-core counter in all sibling CPUs when it is finished with.
> A new spinlock, core_counters_lock, is introduced to ensure exclusivity
> when allocating and releasing per-core counters.
> Since counters are now allocated per-core on demand, rather than being
> reserved per-thread at boot, all of the "swizzling" of counters is
> removed.
> 
> The upshot is that in an SoC with 2 counters / thread, counters are
> reported as:
> Performance counters: mips/interAptiv PMU enabled, 2 32-bit counters
> available to each CPU, irq 18
> 
> Running an instance of a test program on each of 2 threads in a
> core, both threads can use their 2 counters to count 2 events:
> 
> taskset 4 perf stat -e instructions:u,branches:u ./test_prog & taskset 8 perf stat -e instructions:u,branches:u ./test_prog
> 
>  Performance counter stats for './test_prog':
> 
>              30002      instructions:u
>              10000      branches:u
> 
>        0.005164264 seconds time elapsed
>
>  Performance counter stats for './test_prog':
> 
>              30002      instructions:u
>              10000      branches:u
> 
>        0.006139975 seconds time elapsed
> 
> In an SoC with 2 counters / core (which can be forced by setting
> cpu_has_mipsmt_pertccounters = 0), counters are reported as:
> Performance counters: mips/interAptiv PMU enabled, 2 32-bit counters
> available to each core, irq 18
> 
> Running an instance of a test program on each of 2 threads in a
> core, now only one thread manages to secure the performance counters to
> count 2 events. The other thread does not get any counters.
> 
> taskset 4 perf stat -e instructions:u,branches:u ./test_prog & taskset 8 perf stat -e instructions:u,branches:u ./test_prog
> 
>  Performance counter stats for './test_prog':
> 
>      <not counted>       instructions:u
>      <not counted>       branches:u
> 
>        0.005179533 seconds time elapsed
> 
>  Performance counter stats for './test_prog':
> 
>              30002      instructions:u
>              10000      branches:u
> 
>        0.005179467 seconds time elapsed
> 
> Signed-off-by: Matt Redfearn <matt.redfearn@xxxxxxxx>

While this sounds like an improvement in practice, since single-threaded
workloads can use more counters than they otherwise could, I'm a little
concerned about what would happen if a task were migrated to a different
CPU and the perf counters couldn't be obtained on the new CPU because
they were already in use. Would the reported values be incorrectly small?

Cheers
James