On Fri, Apr 20, 2018 at 11:23:07AM +0100, Matt Redfearn wrote: > Previously when performance counters are per-core, rather than > per-thread, the number available were divided by 2 on detection, and the > counters used by each thread in a core were "swizzled" to ensure > separation. However, this solution is suboptimal since it relies on a > couple of assumptions: > a) Always having 2 VPEs / core (number of counters was divided by 2) > b) Always having a number of counters implemented in the core that is > divisible by 2. For instance if an SoC implementation had a single > counter and 2 VPEs per core, then this logic would fail and no > performance counters would be available. > The mechanism also does not allow for one VPE in a core using more than > it's allocation of the per-core counters to count multiple events even > though other VPEs may not be using them. > > Fix this situation by instead allocating (and releasing) per-core > performance counters when they are requested. This approach removes the > above assumptions and fixes the shortcomings. > > In order to do this: > Add additional logic to mipsxx_pmu_alloc_counter() to detect if a > sibling is using a per-core counter, and to allocate a per-core counter > in all sibling CPUs. > Similarly, add a mipsxx_pmu_free_counter() function to release a > per-core counter in all sibling CPUs when it is finished with. > A new spinlock, core_counters_lock, is introduced to ensure exclusivity > when allocating and releasing per-core counters. > Since counters are now allocated per-core on demand, rather than being > reserved per-thread at boot, all of the "swizzling" of counters is > removed. > > The upshot is that in an SoC with 2 counters / thread, counters are > reported as: > Performance counters: mips/interAptiv PMU enabled, 2 32-bit counters > available to each CPU, irq 18 > > Running an instance of a test program on each of 2 threads in a > core, both threads can use their 2 counters to count 2 events: > > taskset 4 perf stat -e instructions:u,branches:u ./test_prog & taskset 8 > perf stat -e instructions:u,branches:u ./test_prog > > Performance counter stats for './test_prog': > > 30002 instructions:u > 10000 branches:u > > 0.005164264 seconds time elapsed > Performance counter stats for './test_prog': > > 30002 instructions:u > 10000 branches:u > > 0.006139975 seconds time elapsed > > In an SoC with 2 counters / core (which can be forced by setting > cpu_has_mipsmt_pertccounters = 0), counters are reported as: > Performance counters: mips/interAptiv PMU enabled, 2 32-bit counters > available to each core, irq 18 > > Running an instance of a test program on each of 2 threads in a > core, now only one thread manages to secure the performance counters to > count 2 events. The other thread does not get any counters. > > taskset 4 perf stat -e instructions:u,branches:u ./test_prog & taskset 8 > perf stat -e instructions:u,branches:u ./test_prog > > Performance counter stats for './test_prog': > > <not counted> instructions:u > <not counted> branches:u > > 0.005179533 seconds time elapsed > > Performance counter stats for './test_prog': > > 30002 instructions:u > 10000 branches:u > > 0.005179467 seconds time elapsed > > Signed-off-by: Matt Redfearn <matt.redfearn@xxxxxxxx> While this sounds like an improvement in practice, being able to use more counters on single threaded stuff than otherwise, I'm a little concerned what would happen if a task was migrated to a different CPU and the perf counters couldn't be obtained on the new CPU due to counters already being in use. Would the values be incorrectly small? Cheers James
Attachment:
signature.asc
Description: PGP signature