The SoC has PMU support in its L3 cache controller (L3C) and in the
DDR4 Memory Controller (DMC).

Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@xxxxxxxxxx>
---
 Documentation/perf/thunderx2-pmu.txt | 106 +++++++++++++++++++++++++++
 1 file changed, 106 insertions(+)
 create mode 100644 Documentation/perf/thunderx2-pmu.txt

diff --git a/Documentation/perf/thunderx2-pmu.txt b/Documentation/perf/thunderx2-pmu.txt
new file mode 100644
index 000000000000..9f5dd7459e68
--- /dev/null
+++ b/Documentation/perf/thunderx2-pmu.txt
@@ -0,0 +1,106 @@
+
+Cavium ThunderX2 SoC Performance Monitoring Unit (PMU UNCORE)
+==========================================================================
+
+The ThunderX2 SoC PMU consists of independent, system-wide, per-socket PMUs
+such as the Level 3 Cache (L3C) and the DDR4 Memory Controller (DMC).
+
+The DMC has 8 interleave channels and the L3C has 16 interleave tiles. Events
+are sampled for the default channel (i.e. channel 0) and prorated to the total
+number of channels/tiles.
+
+Each PMU (DMC and L3C) supports up to 4 counters. Counters are independently
+programmable and can be started and stopped individually. Each counter can
+be set to sample specific perf events. Counters are 32-bit and do not support
+an overflow interrupt; they are sampled every 2 seconds.
+
+PMU UNCORE (perf) driver:
+
+The thunderx2-pmu driver registers several perf PMUs for DMC and L3C devices.
+Each of the PMUs provides a description of its available events
+and configuration options in sysfs.
+	see /sys/devices/uncore_<l3c_S/dmc_S/>
+
+S is the socket id.
+Each PMU can be used to sample up to 4 events simultaneously.
+
+The "format" directory describes the format of the config (event ID).
+The "events" directory provides configuration templates for all
+supported event types that can be used with the perf tool.
+
+For example, "uncore_dmc_0/cnt_cycles/" is
+equivalent to "uncore_dmc_0/config=0x1/".
+
+Each perf driver also provides a "cpumask" sysfs attribute, which contains a
+single CPU ID of the processor which is likely to be used to handle all the
+PMU events. It will be the first online CPU from the NUMA node of the PMU device.
+
+Examples of perf tool use:
+
+perf stat -a -e uncore_dmc_0/cnt_cycles/ sleep 1
+
+perf stat -a -e \
+uncore_dmc_0/cnt_cycles/,\
+uncore_dmc_0/data_transfers/,\
+uncore_dmc_0/read_txns/,\
+uncore_dmc_0/write_txns/ sleep 1
+
+perf stat -a -e \
+uncore_l3c_0/read_request/,\
+uncore_l3c_0/read_hit/,\
+uncore_l3c_0/inv_request/,\
+uncore_l3c_0/inv_hit/ sleep 1
+
+The driver does not support sampling, therefore "perf record" will
+not work. Per-task (without "-a") perf sessions are not supported.
+
+L3C events:
+============
+
+read_request:
+	Number of Read requests received by the L3 Cache.
+	This includes Read as well as Read Exclusive requests.
+
+read_hit:
+	Number of Read requests received by the L3 Cache that were a hit
+	in the L3 (data provided from the L3).
+
+writeback_request:
+	Number of Write Backs received by the L3 Cache. These are basically
+	the L2 Evicts and writes from the PCIe Write Cache.
+
+inv_nwrite_request:
+	Number of Invalidate and Write requests received by the L3 Cache.
+	Also includes writes from IO that did not go through the PCIe Write Cache.
+
+inv_nwrite_hit:
+	Number of Invalidate and Write requests received by the L3 Cache
+	that were a hit in the L3 Cache.
+
+inv_request:
+	Number of Invalidate requests received by the L3 Cache.
+
+inv_hit:
+	Number of Invalidate requests received by the L3 Cache that were a
+	hit in the L3.
+
+evict_request:
+	Number of Evicts that the L3 generated.
+
+NOTE:
+1. The granularity of all these event counter values is the cache line length (64 bytes).
+2. L3C cache Hit Ratio = (read_hit + inv_nwrite_hit + inv_hit) / (read_request + inv_nwrite_request + inv_request)
+
+DMC events:
+============
+cnt_cycles:
+	Count cycles (clocks at the DMC clock rate).
+
+write_txns:
+	Number of 64-byte write transactions received by the DMC(s).
+
+read_txns:
+	Number of 64-byte read transactions received by the DMC(s).
+
+data_transfers:
+	Number of 64-byte data transfers to or from DRAM.
-- 
2.18.0
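
As an illustration only (not part of the patch): the hit-ratio formula in
the NOTE and the 64-byte granularity of data_transfers can be applied to
perf output with a small script. The sketch below is Python and all helper
names in it are hypothetical. It assumes the counts come from
"perf stat -x," and that the first three comma-separated fields of each
line are value, unit and event name. Since the formula uses six L3C events
and each PMU samples up to four at a time, the counts may have to be
collected in more than one run.

```python
CACHE_LINE_BYTES = 64  # granularity stated in the NOTE of the patch


def parse_perf_csv(text):
    """Map event name -> count from `perf stat -x,` style lines.

    Assumption: fields are value,unit,event,... for example
    "1200,,uncore_l3c_0/read_hit/,...". Lines whose value is not a plain
    integer (such as "<not counted>") are skipped.
    """
    counts = {}
    for line in text.splitlines():
        fields = line.strip().split(",")
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        # "uncore_l3c_0/read_hit/" -> "read_hit"
        name = fields[2].strip("/").split("/")[-1]
        counts[name] = int(fields[0])
    return counts


def l3c_hit_ratio(c):
    """Hit ratio as defined in NOTE 2 of the L3C events section."""
    hits = c["read_hit"] + c["inv_nwrite_hit"] + c["inv_hit"]
    requests = c["read_request"] + c["inv_nwrite_request"] + c["inv_request"]
    return hits / requests if requests else 0.0


def dmc_bytes_per_second(c, seconds):
    """Each data_transfers count represents one 64-byte transfer."""
    return c["data_transfers"] * CACHE_LINE_BYTES / seconds
```

For example, calling l3c_hit_ratio(parse_perf_csv(output)) on the merged
output of the L3C runs gives the ratio as a fraction between 0 and 1.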