On Wed, Dec 18, 2024 at 2:07 AM James Clark <james.clark@xxxxxxxxxx> wrote:
>
> On 18/12/2024 12:54 am, Ian Rogers wrote:
> > On Tue, Dec 17, 2024 at 3:56 AM James Clark <james.clark@xxxxxxxxxx> wrote:
> >>
> >> Document the flag, hint what it's used for and give an example with
> >> other useful options to get minimal output.
> >>
> >> Signed-off-by: James Clark <james.clark@xxxxxxxxxx>
> >> ---
> >>  tools/perf/Documentation/perf-arm-spe.txt | 11 +++++++++++
> >>  1 file changed, 11 insertions(+)
> >>
> >> diff --git a/tools/perf/Documentation/perf-arm-spe.txt b/tools/perf/Documentation/perf-arm-spe.txt
> >> index de2b0b479249..588eead438bc 100644
> >> --- a/tools/perf/Documentation/perf-arm-spe.txt
> >> +++ b/tools/perf/Documentation/perf-arm-spe.txt
> >> @@ -150,6 +150,7 @@ arm_spe/load_filter=1,min_latency=10/'
> >>    pct_enable=1 - collect physical timestamp instead of virtual timestamp (PMSCR.PCT) - requires privilege
> >>    store_filter=1 - collect stores only (PMSFCR.ST)
> >>    ts_enable=1 - enable timestamping with value of generic timer (PMSCR.TS)
> >> +  discard=1 - enable SPE PMU events but don't collect sample data - see 'Discard mode' (PMBLIMITR.FM = DISCARD)
> >>
> >>  +++*+++ Latency is the total latency from the point at which sampling started on that instruction, rather
> >>  than only the execution latency.
> >> @@ -220,6 +221,16 @@ Common errors
> >>
> >>    Increase sampling interval (see above)
> >>
> >> +Discard mode
> >> +~~~~~~~~~~~~
> >> +
> >> +SPE PMU events can be used without the overhead of collecting sample data if
> >> +discard mode is supported (optional from Armv8.6). First run a system wide SPE
> >> +session (or on the core of interest) using options to minimize output. Then run
> >> +perf stat:
> >> +
> >> +  perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
> >> +  perf stat -e SAMPLE_FEED_LD
> >
> > Perhaps clarify this should be an ARM SPE event? It seems strange to
> > have one perf command affect a later one, the purpose of things like
> > event multiplexing is to hide the hardware limits. I'd prefer if the
> > last bit was like:
> > ```
> > Then run perf stat with an SPE event on the same PMU:
> >
> >   perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
> >   perf stat -e arm_spe/SAMPLE_FEED_LD/
> > ```
> >
> > Thanks,
> > Ian
>
> Hi Ian,
>
> Confusingly this isn't an SPE event, it is a normal PMU event. The fact
> that one Perf command affects the other is because these events only
> count when SPE is enabled. When it's enabled it has an effect on a
> per-core level which is why in the example I made it simpler by enabling
> SPE system wide.
>
> SPE is an exclusive PMU like Coresight and some others so it can't be
> affected by multiplexing or anything like that. The SAMPLE_FEED_LD PMU
> would be, but as long as SPE stays enabled it will count the right thing
> regardless of multiplexing.

Thanks James, sorry for my SPE ignorance. I'm smiling about the use of
the word exclusive. When I was trying to make the tests run in
parallel I used a file lock - so shared and exclusive. There were a
lot of issues with that, hence switching to 2 phases in the test,
parallel then sequential but I kept the "exclusive" tag for want of a
better word. Perhaps the notion of an exclusive PMU existed previously
but maybe I've accidentally invented the term by way of a failed file
lock experiment :-)

Presumably the two PMUs side-effecting each other is a known thing. I
wonder if we can capture this in the documentation. When you say
"normal PMU event" you mean core PMU events?

Thanks,
Ian