Re: [PATCH 5/5] perf docs: arm_spe: Document new discard mode

Ian Rogers <irogers@xxxxxxxxxx> · Wed, 18 Dec 2024 11:47:09 -0800

On Wed, Dec 18, 2024 at 2:07 AM James Clark <james.clark@xxxxxxxxxx> wrote:
>
> On 18/12/2024 12:54 am, Ian Rogers wrote:
> > On Tue, Dec 17, 2024 at 3:56 AM James Clark <james.clark@xxxxxxxxxx> wrote:
> >>
> >> Document the flag, hint what it's used for and give an example with
> >> other useful options to get minimal output.
> >>
> >> Signed-off-by: James Clark <james.clark@xxxxxxxxxx>
> >> ---
> >>   tools/perf/Documentation/perf-arm-spe.txt | 11 +++++++++++
> >>   1 file changed, 11 insertions(+)
> >>
> >> diff --git a/tools/perf/Documentation/perf-arm-spe.txt b/tools/perf/Documentation/perf-arm-spe.txt
> >> index de2b0b479249..588eead438bc 100644
> >> --- a/tools/perf/Documentation/perf-arm-spe.txt
> >> +++ b/tools/perf/Documentation/perf-arm-spe.txt
> >> @@ -150,6 +150,7 @@ arm_spe/load_filter=1,min_latency=10/'
> >>     pct_enable=1        - collect physical timestamp instead of virtual timestamp (PMSCR.PCT) - requires privilege
> >>     store_filter=1      - collect stores only (PMSFCR.ST)
> >>     ts_enable=1         - enable timestamping with value of generic timer (PMSCR.TS)
> >> +  discard=1           - enable SPE PMU events but don't collect sample data - see 'Discard mode' (PMBLIMITR.FM = DISCARD)
> >>
> >>   +++*+++ Latency is the total latency from the point at which sampling started on that instruction, rather
> >>   than only the execution latency.
> >> @@ -220,6 +221,16 @@ Common errors
> >>
> >>      Increase sampling interval (see above)
> >>
> >> +Discard mode
> >> +~~~~~~~~~~~~
> >> +
> >> +SPE PMU events can be used without the overhead of collecting sample data if
> >> +discard mode is supported (optional from Armv8.6). First run a system wide SPE
> >> +session (or on the core of interest) using options to minimize output. Then run
> >> +perf stat:
> >> +
> >> +  perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
> >> +  perf stat -e SAMPLE_FEED_LD
> >
> > Perhaps clarify this should be an ARM SPE event? It seems strange to
> > have one perf command affect a later one, the purpose of things like
> > event multiplexing is to hide the hardware limits. I'd prefer if the
> > last bit was like:
> > ```
> > Then run perf stat with an SPE event on the same PMU:
> >
> > perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
> > perf stat -e arm_spe/SAMPLE_FEED_LD/
> > ```
> >
> > Thanks,
> > Ian
>
> Hi Ian,
>
> Confusingly this isn't an SPE event, it is a normal PMU event. The fact
> that one Perf command affects the other is because these events only
> count when SPE is enabled. When it's enabled it has an effect on a
> per-core level which is why in the example I made it simpler by enabling
> SPE system wide.
>
> SPE is an exclusive PMU like Coresight and some others so it can't be
> affected by multiplexing or anything like that. The SAMPLE_FEED_LD PMU
> would be, but as long as SPE stays enabled it will count the right thing
> regardless of multiplexing.

Thanks James, and sorry for my SPE ignorance. I'm smiling at the use of
the word "exclusive". When I was trying to make the tests run in
parallel I used a file lock, hence shared and exclusive. There were a
lot of issues with that, so I switched to two phases in the test,
parallel then sequential, but I kept the "exclusive" tag for want of a
better word. Perhaps the notion of an exclusive PMU existed previously,
or maybe I've accidentally invented the term by way of a failed file
lock experiment :-)

Presumably one PMU having a side effect on the other is a known thing.
I wonder if we can capture this in the documentation. When you say
"normal PMU event", do you mean a core PMU event?
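
To make the dependency concrete, here is a rough sketch of how I read the
example: SAMPLE_FEED_LD only counts while SPE is enabled, so the SPE
session has to stay alive for the whole of the perf stat run. The two perf
command lines are the ones from the patch. The function name, the default
"sleep 1" workload and the cleanup are my own assumptions, and I haven't
run this on SPE hardware:

```shell
# Hypothetical helper, the name is illustrative and not part of the patch.
# Usage: spe_discard_stat [command [args...]]
spe_discard_stat() {
    # Keep SPE enabled system wide in discard mode (command line taken from
    # the patch). The record output is thrown away.
    perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
    spe_pid=$!

    # Assumed default workload when no command is given.
    if [ "$#" -eq 0 ]; then
        set -- sleep 1
    fi

    # Count the event while SPE is (hopefully) already running. There is no
    # synchronisation here, so the very start of the run may be missed.
    perf stat -e SAMPLE_FEED_LD -- "$@"
    status=$?

    # Stop the background SPE session and reap it.
    kill "$spe_pid" 2>/dev/null
    wait "$spe_pid" 2>/dev/null
    return "$status"
}
```

If that matches the intent, a sentence in the docs saying that the event
reads as zero unless an SPE session is active on that core might be enough.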

Thanks,
Ian