On 8/04/22 06:56, Ian Rogers wrote: > If all_cpus is calculated it represents the merge/union of all > evsel cpu maps. By default user_requested_cpus is computed to be > the online CPUs. For uncore events, it is often the case currently > that all_cpus is a subset of user_requested_cpus. Metrics printed > without aggregation and with metric-only, in print_no_aggr_metric, > iterate over user_requested_cpus assuming every CPU has a metric to > print. For each CPU the prefix is printed, but then if the > evsel's cpus doesn't contain anything you get an empty line like > the following on a 2 socket 36 core SkylakeX: > > ``` > $ perf stat -A -M DRAM_BW_Use -a --metric-only -I 1000 > 1.000453137 CPU0 0.00 > 1.000453137 > 1.000453137 > 1.000453137 > 1.000453137 > 1.000453137 > 1.000453137 > 1.000453137 > 1.000453137 > 1.000453137 > 1.000453137 > 1.000453137 > 1.000453137 > 1.000453137 > 1.000453137 > 1.000453137 > 1.000453137 > 1.000453137 > 1.000453137 CPU18 0.00 > 1.000453137 > 1.000453137 > 1.000453137 > 1.000453137 > 1.000453137 > 1.000453137 > 1.000453137 > 1.000453137 > 1.000453137 > 1.000453137 > 1.000453137 > 1.000453137 > 1.000453137 > 1.000453137 > 1.000453137 > 1.000453137 > 1.000453137 > 2.003717143 CPU0 0.00 > ... > ``` > > While it is possible to be lazier in printing the prefix and > trailing newline, having user_requested_cpus not be a subset of > all_cpus is preferential so that wasted work isn't done elsewhere > user_requested_cpus is used. The change modifies user_requested_cpus > to be the intersection of user specified CPUs, or default all online > CPUs, with the CPUs computed through the merge of all evsel cpu maps. > > New behavior: > ``` > $ perf stat -A -M DRAM_BW_Use -a --metric-only -I 1000 > 1.001086325 CPU0 0.00 > 1.001086325 CPU18 0.00 > 2.003671291 CPU0 0.00 > 2.003671291 CPU18 0.00 > ... > ``` > > Signed-off-by: Ian Rogers <irogers@xxxxxxxxxx> > --- > tools/perf/util/evlist.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c > index 52ea004ba01e..196d57b905a0 100644 > --- a/tools/perf/util/evlist.c > +++ b/tools/perf/util/evlist.c > @@ -1036,6 +1036,13 @@ int evlist__create_maps(struct evlist *evlist, struct target *target) > if (!cpus) > goto out_delete_threads; > > + if (evlist->core.all_cpus) { > + struct perf_cpu_map *tmp; > + > + tmp = perf_cpu_map__intersect(cpus, evlist->core.all_cpus); Isn't an uncore PMU represented as being on CPU0 actually collecting data that can be due to any CPU. Or for an uncore PMU represented as being on CPU0-CPU4 on a 4 core 8 hyperthread processor, actually 1 PMU per core. So I am not sure intersection makes sense. Also it is not obvious what happens with hybrid CPUs or per thread recording. > + perf_cpu_map__put(cpus); > + cpus = tmp; > + } > evlist->core.has_user_cpus = !!target->cpu_list && !target->hybrid; > > perf_evlist__set_maps(&evlist->core, cpus, threads);