Hi Namhyung,

On 10/14/22 3:16 PM, Namhyung Kim wrote:
> Hello,
>
> On Thu, Sep 22, 2022 at 12:10 AM Dongli Zhang <dongli.zhang@xxxxxxxxxx> wrote:
>>
>> Add kvm@xxxxxxxxxxxxxxx as this issue is in the virtualization env.
>>
>> The topdown metrics events have been default since
>> commit 42641d6f4d15 ("perf stat: Add Topdown metrics events as default
>> events"). perf will use 'slots' if
>> /sys/bus/event_source/devices/cpu/events/slots is available.
>>
>> Unfortunately, 'slots' may not be supported in a virtualization
>> environment. The hypervisor may not expose the 'slots' counter to the VM
>> in cpuid. As a result, the kernel may disable the topdown slots and metrics
>> events in intel_pmu_init() if the slots event is not in CPUID, e.g., both
>> c->weight and c->idxmsk64 are set to 0.
>>
>> There will be the below error on an Icelake VM since 'slots' is the leader:
>>
>> $ perf stat
>> Error:
>> The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (slots).
>> /bin/dmesg | grep -i perf may provide additional information.
>>
>> This is because stat_handle_error() returns COUNTER_FATAL when
>> 'slots' is used as the leader of the events.
>>
>> There are three options to fix the issue.
>>
>> 1. Do not expose /sys/bus/event_source/devices/cpu/events/slots to
>> userspace so that pmu_have_event(pmu->name, "slots") returns false.
>>
>> 2. Run cpuid in perf userspace and avoid using 'slots' if it is not
>> supported in cpuid.
>>
>> 3. Do not make perf fail fatally when the leader fails. Do not create
>> events for an evsel if its leader has already failed.
>>
>> This RFC patch implements the 3rd option. Would you mind suggesting which
>> option is better?
>
> Sorry for the late reply, but I think option 1 is the way to go.
>
> Option 3 would be a transient workaround and it would affect
> other events too. If it's really needed, I think option 2 is slightly better
> than option 3. Or, we can add a --force option to skip non-supported
> events explicitly.

About options 2 and 3, I will wait for more comments in case anyone still
prefers a change in userspace, e.g., the "--force" option to avoid the
failure and just report the events as non-supported. (For reference, a rough
sketch of the cpuid probe that option 2 would need is appended at the end of
this mail.)

I have already sent a version that may impact other events. Please ignore
it, since you think option 1 is enough.

https://lore.kernel.org/all/20221010050113.13050-1-dongli.zhang@xxxxxxxxxx/

About option 1, there is a bugfix from Kan pending review.

https://lore.kernel.org/all/20220922201505.2721654-1-kan.liang@xxxxxxxxxxxxxxx/

Thank you very much!

Dongli Zhang

>
> Thanks,
> Namhyung
>
>>
>> Here is the output with the patch:
>>
>> $ perf stat -v
>> Using CPUID GenuineIntel-6-6A-6
>> slots -> cpu/event=0,umask=0x4/
>> topdown-retiring -> cpu/event=0,umask=0x80/
>> topdown-bad-spec -> cpu/event=0,umask=0x81/
>> topdown-fe-bound -> cpu/event=0,umask=0x82/
>> topdown-be-bound -> cpu/event=0,umask=0x83/
>> Control descriptor is not initialized
>> Warning:
>> slots event is not supported by the kernel.
>> ^Ccpu-clock: 62021481051 62021480237 62021480237
>> context-switches: 437 62021478064 62021478064
>> cpu-migrations: 17 62021475294 62021475294
>> page-faults: 12 62021471925 62021471925
>> cycles: 15662273 62020909141 62020909141
>> instructions: 6580385 62008944246 62008944246
>> branches: 1446119 62008855550 62008855550
>> branch-misses: 30970 62008643255 62008643255
>> failed to read counter slots
>> failed to read counter topdown-retiring
>> failed to read counter topdown-bad-spec
>> failed to read counter topdown-fe-bound
>> failed to read counter topdown-be-bound
>>
>> Performance counter stats for 'system wide':
>>
>> 62,021.48 msec cpu-clock # 16.006 CPUs utilized
>> 437 context-switches # 7.046 /sec
>> 17 cpu-migrations # 0.274 /sec
>> 12 page-faults # 0.193 /sec
>> 15,662,273 cycles # 0.000 GHz
>> 6,580,385 instructions # 0.42 insn per cycle
>> 1,446,119 branches # 23.316 K/sec
>> 30,970 branch-misses # 2.14% of all branches
>> <not supported> slots
>> <not supported> topdown-retiring
>> <not supported> topdown-bad-spec
>> <not supported> topdown-fe-bound
>> <not supported> topdown-be-bound
>>
>> 3.874991326 seconds time elapsed
>>
>> Thank you very much!
>>
>> Cc: Joe Jin <joe.jin@xxxxxxxxxx>
>> Signed-off-by: Dongli Zhang <dongli.zhang@xxxxxxxxxx>
>> ---
>> tools/perf/builtin-stat.c | 8 +++++---
>> 1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
>> index 0b4a62e4ff67..1053cf0886c0 100644
>> --- a/tools/perf/builtin-stat.c
>> +++ b/tools/perf/builtin-stat.c
>> @@ -762,9 +762,7 @@ static enum counter_recovery stat_handle_error(struct evsel *counter)
>> */
>> counter->errored = true;
>>
>> - if ((evsel__leader(counter) != counter) ||
>> - !(counter->core.leader->nr_members > 1))
>> - return COUNTER_SKIP;
>> + return COUNTER_SKIP;
>> } else if (evsel__fallback(counter, errno, msg, sizeof(msg))) {
>> if (verbose > 0)
>> ui__warning("%s\n", msg);
>> @@ -843,6 +841,8 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
>> if (target.use_bpf)
>> break;
>>
>> + if (evsel__leader(counter)->errored)
>> + continue;
>> if (counter->reset_group || counter->errored)
>> continue;
>> if (evsel__is_bpf(counter))
>> @@ -901,6 +901,8 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
>> evlist__for_each_cpu(evlist_cpu_itr, evsel_list, affinity) {
>> counter = evlist_cpu_itr.evsel;
>>
>> + if (evsel__leader(counter)->errored)
>> + continue;
>> if (!counter->reset_group && !counter->errored)
>> continue;
>> if (!counter->reset_group)
>> --
>> 2.17.1
>>
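For completeness, in case anyone does want to pursue option 2, below is a
rough userspace sketch (not part of the patch above) of probing CPUID leaf
0xa for the SLOTS fixed counter before giving up on the 'slots'/topdown
events. The bit-layout assumptions (ECX as a fixed-counter bitmask on PMU
version 5+, EDX[4:0] as the fixed counter count, SLOTS being fixed counter 3)
are my reading of the SDM and intel_pmu_init(), so please double-check them;
the helper name is made up for illustration.

#include <cpuid.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumption: SLOTS is fixed counter 3 (IA32_FIXED_CTR3) on Icelake. */
#define FIXED_CTR_SLOTS		3

/* Hypothetical helper, not an existing perf function. */
static bool cpu_has_slots_counter(void)
{
	unsigned int eax, ebx, ecx, edx;
	unsigned int version, num_fixed;

	/* CPUID leaf 0xa: architectural performance monitoring. */
	if (!__get_cpuid_count(0xa, 0, &eax, &ebx, &ecx, &edx))
		return false;

	version = eax & 0xff;	/* EAX[7:0]: PMU version */
	num_fixed = edx & 0x1f;	/* EDX[4:0]: number of fixed counters */

	if (version < 2)
		return false;	/* no fixed counters at all */

	/* PMU v5+ may report per-counter support in the ECX bitmask. */
	if (ecx)
		return ecx & (1u << FIXED_CTR_SLOTS);

	/* Otherwise fall back to the plain fixed counter count. */
	return num_fixed > FIXED_CTR_SLOTS;
}

int main(void)
{
	printf("slots fixed counter is %s\n",
	       cpu_has_slots_counter() ? "available" : "not available");
	return 0;
}

With something like this, perf could skip the 'slots' group on a VM whose
hypervisor hides the counter, instead of failing sys_perf_event_open() on
the group leader.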