On Mon, Jul 15, 2024 at 2:41 PM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote: > > > > On 2024-07-15 4:11 p.m., Ian Rogers wrote: > > On Mon, Jul 15, 2024 at 1:05 PM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote: > >> > >> Hi Ian, > >> > >> On 2024-07-10 12:59 a.m., kernel test robot wrote: > >>> > >>> > >>> Hello, > >>> > >>> kernel test robot noticed "perf-sanity-tests.perf_all_PMU_test.fail" on: > >>> > >>> commit: e2641db83f18782f57a0e107c50d2d1731960fb8 ("perf vendor events: Add/update skylake events/metrics") > >>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master > >>> > >>> [test failed on linux-next/master 82d01fe6ee52086035b201cfa1410a3b04384257] > >>> > >>> in testcase: perf-sanity-tests > >>> version: > >>> with following parameters: > >>> > >>> perf_compiler: gcc > >>> > >>> > >>> > >>> compiler: gcc-13 > >>> test machine: 16 threads 1 sockets Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (Coffee Lake) with 32G memory > >>> > >>> (please refer to attached dmesg/kmsg for entire log/backtrace) > >>> > >>> > >>> we also observed two cases which also failed on parent can pass on this commit. > >>> FYI. > >>> > >>> > >>> caccae3ce7b988b6 e2641db83f18782f57a0e107c50 > >>> ---------------- --------------------------- > >>> fail:runs %reproduction fail:runs > >>> | | | > >>> :6 100% 6:6 perf-sanity-tests.perf_all_PMU_test.fail > >>> :6 100% 6:6 perf-sanity-tests.perf_all_metricgroups_test.pass > >>> :6 100% 6:6 perf-sanity-tests.perf_all_metrics_test.pass > >>> > >>> > >>> > >>> > >>> > >>> If you fix the issue in a separate patch/commit (i.e. not just a new version of > >>> the same patch/commit), kindly add following tags > >>> | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx> > >>> | Closes: https://lore.kernel.org/oe-lkp/202407101021.2c8baddb-oliver.sang@xxxxxxxxx > >>> > >>> > >>> > >>> 2024-07-09 07:09:53 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 105 > >>> 105: perf all metricgroups test : Ok > >>> 2024-07-09 07:10:11 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 106 > >>> 106: perf all metrics test : Ok > >>> 2024-07-09 07:10:23 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 107 > >>> 107: perf all libpfm4 events test : Ok > >>> 2024-07-09 07:10:47 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 108 > >>> 108: perf all PMU test : FAILED! > >>> > >> > >> The failure is caused by the below change in the e2641db83f18. > >> > >> + { > >> + "BriefDescription": "This 48-bit fixed counter counts the UCLK > >> cycles", > >> + "Counter": "FIXED", > >> + "EventCode": "0xff", > >> + "EventName": "UNC_CLOCK.SOCKET", > >> + "PerPkg": "1", > >> + "PublicDescription": "This 48-bit fixed counter counts the UCLK > >> cycles.", > >> + "Unit": "cbox_0" > >> } > >> > >> The other cbox events have the unit name "CBOX", while the fixed counter > >> has a unit name "cbox_0". So the events_table will maintain separate > >> entries for cbox and cbox_0. > >> > >> The perf_pmus__print_pmu_events() calculate the total number of events, > >> allocate an aliases buffer, store all the events into the buffer, sort, > >> and print all the aliases one by one. > >> > >> The problem is that the calculated total number of events doesn't match > >> the stored events on the SKL machine. > >> > >> The perf_pmu__num_events() is used to calculate the number of events. It > >> invokes the pmu_events_table__num_events() to go through the entire > >> events_table to find all events. Because of the > >> pmu_uncore_alias_match(), the suffix of uncore PMU will be ignored. So > >> the events for cbox and cbox_0 are all counted. > >> > >> When storing events into the aliases buffer, the > >> perf_pmu__for_each_event() only process the events for cbox. > >> > >> Since a bigger buffer was allocated, the last entry are all 0. > >> When printing all the aliases, null will be outputed. > >> > >> $ perf list pmu > >> > >> List of pre-defined events (to be used in -e or -M): > >> > >> (null) [Kernel PMU event] > >> branch-instructions OR cpu/branch-instructions/ [Kernel PMU event] > >> branch-misses OR cpu/branch-misses/ [Kernel PMU event] > >> > >> > >> I'm thinking of two ways to address it. > >> One is to only print all the stored events. The below patch can fix it. > >> > >> diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c > >> index 3fcabfd8fca1..2b2f5117ff84 100644 > >> --- a/tools/perf/util/pmus.c > >> +++ b/tools/perf/util/pmus.c > >> @@ -485,6 +485,7 @@ void perf_pmus__print_pmu_events(const struct > >> print_callbacks *print_cb, void *p > >> perf_pmu__for_each_event(pmu, skip_duplicate_pmus, &state, > >> perf_pmus__print_pmu_events__callback); > >> } > >> + len = state.index; > >> qsort(aliases, len, sizeof(struct sevent), cmp_sevent); > >> for (int j = 0; j < len; j++) { > >> /* Skip duplicates */ > >> > >> The only drawback is that perf list will not show the new cbox_0 event. > >> (But the event name still works. Users can still apply perf stat -e > >> unc_clock.socket.) > >> > >> Since the cbox_0 event is only available on old machines (SKL and > >> earlier), people should already use the equivalent kernel event. It > >> doesn't sounds a big issue for me. I prefer this simple fix. > >> > >> I think the other way would be to modify the perf_pmu__for_each_event() > >> to go through all the possible PMUs. > >> It seems complicated and may impact others ARCHs (e.g., S390). I haven't > >> tried it yet. > >> > >> What do you think? > >> Do you see any other ways to address the issue? > > > > Ugh. It seems the sizing and then iterating approach is just prone to > > keep breaking. Perhaps we can switch to realloc-ed arrays to avoid the > > need for perf_pmu__num_events, which seems to be the source of the > > problems. > > > > I think a realloc-ed array should have the same drawback as the first > way, but bad performance. > Because the pmu_add_cpu_aliases() in the perf_pmu__for_each_event() only > add the events from the first matched PMU. If we don't fix it, the > UNC_CLOCK.SOCKET of cbox_0 will never be displayed. Ok, but I don't think we need to optimize `perf list` for speed. Fwiw, I think this was the fix for the last bug in this code: https://lore.kernel.org/r/20240511003601.2666907-1-irogers@xxxxxxxxxx Thanks, Ian