On Fri, Feb 28, 2025 at 4:13 PM Namhyung Kim <namhyung@xxxxxxxxxx> wrote: > > Hello, > > On Fri, Feb 28, 2025 at 07:22:41AM +0100, Thomas Richter wrote: > > In tree linux-next > > the perf test case 114 'perf record tests' has a subtest > > named 'Basic leader sampling test' which always fails on s390. > > Root cause is this invocation > > > > # perf record -vv -e '{cycles,cycles}:Su' -- perf test -w brstack > > > > ... > > In the debug output the following 2 event are installed: > > > > ------------------------------------------------------------ > > perf_event_attr: > > type 0 (PERF_TYPE_HARDWARE) > > size 136 > > config 0 (PERF_COUNT_HW_CPU_CYCLES) > > { sample_period, sample_freq } 4000 > > sample_type IP|TID|TIME|READ|CPU|PERIOD|IDENTIFIER > > read_format ID|GROUP|LOST > > disabled 1 > > exclude_kernel 1 > > exclude_hv 1 > > freq 1 > > sample_id_all 1 > > ------------------------------------------------------------ > > sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5 > > ------------------------------------------------------------ > > perf_event_attr: > > type 0 (PERF_TYPE_HARDWARE) > > size 136 > > config 0 (PERF_COUNT_HW_CPU_CYCLES) > > sample_type IP|TID|TIME|READ|CPU|PERIOD|IDENTIFIER > > read_format ID|GROUP|LOST > > exclude_kernel 1 > > exclude_hv 1 > > sample_id_all 1 > > ------------------------------------------------------------ > > sys_perf_event_open: pid -1 cpu 0 group_fd 5 flags 0x8 = 6 > > ... > > > > The first event is the group leader and is installed as sampling event. > > The secound one is group member and is installed as counting event. > > > > Namhyung Kim confirms this observation: > > > Yep, the syntax '{event1,event2}:S' is for group leader sampling which > > > reduces the overhead of PMU interrupts. The idea is that those events > > > are scheduled together so sampling is enabled only for the leader > > > (usually the first) event and it reads counts from the member events > > > using PERF_SAMPLE_READ. > > > > > > So they should have the same counts if it uses the same events in a > > > group. > > > > However this does not work on s390. s390 has one dedicated sampling PMU > > which supports only one event. A different PMU is used for counting. > > Both run concurrently using different setups and frequencies. > > > > On s390x a sampling event is setup using a preset trigger and a large > > buffer. The hardware > > - writes a samples (64 bytes) into this buffer > > when a given number of CPU instructions has been executed. > > - and triggers an interrupt when the buffer gets full. > > The trigger has just a few possible values. > > > > On s390x the counting event cycles is used to read out the numer of > > CPU cycles executed. > > > > On s390 above invocation created 2 events executed on 2 different > > PMU and the result are diffent values from two independently running > > PMUs which do not match in a consistent and reliably as on Intel: > > > > # ./perf record -e '{cycles,cycles}:Su' -- perf test -w brstack Hi Thomas, Thanks for reporting this! Could you try adding --count=100000 so that we're not using frequency mode and we expect the counts to look like 100,000. For example, on my x86 laptop: ``` $ perf record --count=100000 -e '{cycles,cycles}:Su' -- perf test -w brstack [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.047 MB perf.data (712 samples) ] $ perf script perf 635952 290271.436115: 100007 cycles: ffffffffada00080 [unknown] ([unknown]) perf 635952 290271.436115: 100007 cycles: ffffffffada00080 [unknown] ([unknown]) perf 635952 290271.436650: 100525 cycles: 7f86352b01b3 _dl_map_object_from_fd+0x553 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) perf 635952 290271.436650: 100525 cycles: 7f86352b01b3 _dl_map_object_from_fd+0x553 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) perf 635952 290271.437088: 99866 cycles: 7f86352cb827 strchr+0x27 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) perf 635952 290271.437088: 99866 cycles: 7f86352cb827 strchr+0x27 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) perf 635952 290271.437376: 99912 cycles: 7f86352cba74 strcmp+0x54 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) perf 635952 290271.437376: 99912 cycles: 7f86352cba74 strcmp+0x54 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) perf 635952 290271.437509: 100279 cycles: 7f86352cba3a strcmp+0x1a (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) perf 635952 290271.437509: 100279 cycles: 7f86352cba3a strcmp+0x1a (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) perf 635952 290271.437559: 99760 cycles: 7f86352bc39f _dl_check_map_versions+0x50f (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) perf 635952 290271.437559: 99760 cycles: 7f86352bc39f _dl_check_map_versions+0x50f (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) ... ``` I'm particularly concerned if we see the cycles count very deviant from the 100000. > > ... > > # ./perf script > > perf 2799437 92568.845118: 5508000 cycles: 3ffbcb898b6 do_lookup_x+0x196 > > perf 2799437 92568.845119: 1377000 cycles: 3ffbcb898b6 do_lookup_x+0x196 > > perf 2799437 92568.845120: 4131000 cycles: 3ffbcb897e8 do_lookup_x+0xc8 > > perf 2799437 92568.845121: 1377000 cycles: 3ffbcb8a37c _dl_lookup_symbol > > perf 2799437 92568.845122: 1377000 cycles: 3ffbcb89558 check_match+0x18 > > perf 2799437 92568.845123: 2754000 cycles: 3ffbcb89b2a do_lookup_x+0x40a > > perf 2799437 92568.845124: 1377000 cycles: 3ffbcb89b1e do_lookup_x+0x3fe > > > > As can be seen the result match very often but not all the time > > make this test on s390 failing very, very often. Actually this is much more deviation than I'd expect. If we use task-clock softer/timer based event I see: ``` $ perf record --count=100000 -e '{task-clock,task-clock}:Su' -- perf test -w brstack [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.047 MB perf.data (712 samples) ] $ perf script perf 636643 290571.807049: 801858 task-clock: 7fdf48643439 _dl_map_object_from_fd+0x7d9 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) perf 636643 290571.807049: 804012 task-clock: 7fdf48643439 _dl_map_object_from_fd+0x7d9 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) perf 636643 290571.807549: 499833 task-clock: 7fdf4863eb9b _dl_map_object_deps+0x3eb (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) perf 636643 290571.807549: 498236 task-clock: 7fdf4863eb9b _dl_map_object_deps+0x3eb (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) ``` So the count deviates by a few hundred, but your output seems to deviate by 4 million. So, I think the test needs to be more tolerant that should help your case. As Namhyung mentions I think there may be another bug lurking. Thanks, Ian > > This patch bypasses this test on s390. > > > > Output before: > > # ./perf test 114 > > 114: perf record tests : FAILED! > > # > > > > Output after: > > # ./perf test 114 > > 114: perf record tests : Ok > > # > > > > Signed-off-by: Thomas Richter <tmricht@xxxxxxxxxxxxx> > > Acked-by: Sumanth Korikkar <sumanthk@xxxxxxxxxxxxx> > > Thanks for the fix. I think Ian saw the same problem on other archs > too. Maybe we need to enable it on supported archs only. > > Thanks, > Namhyung > > > --- > > tools/perf/tests/shell/record.sh | 6 ++++++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/tools/perf/tests/shell/record.sh b/tools/perf/tests/shell/record.sh > > index ba8d873d3ca7..98b69820bc5f 100755 > > --- a/tools/perf/tests/shell/record.sh > > +++ b/tools/perf/tests/shell/record.sh > > @@ -231,6 +231,12 @@ test_cgroup() { > > > > test_leader_sampling() { > > echo "Basic leader sampling test" > > + if [ "$(uname -m)" = s390x ] > > + then > > + echo "Leader sampling skipped" > > + ((skipped+=1)) > > + return > > + fi > > if ! perf record -o "${perfdata}" -e "{cycles,cycles}:Su" -- \ > > perf test -w brstack 2> /dev/null > > then > > -- > > 2.45.2 > >