Hi Oleg,

On Mon, Jan 20, 2025 at 04:50:10PM +0100, Oleg Nesterov wrote:
> Again, I'll try to take another look tomorrow. Not sure I will find the
> explanation though...
>
> But can you help? I know nothing about stress-ng.
>
> Google finds a lot of stress-ng repositories, I've clone the 1st one
> https://github.com/ColinIanKing/stress-ng/blob/master/stress-poll.c
> hopefully this is what you used.

Yes, this is the one we used.
(You can see it in https://github.com/intel/lkp-tests/blob/master/programs/stress-ng/pkg/PKGBUILD)

>
> On 01/20, kernel test robot wrote:
> >
> >       9.45            -6.3        3.13 ±  9%  perf-profile.calltrace.cycles-pp.pipe_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > ...
> >      10.00            -6.5        3.53 ±  9%  perf-profile.children.cycles-pp.pipe_read
> >       2.34            -1.3        1.07 ±  9%  perf-profile.children.cycles-pp.pipe_poll
>
> Could you explain what do these numbers mean and how there are calculated?
> "git-grep cycles-pp" find nothing in stress-ng/ and tools/perf/

We use perf; 'perf-profile' is the name of one of our so-called 'monitors':
https://github.com/intel/lkp-tests/blob/master/monitors/no-stdout/perf-profile
This link shows how the monitor runs. I attached the perf-profile.gz from one
run FYI; within it you can see something like "-e cycles:pp".

Then, for each run, all the data is parsed by
https://github.com/intel/lkp-tests/blob/master/programs/perf-profile/parse

For each commit we run the test at least 6 times, and the comparison list then
gives the avg and %stddev, like:
      2.34            -1.3        1.07 ±  9%  perf-profile****
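In case it helps to read those columns, here is a minimal sketch in plain
Python of how one such comparison line can be reproduced from the per-run
values. This is not the actual lkp-tests compare code; the helper name
comparison_line is made up, and the treatment of the middle column (absolute
delta for perf-profile percentages, relative change for metrics like
ops_per_sec) is inferred from the numbers in the report:

    import statistics

    # Hypothetical helper, only for illustration of the report format.
    def comparison_line(base_runs, test_runs, relative_change=False):
        # base_runs / test_runs: values of one metric from the >=6 runs of
        # the parent commit and of the tested commit.
        base_avg = statistics.mean(base_runs)
        test_avg = statistics.mean(test_runs)

        # Middle column: absolute delta for perf-profile percentages,
        # relative change for metrics such as stress-ng.poll.ops_per_sec.
        if relative_change:
            change = "%.1f%%" % ((test_avg - base_avg) / base_avg * 100)
        else:
            change = "%.1f" % (test_avg - base_avg)

        def fmt(avg, runs):
            # "± N%" (relative stddev) is only printed when it is >= 3%,
            # i.e. when the metric is not considered stable.
            rel_stddev = statistics.stdev(runs) / avg * 100
            suffix = (" ± %2d%%" % round(rel_stddev)) if rel_stddev >= 3 else ""
            return "%10.2f%s" % (avg, suffix)

        return "%s %10s %s" % (fmt(base_avg, base_runs), change, fmt(test_avg, test_runs))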
>
> > kernel test robot noticed a 11.1% regression of stress-ng.poll.ops_per_sec on:
>
> same for ops_per_sec

This is the KPI for this stress-ng.poll test. The raw output looks like:

2025-01-17 00:02:55 stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --poll 224
stress-ng: info:  [6458] setting to a 1 min run per stressor
stress-ng: info:  [6458] dispatching hogs: 224 poll
stress-ng: info:  [6458] note: /proc/sys/kernel/sched_autogroup_enabled is 1 and this can impact scheduling throughput for processes not attached to a tty. Setting this to 0 may improve performance metrics
stress-ng: metrc: [6458] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [6458]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [6458] poll          776040080     60.00    451.84   1488.52  12933899.40      399947.20        14.44          1844
stress-ng: info:  [6458] for a 60.08s run time:
stress-ng: info:  [6458]   13457.69s available CPU time
stress-ng: info:  [6458]     451.85s user time   (  3.36%)
stress-ng: info:  [6458]    1488.58s system time ( 11.06%)
stress-ng: info:  [6458]    1940.43s total time  ( 14.42%)
stress-ng: info:  [6458] load average: 44.46 13.42 4.66
stress-ng: info:  [6458] skipped: 0
stress-ng: info:  [6458] passed: 224: poll (224)
stress-ng: info:  [6458] failed: 0
stress-ng: info:  [6458] metrics untrustworthy: 0
stress-ng: info:  [6458] successful run completed in 1 min

We parse this data with
https://github.com/intel/lkp-tests/blob/master/programs/stress-ng/parse
and take stress-ng.poll.ops_per_sec for this run from "bogo ops/s (real time)"
-- 12933899.40 here.

Again, the data below is the avg over multiple runs:
  14516970           -11.1%   12907569        stress-ng.poll.ops_per_sec
(no %stddev, since we don't show it when it is < 3%, which means the metric is
stable enough)

>
> > 6150 -47.8% 3208 stress-ng.time.percent_of_cpu_this_job_got
>
> same for percent_of_cpu_this_job_got
>
> > 2993 -50.6% 1477 stress-ng.time.system_time
> > 711.20 -36.0% 454.85 stress-ng.time.user_time
>
> Is that what I think it is?? Does it run faster?

These time data are collected by
https://github.com/intel/lkp-tests/blob/master/tests/wrapper#L38

Below is the raw data FYI.

        Command being timed: "/lkp/lkp/src/programs/stress-ng/run"
        User time (seconds): 451.86
        System time (seconds): 1488.79
        Percent of CPU this job got: 3222%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:00.21    <---- (1)
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 222324
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 87235
        Voluntary context switches: 194087227
        Involuntary context switches: 78364
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

>
> Or it exits after some timeout and the decrease in system/user_time can be
> explained by the change in the mysterious 'percent_of_cpu_this_job_got' above?

(1) above looks expected; we run the test for 60s.
From the data you pointed out, it seems the stress-ng test itself gets less CPU
time to run, which may explain why it becomes slower.
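To make the relation explicit (just an illustration using the numbers from the
raw time(1)-style output above): "Percent of CPU this job got" in that output
is (user + sys) / elapsed, so with the wall clock fixed at ~60s it simply moves
together with the user/system time:

    # 'Percent of CPU this job got' = (user + sys) / elapsed * 100
    user, system, elapsed = 451.86, 1488.79, 60.21
    print("%d%%" % round((user + system) / elapsed * 100))  # ~3223%, close to the 3222% above

So a lower percent_of_cpu_this_job_got over the same 60s run is just another
view of the reduced user+system time, not an independent effect.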
This also made us wonder whether your fs/pipe commit could have some impact on
our 'monitors', so we reran the tests with all monitors disabled, but we still
see a similar regression.
(Below is the full comparison list; as you can see, it is much shorter than in
our original report, since the monitors are disabled and there is no monitor
data.)

=========================================================================================
compiler/cpufreq_governor/debug-setup/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/no-monitor/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/poll/stress-ng/60s

d2fc0ed52a284a13 aaec5a95d59615523db03dd53c2
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
    105111 ± 25%     -58.9%      43248 ± 10%  time.involuntary_context_switches
      6036           -46.1%       3253        time.percent_of_cpu_this_job_got
      2901           -48.7%       1489        time.system_time
    734.52           -36.0%     469.77        time.user_time
 4.472e+08           -55.5%   1.99e+08        time.voluntary_context_switches
 8.817e+08            -9.8%  7.957e+08        stress-ng.poll.ops
  14694617            -9.8%   13261019        stress-ng.poll.ops_per_sec
    105111 ± 25%     -58.9%      43248 ± 10%  stress-ng.time.involuntary_context_switches
      6036           -46.1%       3253        stress-ng.time.percent_of_cpu_this_job_got
      2901           -48.7%       1489        stress-ng.time.system_time
    734.52           -36.0%     469.77        stress-ng.time.user_time
 4.472e+08           -55.5%   1.99e+08        stress-ng.time.voluntary_context_switches

>
> Oleg.
>
Attachment:
perf-profile.gz
Description: application/gzip