[linus:master] [pidfs] 16ecd47cb0: stress-ng.fstat.ops_per_sec 12.6% regression

Hello,

kernel test robot noticed a 12.6% regression of stress-ng.fstat.ops_per_sec on:


commit: 16ecd47cb0cd895c7c2f5dd5db50f6c005c51639 ("pidfs: lookup pid through rbtree")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linus/master      aa22f4da2a46b484a257d167c67a2adc1b7aaf68]
[test failed on linux-next/master 5ffa57f6eecefababb8cbe327222ef171943b183]
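
The commit title suggests that pidfs now resolves a struct pid by walking an
rbtree. As a point of reference only, a generic kernel rbtree lookup keyed by
a 64-bit id looks like the sketch below; the type and function names are
illustrative and are not the actual pidfs code.

#include <linux/rbtree.h>
#include <linux/types.h>

/*
 * Illustrative only -- not the pidfs implementation. A generic rbtree
 * keyed by a 64-bit id; callers hold whatever lock protects the tree
 * for the duration of the walk.
 */
struct id_entry {
	u64		id;
	struct rb_node	node;
};

static struct id_entry *id_entry_find(struct rb_root *root, u64 id)
{
	struct rb_node *n = root->rb_node;

	while (n) {
		struct id_entry *e = rb_entry(n, struct id_entry, node);

		if (id < e->id)
			n = n->rb_left;
		else if (id > e->id)
			n = n->rb_right;
		else
			return e;
	}
	return NULL;
}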

testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

	nr_threads: 100%
	disk: 1HDD
	testtime: 60s
	fs: btrfs
	test: fstat
	cpufreq_governor: performance
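
(These parameters roughly amount to running the fstat stressor with one
worker per online CPU for 60 seconds against a btrfs-backed disk, i.e.
something along the lines of "stress-ng --fstat 0 --timeout 60s", where 0
means one worker per online CPU. The exact invocation is generated by the
lkp-tests job files; see the reproduction materials linked below.)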


In addition to that, the commit also has a significant impact on the following test:

+------------------+---------------------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.pthread.ops_per_sec 23.7% regression                                   |
| test machine     | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory |
| test parameters  | cpufreq_governor=performance                                                                |
|                  | nr_threads=100%                                                                             |
|                  | test=pthread                                                                                |
|                  | testtime=60s                                                                                |
+------------------+---------------------------------------------------------------------------------------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202501272257.a95372bc-lkp@xxxxxxxxx


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250127/202501272257.a95372bc-lkp@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/1HDD/btrfs/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/fstat/stress-ng/60s

commit: 
  59a42b0e78 ("selftests/pidfd: add pidfs file handle selftests")
  16ecd47cb0 ("pidfs: lookup pid through rbtree")

59a42b0e78888e2d 16ecd47cb0cd895c7c2f5dd5db5 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
   2813179 ±  2%     -30.7%    1948548        cpuidle..usage
      7.22            -6.8%       6.73 ±  2%  iostat.cpu.user
      0.38            -0.0        0.33        mpstat.cpu.all.irq%
   5683055 ±  5%     -13.3%    4926006 ± 10%  numa-meminfo.node1.Active
   5683055 ±  5%     -13.3%    4926006 ± 10%  numa-meminfo.node1.Active(anon)
    681017           -13.0%     592632        vmstat.system.cs
    262754            -8.6%     240105        vmstat.system.in
  25349297           -14.3%   21728755        numa-numastat.node0.local_node
  25389508           -14.3%   21770830        numa-numastat.node0.numa_hit
  26719069           -14.2%   22919085        numa-numastat.node1.local_node
  26746344           -14.2%   22943171        numa-numastat.node1.numa_hit
  25391110           -14.3%   21771814        numa-vmstat.node0.numa_hit
  25350899           -14.3%   21729738        numa-vmstat.node0.numa_local
   1423040 ±  5%     -13.3%    1233884 ± 10%  numa-vmstat.node1.nr_active_anon
   1423039 ±  5%     -13.3%    1233883 ± 10%  numa-vmstat.node1.nr_zone_active_anon
  26748443           -14.2%   22948826        numa-vmstat.node1.numa_hit
  26721168           -14.2%   22924740        numa-vmstat.node1.numa_local
   4274794           -12.6%    3735109        stress-ng.fstat.ops
     71246           -12.6%      62251        stress-ng.fstat.ops_per_sec
  13044663           -10.2%   11715455        stress-ng.time.involuntary_context_switches
      4590            -2.1%       4492        stress-ng.time.percent_of_cpu_this_job_got
      2545            -1.6%       2503        stress-ng.time.system_time
    212.55            -8.2%     195.17 ±  2%  stress-ng.time.user_time
   6786385           -12.7%    5924000        stress-ng.time.voluntary_context_switches
   9685654 ±  2%     +15.2%   11161628 ±  2%  sched_debug.cfs_rq:/.avg_vruntime.avg
   4917374 ±  6%     +26.4%    6217585 ±  8%  sched_debug.cfs_rq:/.avg_vruntime.min
   9685655 ±  2%     +15.2%   11161628 ±  2%  sched_debug.cfs_rq:/.min_vruntime.avg
   4917374 ±  6%     +26.4%    6217586 ±  8%  sched_debug.cfs_rq:/.min_vruntime.min
    319.78 ±  4%      -8.9%     291.47 ±  4%  sched_debug.cfs_rq:/.util_avg.stddev
    331418           -12.3%     290724        sched_debug.cpu.nr_switches.avg
    349777           -12.0%     307943        sched_debug.cpu.nr_switches.max
    247719 ±  5%     -18.2%     202753 ±  2%  sched_debug.cpu.nr_switches.min
   1681668            -5.8%    1584232        proc-vmstat.nr_active_anon
   2335388            -4.2%    2237095        proc-vmstat.nr_file_pages
   1434429            -6.9%    1336146        proc-vmstat.nr_shmem
     50745            -2.5%      49497        proc-vmstat.nr_slab_unreclaimable
   1681668            -5.8%    1584232        proc-vmstat.nr_zone_active_anon
  52137742           -14.2%   44716504        proc-vmstat.numa_hit
  52070256           -14.2%   44650343        proc-vmstat.numa_local
  57420831           -13.4%   49744871        proc-vmstat.pgalloc_normal
  54983559           -13.7%   47445719        proc-vmstat.pgfree
      1.30           -10.6%       1.17        perf-stat.i.MPKI
 2.797e+10            -7.0%    2.6e+10        perf-stat.i.branch-instructions
      0.32 ±  4%      +0.0        0.33        perf-stat.i.branch-miss-rate%
     24.15            -1.1       23.00        perf-stat.i.cache-miss-rate%
 1.689e+08           -17.1%  1.401e+08        perf-stat.i.cache-misses
  6.99e+08           -12.9%  6.085e+08        perf-stat.i.cache-references
    708230           -12.7%     618047        perf-stat.i.context-switches
      1.71            +8.2%       1.85        perf-stat.i.cpi
    115482            -2.7%     112333        perf-stat.i.cpu-migrations
      1311           +21.2%       1588        perf-stat.i.cycles-between-cache-misses
 1.288e+11            -7.3%  1.195e+11        perf-stat.i.instructions
      0.59            -7.4%       0.55        perf-stat.i.ipc
     12.84           -11.0%      11.43        perf-stat.i.metric.K/sec
      1.31           -10.5%       1.17        perf-stat.overall.MPKI
      0.29 ±  4%      +0.0        0.30        perf-stat.overall.branch-miss-rate%
     24.21            -1.1       23.07        perf-stat.overall.cache-miss-rate%
      1.71            +8.2%       1.85        perf-stat.overall.cpi
      1303           +21.0%       1576        perf-stat.overall.cycles-between-cache-misses
      0.58            -7.6%       0.54        perf-stat.overall.ipc
 2.724e+10            -6.8%  2.539e+10        perf-stat.ps.branch-instructions
 1.648e+08           -16.8%  1.371e+08        perf-stat.ps.cache-misses
 6.807e+08           -12.7%  5.943e+08        perf-stat.ps.cache-references
    689389           -12.5%     603372        perf-stat.ps.context-switches
 1.255e+11            -7.0%  1.167e+11        perf-stat.ps.instructions
 7.621e+12            -6.9%  7.097e+12        perf-stat.total.instructions
     56.06           -56.1        0.00        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
     56.04           -56.0        0.00        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
     31.25           -31.2        0.00        perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
     31.23           -31.2        0.00        perf-profile.calltrace.cycles-pp.__x64_sys_exit.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
     31.22           -31.2        0.00        perf-profile.calltrace.cycles-pp.do_exit.__x64_sys_exit.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
     27.58           -27.6        0.00        perf-profile.calltrace.cycles-pp.exit_notify.do_exit.__x64_sys_exit.x64_sys_call.do_syscall_64
     23.72           -23.7        0.00        perf-profile.calltrace.cycles-pp.__do_sys_clone3.do_syscall_64.entry_SYSCALL_64_after_hwframe
     23.68           -23.7        0.00        perf-profile.calltrace.cycles-pp.kernel_clone.__do_sys_clone3.do_syscall_64.entry_SYSCALL_64_after_hwframe
     20.15           -20.2        0.00        perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone3.do_syscall_64.entry_SYSCALL_64_after_hwframe
     19.23           -19.2        0.00        perf-profile.calltrace.cycles-pp.fstatat64
     16.51           -16.5        0.00        perf-profile.calltrace.cycles-pp.statx
     14.81           -14.8        0.00        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.fstatat64
     14.52           -14.5        0.00        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.fstatat64
     14.52           -14.5        0.00        perf-profile.calltrace.cycles-pp.queued_write_lock_slowpath.copy_process.kernel_clone.__do_sys_clone3.do_syscall_64
     14.05           -14.0        0.00        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath.queued_write_lock_slowpath.copy_process.kernel_clone.__do_sys_clone3
     14.04           -14.0        0.00        perf-profile.calltrace.cycles-pp.queued_write_lock_slowpath.exit_notify.do_exit.__x64_sys_exit.x64_sys_call
     13.55           -13.6        0.00        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath.queued_write_lock_slowpath.exit_notify.do_exit.__x64_sys_exit
     13.24           -13.2        0.00        perf-profile.calltrace.cycles-pp.release_task.exit_notify.do_exit.__x64_sys_exit.x64_sys_call
     13.08           -13.1        0.00        perf-profile.calltrace.cycles-pp.__do_sys_newfstatat.do_syscall_64.entry_SYSCALL_64_after_hwframe.fstatat64
     12.01           -12.0        0.00        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.statx
     11.93           -11.9        0.00        perf-profile.calltrace.cycles-pp.queued_write_lock_slowpath.release_task.exit_notify.do_exit.__x64_sys_exit
     11.76           -11.8        0.00        perf-profile.calltrace.cycles-pp.vfs_fstatat.__do_sys_newfstatat.do_syscall_64.entry_SYSCALL_64_after_hwframe.fstatat64
     11.72           -11.7        0.00        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.statx
     11.45           -11.4        0.00        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath.queued_write_lock_slowpath.release_task.exit_notify.do_exit
     10.27           -10.3        0.00        perf-profile.calltrace.cycles-pp.__x64_sys_statx.do_syscall_64.entry_SYSCALL_64_after_hwframe.statx
      7.21            -7.2        0.00        perf-profile.calltrace.cycles-pp.vfs_statx.vfs_fstatat.__do_sys_newfstatat.do_syscall_64.entry_SYSCALL_64_after_hwframe
      5.25            -5.3        0.00        perf-profile.calltrace.cycles-pp.filename_lookup.vfs_statx.vfs_fstatat.__do_sys_newfstatat.do_syscall_64
     86.11           -86.1        0.00        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     85.52           -85.5        0.00        perf-profile.children.cycles-pp.do_syscall_64
     41.40           -41.4        0.00        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     40.49           -40.5        0.00        perf-profile.children.cycles-pp.queued_write_lock_slowpath
     31.57           -31.6        0.00        perf-profile.children.cycles-pp.x64_sys_call
     31.23           -31.2        0.00        perf-profile.children.cycles-pp.do_exit
     31.23           -31.2        0.00        perf-profile.children.cycles-pp.__x64_sys_exit
     27.59           -27.6        0.00        perf-profile.children.cycles-pp.exit_notify
     23.72           -23.7        0.00        perf-profile.children.cycles-pp.__do_sys_clone3
     23.69           -23.7        0.00        perf-profile.children.cycles-pp.kernel_clone
     20.18           -20.2        0.00        perf-profile.children.cycles-pp.copy_process
     19.70           -19.7        0.00        perf-profile.children.cycles-pp.fstatat64
     16.58           -16.6        0.00        perf-profile.children.cycles-pp.statx
     13.51           -13.5        0.00        perf-profile.children.cycles-pp.__do_sys_newfstatat
     13.25           -13.2        0.00        perf-profile.children.cycles-pp.release_task
     12.22           -12.2        0.00        perf-profile.children.cycles-pp.vfs_fstatat
     11.38           -11.4        0.00        perf-profile.children.cycles-pp.vfs_statx
     10.36           -10.4        0.00        perf-profile.children.cycles-pp.__x64_sys_statx
      8.25            -8.3        0.00        perf-profile.children.cycles-pp.filename_lookup
      7.89            -7.9        0.00        perf-profile.children.cycles-pp.getname_flags
      7.74            -7.7        0.00        perf-profile.children.cycles-pp.path_lookupat
     41.39           -41.4        0.00        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
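
Note that the dominant frames above are native_queued_spin_lock_slowpath
under queued_write_lock_slowpath, reached from copy_process, exit_notify and
release_task -- in those paths typically the tasklist_lock write side --
rather than from the stat syscalls themselves. If the new rbtree is
maintained at pid allocation/free time, every clone and exit additionally
performs a tree insert or erase under a shared lock. The generic insert
pattern (again with illustrative names, not the actual pidfs code) is:

/*
 * Illustrative only: one insert like this per pid allocation if the
 * tree is maintained on the fork path; callers serialize on the lock
 * protecting the tree, so the rebalancing cost is paid under contention.
 */
static void id_entry_insert(struct rb_root *root, struct id_entry *new)
{
	struct rb_node **link = &root->rb_node, *parent = NULL;

	while (*link) {
		struct id_entry *e = rb_entry(*link, struct id_entry, node);

		parent = *link;
		if (new->id < e->id)
			link = &e->node.rb_left;
		else
			link = &e->node.rb_right;
	}
	rb_link_node(&new->node, parent, link);
	rb_insert_color(&new->node, root);
}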


***************************************************************************************************
lkp-spr-r02: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/pthread/stress-ng/60s

commit: 
  59a42b0e78 ("selftests/pidfd: add pidfs file handle selftests")
  16ecd47cb0 ("pidfs: lookup pid through rbtree")

59a42b0e78888e2d 16ecd47cb0cd895c7c2f5dd5db5 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
 6.458e+08 ±  3%     -20.7%  5.119e+08 ±  6%  cpuidle..time
   4424460 ±  4%     -56.5%    1923713 ±  2%  cpuidle..usage
      1916           +17.2%       2245 ±  2%  vmstat.procs.r
    880095           -24.7%     662885        vmstat.system.cs
    717291            -7.6%     662983        vmstat.system.in
      4.81            -0.9        3.87 ±  2%  mpstat.cpu.all.idle%
      0.48            -0.1        0.42        mpstat.cpu.all.irq%
      0.32 ±  3%      -0.1        0.26 ±  2%  mpstat.cpu.all.soft%
      1.77            -0.3        1.46        mpstat.cpu.all.usr%
  43182538           -21.9%   33726626        numa-numastat.node0.local_node
  43338607           -22.0%   33814109        numa-numastat.node0.numa_hit
  43334202           -22.8%   33451907        numa-numastat.node1.local_node
  43415892           -22.6%   33601910        numa-numastat.node1.numa_hit
  43339112           -22.0%   33811967        numa-vmstat.node0.numa_hit
  43183037           -21.9%   33724483        numa-vmstat.node0.numa_local
  43416602           -22.6%   33599378        numa-vmstat.node1.numa_hit
  43334912           -22.8%   33449374        numa-vmstat.node1.numa_local
     13189 ± 14%     -24.0%      10022 ± 19%  perf-c2c.DRAM.local
      9611 ± 16%     -28.8%       6844 ± 17%  perf-c2c.DRAM.remote
     16436 ± 15%     -32.1%      11162 ± 19%  perf-c2c.HITM.local
      4431 ± 16%     -30.8%       3064 ± 19%  perf-c2c.HITM.remote
     20868 ± 15%     -31.8%      14226 ± 19%  perf-c2c.HITM.total
    205629           +67.1%     343625        stress-ng.pthread.nanosecs_to_start_a_pthread
  12690825           -23.7%    9689255        stress-ng.pthread.ops
    210833           -23.7%     160924        stress-ng.pthread.ops_per_sec
   5684649           -16.0%    4772378        stress-ng.time.involuntary_context_switches
  26588792           -21.0%   20998281        stress-ng.time.minor_page_faults
     12705            +5.1%      13353        stress-ng.time.percent_of_cpu_this_job_got
      7559            +5.6%       7986        stress-ng.time.system_time
    132.77           -24.1%     100.72        stress-ng.time.user_time
  29099733           -22.3%   22601666        stress-ng.time.voluntary_context_switches
    340547            +1.4%     345226        proc-vmstat.nr_mapped
    150971            -3.2%     146184        proc-vmstat.nr_page_table_pages
     48017            -2.0%      47078        proc-vmstat.nr_slab_reclaimable
    540694 ±  9%     +50.6%     814286 ± 15%  proc-vmstat.numa_hint_faults
    255145 ± 22%     +62.3%     414122 ± 17%  proc-vmstat.numa_hint_faults_local
  86757062           -22.3%   67418409        proc-vmstat.numa_hit
  86519300           -22.4%   67180920        proc-vmstat.numa_local
  89935256           -22.2%   69939407        proc-vmstat.pgalloc_normal
  27887502           -20.1%   22295448        proc-vmstat.pgfault
  86343992           -22.7%   66777255        proc-vmstat.pgfree
   1187131 ± 23%     -42.2%     686568 ± 15%  sched_debug.cfs_rq:/.avg_vruntime.stddev
  12970740 ± 42%     -49.3%    6577803 ± 11%  sched_debug.cfs_rq:/.left_deadline.max
   2408752 ±  4%      -9.6%    2177658 ±  2%  sched_debug.cfs_rq:/.left_deadline.stddev
  12970554 ± 42%     -49.3%    6577515 ± 11%  sched_debug.cfs_rq:/.left_vruntime.max
   2408688 ±  4%      -9.6%    2177606 ±  2%  sched_debug.cfs_rq:/.left_vruntime.stddev
   1187132 ± 23%     -42.2%     686568 ± 15%  sched_debug.cfs_rq:/.min_vruntime.stddev
  12970563 ± 42%     -49.3%    6577516 ± 11%  sched_debug.cfs_rq:/.right_vruntime.max
   2408788 ±  4%      -9.6%    2177610 ±  2%  sched_debug.cfs_rq:/.right_vruntime.stddev
   2096120           -68.2%     665792        sched_debug.cpu.curr->pid.max
    655956 ±  8%     -53.1%     307752        sched_debug.cpu.curr->pid.stddev
    124008           -24.6%      93528        sched_debug.cpu.nr_switches.avg
    270857 ±  4%     -38.9%     165624 ± 10%  sched_debug.cpu.nr_switches.max
     27972 ± 13%     -67.5%       9102 ± 17%  sched_debug.cpu.nr_switches.stddev
    179.43 ±  4%     +17.8%     211.44 ±  4%  sched_debug.cpu.nr_uninterruptible.stddev
      4.21           -13.4%       3.65        perf-stat.i.MPKI
  2.03e+10            -8.3%  1.863e+10        perf-stat.i.branch-instructions
      0.66            -0.1        0.61        perf-stat.i.branch-miss-rate%
 1.289e+08           -16.7%  1.074e+08        perf-stat.i.branch-misses
     39.17            +0.7       39.92        perf-stat.i.cache-miss-rate%
 3.806e+08           -21.8%  2.976e+08        perf-stat.i.cache-misses
 9.691e+08           -23.3%  7.437e+08        perf-stat.i.cache-references
    903142           -24.9%     678436        perf-stat.i.context-switches
      6.89           +11.5%       7.69        perf-stat.i.cpi
 6.239e+11            +1.0%  6.304e+11        perf-stat.i.cpu-cycles
    311004           -18.5%     253387        perf-stat.i.cpu-migrations
      1631           +29.1%       2106        perf-stat.i.cycles-between-cache-misses
 9.068e+10            -9.7%  8.192e+10        perf-stat.i.instructions
      0.15            -9.5%       0.14        perf-stat.i.ipc
     10.41           -22.2%       8.11        perf-stat.i.metric.K/sec
    462421           -19.7%     371144        perf-stat.i.minor-faults
    668589           -21.0%     527974        perf-stat.i.page-faults
      4.22           -13.6%       3.65        perf-stat.overall.MPKI
      0.63            -0.1        0.57        perf-stat.overall.branch-miss-rate%
     39.29            +0.7       40.04        perf-stat.overall.cache-miss-rate%
      6.94           +11.7%       7.75        perf-stat.overall.cpi
      1643           +29.3%       2125        perf-stat.overall.cycles-between-cache-misses
      0.14           -10.5%       0.13        perf-stat.overall.ipc
 1.971e+10            -8.6%  1.801e+10        perf-stat.ps.branch-instructions
 1.237e+08           -17.2%  1.024e+08        perf-stat.ps.branch-misses
 3.713e+08           -22.3%  2.887e+08        perf-stat.ps.cache-misses
 9.451e+08           -23.7%   7.21e+08        perf-stat.ps.cache-references
    883135           -25.3%     659967        perf-stat.ps.context-switches
    304186           -18.9%     246645        perf-stat.ps.cpu-migrations
 8.797e+10           -10.0%  7.916e+10        perf-stat.ps.instructions
    445107           -20.6%     353509        perf-stat.ps.minor-faults
    646755           -21.7%     506142        perf-stat.ps.page-faults
 5.397e+12           -10.2%  4.846e+12        perf-stat.total.instructions
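
Beyond the headline ops_per_sec drop, nanosecs_to_start_a_pthread rises
67.1% (205629 -> 343625). Since every new thread allocates its own struct
pid in copy_process, per-pid bookkeeping added at allocation/free time would
be expected to land directly on thread start/stop cost. This is an
observation from the numbers above, not a confirmed root cause.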





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki




