Hello,

kernel test robot noticed a 12.6% regression of stress-ng.fstat.ops_per_sec on:

commit: 16ecd47cb0cd895c7c2f5dd5db50f6c005c51639 ("pidfs: lookup pid through rbtree")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linus/master      aa22f4da2a46b484a257d167c67a2adc1b7aaf68]
[test failed on linux-next/master 5ffa57f6eecefababb8cbe327222ef171943b183]

testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

	nr_threads: 100%
	disk: 1HDD
	testtime: 60s
	fs: btrfs
	test: fstat
	cpufreq_governor: performance

In addition, the commit also has a significant impact on the following test:

+------------------+---------------------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.pthread.ops_per_sec 23.7% regression                                   |
| test machine     | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory |
| test parameters  | cpufreq_governor=performance                                                                 |
|                  | nr_threads=100%                                                                              |
|                  | test=pthread                                                                                 |
|                  | testtime=60s                                                                                 |
+------------------+---------------------------------------------------------------------------------------------+

If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add the following tags:

| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202501272257.a95372bc-lkp@xxxxxxxxx

Details are below:
-------------------------------------------------------------------------------------------------->

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250127/202501272257.a95372bc-lkp@xxxxxxxxx
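For context on what the bisected commit changes: pidfs now resolves a pid by walking an rbtree keyed by the pid's unique inode number, so every pid allocation and free also maintains that tree under a lock, adding work to the fork/exit paths that the profiles below show contending. The sketch that follows is schematic, kernel-style code meant only to illustrate that pattern; the identifiers (pid_entry, pidfs_tree, pidfs_tree_lock) are illustrative stand-ins, not the actual pidfs implementation, whose locking may differ.

/*
 * Schematic sketch of an inode-number-keyed rbtree lookup, in the style
 * of the pattern the commit introduces.  All names are illustrative.
 */
#include <linux/rbtree.h>
#include <linux/spinlock.h>
#include <linux/types.h>

struct pid_entry {				/* stand-in for struct pid */
	u64		ino;			/* unique pidfs inode number */
	struct rb_node	node;
};

static struct rb_root pidfs_tree = RB_ROOT;	/* illustrative */
static DEFINE_SPINLOCK(pidfs_tree_lock);	/* illustrative */

/* Paid on every resolution of a pidfs inode number back to a pid. */
static struct pid_entry *pidfs_tree_lookup(u64 ino)
{
	struct pid_entry *found = NULL;
	struct rb_node *n;

	spin_lock(&pidfs_tree_lock);
	n = pidfs_tree.rb_node;
	while (n) {
		struct pid_entry *e = rb_entry(n, struct pid_entry, node);

		if (ino < e->ino)
			n = n->rb_left;
		else if (ino > e->ino)
			n = n->rb_right;
		else {
			found = e;
			break;
		}
	}
	spin_unlock(&pidfs_tree_lock);
	return found;
}

/* Paid on every pid allocation, i.e. on every clone()/clone3(). */
static void pidfs_tree_insert(struct pid_entry *new)
{
	struct rb_node **link, *parent = NULL;

	spin_lock(&pidfs_tree_lock);
	link = &pidfs_tree.rb_node;
	while (*link) {
		struct pid_entry *e = rb_entry(*link, struct pid_entry, node);

		/* keys are unique inode numbers, so no equal case */
		parent = *link;
		link = new->ino < e->ino ? &parent->rb_left : &parent->rb_right;
	}
	rb_link_node(&new->node, parent, link);
	rb_insert_color(&new->node, &pidfs_tree);
	spin_unlock(&pidfs_tree_lock);
}

If the inode numbers are handed out monotonically, every insert lands at the rightmost position and rebalances against recently touched nodes; that is one plausible source of the extra cache misses and higher cycles-between-cache-misses reported below, though the data alone does not prove it.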
=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/1HDD/btrfs/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/fstat/stress-ng/60s

commit:
  59a42b0e78 ("selftests/pidfd: add pidfs file handle selftests")
  16ecd47cb0 ("pidfs: lookup pid through rbtree")

59a42b0e78888e2d 16ecd47cb0cd895c7c2f5dd5db5
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \

    2813179 ± 2%    -30.7%    1948548         cpuidle..usage
       7.22          -6.8%       6.73 ± 2%    iostat.cpu.user
       0.38          -0.0        0.33         mpstat.cpu.all.irq%
    5683055 ± 5%    -13.3%    4926006 ± 10%   numa-meminfo.node1.Active
    5683055 ± 5%    -13.3%    4926006 ± 10%   numa-meminfo.node1.Active(anon)
     681017         -13.0%     592632         vmstat.system.cs
     262754          -8.6%     240105         vmstat.system.in
   25349297         -14.3%   21728755         numa-numastat.node0.local_node
   25389508         -14.3%   21770830         numa-numastat.node0.numa_hit
   26719069         -14.2%   22919085         numa-numastat.node1.local_node
   26746344         -14.2%   22943171         numa-numastat.node1.numa_hit
   25391110         -14.3%   21771814         numa-vmstat.node0.numa_hit
   25350899         -14.3%   21729738         numa-vmstat.node0.numa_local
    1423040 ± 5%    -13.3%    1233884 ± 10%   numa-vmstat.node1.nr_active_anon
    1423039 ± 5%    -13.3%    1233883 ± 10%   numa-vmstat.node1.nr_zone_active_anon
   26748443         -14.2%   22948826         numa-vmstat.node1.numa_hit
   26721168         -14.2%   22924740         numa-vmstat.node1.numa_local
    4274794         -12.6%    3735109         stress-ng.fstat.ops
      71246         -12.6%      62251         stress-ng.fstat.ops_per_sec
   13044663         -10.2%   11715455         stress-ng.time.involuntary_context_switches
       4590          -2.1%       4492         stress-ng.time.percent_of_cpu_this_job_got
       2545          -1.6%       2503         stress-ng.time.system_time
     212.55          -8.2%     195.17 ± 2%    stress-ng.time.user_time
    6786385         -12.7%    5924000         stress-ng.time.voluntary_context_switches
    9685654 ± 2%    +15.2%   11161628 ± 2%    sched_debug.cfs_rq:/.avg_vruntime.avg
    4917374 ± 6%    +26.4%    6217585 ± 8%    sched_debug.cfs_rq:/.avg_vruntime.min
    9685655 ± 2%    +15.2%   11161628 ± 2%    sched_debug.cfs_rq:/.min_vruntime.avg
    4917374 ± 6%    +26.4%    6217586 ± 8%    sched_debug.cfs_rq:/.min_vruntime.min
     319.78 ± 4%     -8.9%     291.47 ± 4%    sched_debug.cfs_rq:/.util_avg.stddev
     331418         -12.3%     290724         sched_debug.cpu.nr_switches.avg
     349777         -12.0%     307943         sched_debug.cpu.nr_switches.max
     247719 ± 5%    -18.2%     202753 ± 2%    sched_debug.cpu.nr_switches.min
    1681668          -5.8%    1584232         proc-vmstat.nr_active_anon
    2335388          -4.2%    2237095         proc-vmstat.nr_file_pages
    1434429          -6.9%    1336146         proc-vmstat.nr_shmem
      50745          -2.5%      49497         proc-vmstat.nr_slab_unreclaimable
    1681668          -5.8%    1584232         proc-vmstat.nr_zone_active_anon
   52137742         -14.2%   44716504         proc-vmstat.numa_hit
   52070256         -14.2%   44650343         proc-vmstat.numa_local
   57420831         -13.4%   49744871         proc-vmstat.pgalloc_normal
   54983559         -13.7%   47445719         proc-vmstat.pgfree
       1.30         -10.6%       1.17         perf-stat.i.MPKI
  2.797e+10          -7.0%    2.6e+10         perf-stat.i.branch-instructions
       0.32 ± 4%     +0.0        0.33         perf-stat.i.branch-miss-rate%
      24.15          -1.1       23.00         perf-stat.i.cache-miss-rate%
  1.689e+08         -17.1%  1.401e+08         perf-stat.i.cache-misses
   6.99e+08         -12.9%  6.085e+08         perf-stat.i.cache-references
     708230         -12.7%     618047         perf-stat.i.context-switches
       1.71          +8.2%       1.85         perf-stat.i.cpi
     115482          -2.7%     112333         perf-stat.i.cpu-migrations
       1311         +21.2%       1588         perf-stat.i.cycles-between-cache-misses
  1.288e+11          -7.3%  1.195e+11         perf-stat.i.instructions
       0.59          -7.4%       0.55         perf-stat.i.ipc
      12.84         -11.0%      11.43         perf-stat.i.metric.K/sec
       1.31         -10.5%       1.17         perf-stat.overall.MPKI
       0.29 ± 4%     +0.0        0.30         perf-stat.overall.branch-miss-rate%
      24.21          -1.1       23.07         perf-stat.overall.cache-miss-rate%
       1.71          +8.2%       1.85         perf-stat.overall.cpi
       1303         +21.0%       1576         perf-stat.overall.cycles-between-cache-misses
       0.58          -7.6%       0.54         perf-stat.overall.ipc
  2.724e+10          -6.8%  2.539e+10         perf-stat.ps.branch-instructions
  1.648e+08         -16.8%  1.371e+08         perf-stat.ps.cache-misses
  6.807e+08         -12.7%  5.943e+08         perf-stat.ps.cache-references
     689389         -12.5%     603372         perf-stat.ps.context-switches
  1.255e+11          -7.0%  1.167e+11         perf-stat.ps.instructions
  7.621e+12          -6.9%  7.097e+12         perf-stat.total.instructions
      56.06         -56.1        0.00         perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
      56.04         -56.0        0.00         perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
      31.25         -31.2        0.00         perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
      31.23         -31.2        0.00         perf-profile.calltrace.cycles-pp.__x64_sys_exit.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
      31.22         -31.2        0.00         perf-profile.calltrace.cycles-pp.do_exit.__x64_sys_exit.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
      27.58         -27.6        0.00         perf-profile.calltrace.cycles-pp.exit_notify.do_exit.__x64_sys_exit.x64_sys_call.do_syscall_64
      23.72         -23.7        0.00         perf-profile.calltrace.cycles-pp.__do_sys_clone3.do_syscall_64.entry_SYSCALL_64_after_hwframe
      23.68         -23.7        0.00         perf-profile.calltrace.cycles-pp.kernel_clone.__do_sys_clone3.do_syscall_64.entry_SYSCALL_64_after_hwframe
      20.15         -20.2        0.00         perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone3.do_syscall_64.entry_SYSCALL_64_after_hwframe
      19.23         -19.2        0.00         perf-profile.calltrace.cycles-pp.fstatat64
      16.51         -16.5        0.00         perf-profile.calltrace.cycles-pp.statx
      14.81         -14.8        0.00         perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.fstatat64
      14.52         -14.5        0.00         perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.fstatat64
      14.52         -14.5        0.00         perf-profile.calltrace.cycles-pp.queued_write_lock_slowpath.copy_process.kernel_clone.__do_sys_clone3.do_syscall_64
      14.05         -14.0        0.00         perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath.queued_write_lock_slowpath.copy_process.kernel_clone.__do_sys_clone3
      14.04         -14.0        0.00         perf-profile.calltrace.cycles-pp.queued_write_lock_slowpath.exit_notify.do_exit.__x64_sys_exit.x64_sys_call
      13.55         -13.6        0.00         perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath.queued_write_lock_slowpath.exit_notify.do_exit.__x64_sys_exit
      13.24         -13.2        0.00         perf-profile.calltrace.cycles-pp.release_task.exit_notify.do_exit.__x64_sys_exit.x64_sys_call
      13.08         -13.1        0.00         perf-profile.calltrace.cycles-pp.__do_sys_newfstatat.do_syscall_64.entry_SYSCALL_64_after_hwframe.fstatat64
      12.01         -12.0        0.00         perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.statx
      11.93         -11.9        0.00         perf-profile.calltrace.cycles-pp.queued_write_lock_slowpath.release_task.exit_notify.do_exit.__x64_sys_exit
      11.76         -11.8        0.00         perf-profile.calltrace.cycles-pp.vfs_fstatat.__do_sys_newfstatat.do_syscall_64.entry_SYSCALL_64_after_hwframe.fstatat64
      11.72         -11.7        0.00         perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.statx
      11.45         -11.4        0.00         perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath.queued_write_lock_slowpath.release_task.exit_notify.do_exit
      10.27         -10.3        0.00         perf-profile.calltrace.cycles-pp.__x64_sys_statx.do_syscall_64.entry_SYSCALL_64_after_hwframe.statx
       7.21          -7.2        0.00         perf-profile.calltrace.cycles-pp.vfs_statx.vfs_fstatat.__do_sys_newfstatat.do_syscall_64.entry_SYSCALL_64_after_hwframe
       5.25          -5.3        0.00         perf-profile.calltrace.cycles-pp.filename_lookup.vfs_statx.vfs_fstatat.__do_sys_newfstatat.do_syscall_64
      86.11         -86.1        0.00         perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      85.52         -85.5        0.00         perf-profile.children.cycles-pp.do_syscall_64
      41.40         -41.4        0.00         perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
      40.49         -40.5        0.00         perf-profile.children.cycles-pp.queued_write_lock_slowpath
      31.57         -31.6        0.00         perf-profile.children.cycles-pp.x64_sys_call
      31.23         -31.2        0.00         perf-profile.children.cycles-pp.do_exit
      31.23         -31.2        0.00         perf-profile.children.cycles-pp.__x64_sys_exit
      27.59         -27.6        0.00         perf-profile.children.cycles-pp.exit_notify
      23.72         -23.7        0.00         perf-profile.children.cycles-pp.__do_sys_clone3
      23.69         -23.7        0.00         perf-profile.children.cycles-pp.kernel_clone
      20.18         -20.2        0.00         perf-profile.children.cycles-pp.copy_process
      19.70         -19.7        0.00         perf-profile.children.cycles-pp.fstatat64
      16.58         -16.6        0.00         perf-profile.children.cycles-pp.statx
      13.51         -13.5        0.00         perf-profile.children.cycles-pp.__do_sys_newfstatat
      13.25         -13.2        0.00         perf-profile.children.cycles-pp.release_task
      12.22         -12.2        0.00         perf-profile.children.cycles-pp.vfs_fstatat
      11.38         -11.4        0.00         perf-profile.children.cycles-pp.vfs_statx
      10.36         -10.4        0.00         perf-profile.children.cycles-pp.__x64_sys_statx
       8.25          -8.3        0.00         perf-profile.children.cycles-pp.filename_lookup
       7.89          -7.9        0.00         perf-profile.children.cycles-pp.getname_flags
       7.74          -7.7        0.00         perf-profile.children.cycles-pp.path_lookupat
      41.39         -41.4        0.00         perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
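In the parent commit's profile above, nearly all contended cycles sit in the fork/exit path rather than in the stat() path itself: native_queued_spin_lock_slowpath, reached through queued_write_lock_slowpath from copy_process, exit_notify and release_task, already accounts for about 41% of cycles. A bare thread-churn loop drives exactly those paths without the stress-ng harness; the microbenchmark below is a hand-rolled sketch (not stress-ng's implementation, and the thread/round counts are arbitrary) that can be run under perf on both commits for comparison.

/*
 * Minimal thread-churn microbenchmark, a stand-in for the stress-ng
 * pthread/fstat workloads above (this is NOT stress-ng).  Each
 * short-lived thread costs one pid allocation on thread creation and
 * one pid free on exit -- the paths the profile shows contending on
 * the tasklist_lock write side (queued_write_lock_slowpath).
 *
 * Build: gcc -O2 -pthread churn.c -o churn
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Arbitrary sizing: roughly one thread per CPU of the Ice Lake box. */
#define THREADS_PER_ROUND 64
#define ROUNDS 1000

static void *worker(void *arg)
{
	return NULL;	/* exit immediately: create/teardown is the cost */
}

int main(void)
{
	pthread_t tids[THREADS_PER_ROUND];
	struct timespec t0, t1;
	long ops = 0;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int r = 0; r < ROUNDS; r++) {
		for (int i = 0; i < THREADS_PER_ROUND; i++)
			if (pthread_create(&tids[i], NULL, worker, NULL))
				exit(EXIT_FAILURE);
		for (int i = 0; i < THREADS_PER_ROUND; i++)
			pthread_join(tids[i], NULL);
		ops += THREADS_PER_ROUND;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	double secs = (t1.tv_sec - t0.tv_sec) +
		      (t1.tv_nsec - t0.tv_nsec) / 1e9;
	printf("%.0f thread create/join ops/sec\n", ops / secs);
	return 0;
}

Every pthread_create() enters the kernel via clone3() (or clone(), depending on glibc) and allocates a struct pid, and every exit frees one, so per-pid costs added by the commit show up directly in the printed rate, independent of any filesystem work.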
***************************************************************************************************
lkp-spr-r02: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/pthread/stress-ng/60s

commit:
  59a42b0e78 ("selftests/pidfd: add pidfs file handle selftests")
  16ecd47cb0 ("pidfs: lookup pid through rbtree")

59a42b0e78888e2d 16ecd47cb0cd895c7c2f5dd5db5
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \

  6.458e+08 ± 3%    -20.7%  5.119e+08 ± 6%    cpuidle..time
    4424460 ± 4%    -56.5%    1923713 ± 2%    cpuidle..usage
       1916         +17.2%       2245 ± 2%    vmstat.procs.r
     880095         -24.7%     662885         vmstat.system.cs
     717291          -7.6%     662983         vmstat.system.in
       4.81          -0.9        3.87 ± 2%    mpstat.cpu.all.idle%
       0.48          -0.1        0.42         mpstat.cpu.all.irq%
       0.32 ± 3%     -0.1        0.26 ± 2%    mpstat.cpu.all.soft%
       1.77          -0.3        1.46         mpstat.cpu.all.usr%
   43182538         -21.9%   33726626         numa-numastat.node0.local_node
   43338607         -22.0%   33814109         numa-numastat.node0.numa_hit
   43334202         -22.8%   33451907         numa-numastat.node1.local_node
   43415892         -22.6%   33601910         numa-numastat.node1.numa_hit
   43339112         -22.0%   33811967         numa-vmstat.node0.numa_hit
   43183037         -21.9%   33724483         numa-vmstat.node0.numa_local
   43416602         -22.6%   33599378         numa-vmstat.node1.numa_hit
   43334912         -22.8%   33449374         numa-vmstat.node1.numa_local
      13189 ± 14%   -24.0%      10022 ± 19%   perf-c2c.DRAM.local
       9611 ± 16%   -28.8%       6844 ± 17%   perf-c2c.DRAM.remote
      16436 ± 15%   -32.1%      11162 ± 19%   perf-c2c.HITM.local
       4431 ± 16%   -30.8%       3064 ± 19%   perf-c2c.HITM.remote
      20868 ± 15%   -31.8%      14226 ± 19%   perf-c2c.HITM.total
     205629         +67.1%     343625         stress-ng.pthread.nanosecs_to_start_a_pthread
   12690825         -23.7%    9689255         stress-ng.pthread.ops
     210833         -23.7%     160924         stress-ng.pthread.ops_per_sec
    5684649         -16.0%    4772378         stress-ng.time.involuntary_context_switches
   26588792         -21.0%   20998281         stress-ng.time.minor_page_faults
      12705          +5.1%      13353         stress-ng.time.percent_of_cpu_this_job_got
       7559          +5.6%       7986         stress-ng.time.system_time
     132.77         -24.1%     100.72         stress-ng.time.user_time
   29099733         -22.3%   22601666         stress-ng.time.voluntary_context_switches
     340547          +1.4%     345226         proc-vmstat.nr_mapped
     150971          -3.2%     146184         proc-vmstat.nr_page_table_pages
      48017          -2.0%      47078         proc-vmstat.nr_slab_reclaimable
     540694 ± 9%    +50.6%     814286 ± 15%   proc-vmstat.numa_hint_faults
     255145 ± 22%   +62.3%     414122 ± 17%   proc-vmstat.numa_hint_faults_local
   86757062         -22.3%   67418409         proc-vmstat.numa_hit
   86519300         -22.4%   67180920         proc-vmstat.numa_local
   89935256         -22.2%   69939407         proc-vmstat.pgalloc_normal
   27887502         -20.1%   22295448         proc-vmstat.pgfault
   86343992         -22.7%   66777255         proc-vmstat.pgfree
    1187131 ± 23%   -42.2%     686568 ± 15%   sched_debug.cfs_rq:/.avg_vruntime.stddev
   12970740 ± 42%   -49.3%    6577803 ± 11%   sched_debug.cfs_rq:/.left_deadline.max
    2408752 ± 4%     -9.6%    2177658 ± 2%    sched_debug.cfs_rq:/.left_deadline.stddev
   12970554 ± 42%   -49.3%    6577515 ± 11%   sched_debug.cfs_rq:/.left_vruntime.max
    2408688 ± 4%     -9.6%    2177606 ± 2%    sched_debug.cfs_rq:/.left_vruntime.stddev
    1187132 ± 23%   -42.2%     686568 ± 15%   sched_debug.cfs_rq:/.min_vruntime.stddev
   12970563 ± 42%   -49.3%    6577516 ± 11%   sched_debug.cfs_rq:/.right_vruntime.max
    2408788 ± 4%     -9.6%    2177610 ± 2%    sched_debug.cfs_rq:/.right_vruntime.stddev
    2096120         -68.2%     665792         sched_debug.cpu.curr->pid.max
     655956 ± 8%    -53.1%     307752         sched_debug.cpu.curr->pid.stddev
     124008         -24.6%      93528         sched_debug.cpu.nr_switches.avg
     270857 ± 4%    -38.9%     165624 ± 10%   sched_debug.cpu.nr_switches.max
      27972 ± 13%   -67.5%       9102 ± 17%   sched_debug.cpu.nr_switches.stddev
     179.43 ± 4%    +17.8%     211.44 ± 4%    sched_debug.cpu.nr_uninterruptible.stddev
       4.21         -13.4%       3.65         perf-stat.i.MPKI
   2.03e+10          -8.3%  1.863e+10         perf-stat.i.branch-instructions
       0.66          -0.1        0.61         perf-stat.i.branch-miss-rate%
  1.289e+08         -16.7%  1.074e+08         perf-stat.i.branch-misses
      39.17          +0.7       39.92         perf-stat.i.cache-miss-rate%
  3.806e+08         -21.8%  2.976e+08         perf-stat.i.cache-misses
  9.691e+08         -23.3%  7.437e+08         perf-stat.i.cache-references
     903142         -24.9%     678436         perf-stat.i.context-switches
       6.89         +11.5%       7.69         perf-stat.i.cpi
  6.239e+11          +1.0%  6.304e+11         perf-stat.i.cpu-cycles
     311004         -18.5%     253387         perf-stat.i.cpu-migrations
       1631         +29.1%       2106         perf-stat.i.cycles-between-cache-misses
  9.068e+10          -9.7%  8.192e+10         perf-stat.i.instructions
       0.15          -9.5%       0.14         perf-stat.i.ipc
      10.41         -22.2%       8.11         perf-stat.i.metric.K/sec
     462421         -19.7%     371144         perf-stat.i.minor-faults
     668589         -21.0%     527974         perf-stat.i.page-faults
       4.22         -13.6%       3.65         perf-stat.overall.MPKI
       0.63          -0.1        0.57         perf-stat.overall.branch-miss-rate%
      39.29          +0.7       40.04         perf-stat.overall.cache-miss-rate%
       6.94         +11.7%       7.75         perf-stat.overall.cpi
       1643         +29.3%       2125         perf-stat.overall.cycles-between-cache-misses
       0.14         -10.5%       0.13         perf-stat.overall.ipc
  1.971e+10          -8.6%  1.801e+10         perf-stat.ps.branch-instructions
  1.237e+08         -17.2%  1.024e+08         perf-stat.ps.branch-misses
  3.713e+08         -22.3%  2.887e+08         perf-stat.ps.cache-misses
  9.451e+08         -23.7%   7.21e+08         perf-stat.ps.cache-references
     883135         -25.3%     659967         perf-stat.ps.context-switches
     304186         -18.9%     246645         perf-stat.ps.cpu-migrations
  8.797e+10         -10.0%  7.916e+10         perf-stat.ps.instructions
     445107         -20.6%     353509         perf-stat.ps.minor-faults
     646755         -21.7%     506142         perf-stat.ps.page-faults
  5.397e+12         -10.2%  4.846e+12         perf-stat.total.instructions

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki