Hello,

kernel test robot noticed a -3.2% regression of will-it-scale.per_process_ops on:

commit: cb8c4312afca1b2dc64107e7e7cea81911055612 ("futex: Add sys_futex_wait()")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git locking/core

testcase: will-it-scale
test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
parameters:

    nr_task: 16
    mode: process
    test: futex4
    cpufreq_governor: performance

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202310081429.a30c99f2-oliver.sang@xxxxxxxxx

Details are as below:
-------------------------------------------------------------------------------------------------->

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231008/202310081429.a30c99f2-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/process/16/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/futex4/will-it-scale

commit:
  43adf84495 ("futex: FLAGS_STRICT")
  cb8c4312af ("futex: Add sys_futex_wait()")

43adf844951084c2 cb8c4312afca1b2dc64107e7e7c
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
 1.339e+08          -3.2%   1.296e+08        will-it-scale.16.processes
   8367312          -3.2%     8102637        will-it-scale.per_process_ops
 1.339e+08          -3.2%   1.296e+08        will-it-scale.workload
      0.61           -0.0        0.59        perf-stat.i.branch-miss-rate%
  72599095          -2.7%    70647352        perf-stat.i.branch-misses
      0.80          -1.8%        0.79        perf-stat.i.cpi
 2.073e+10          +3.8%   2.152e+10        perf-stat.i.dTLB-loads
  1.72e+10          +2.2%   1.757e+10        perf-stat.i.dTLB-stores
  66739031          -5.4%    63102078        perf-stat.i.iTLB-load-misses
   2080892          +2.4%     2131032        perf-stat.i.iTLB-loads
 8.203e+10          +1.6%   8.337e+10        perf-stat.i.instructions
      1231          +7.3%        1321        perf-stat.i.instructions-per-iTLB-miss
      1.24          +1.8%        1.27        perf-stat.i.ipc
    222.58          +2.4%      227.82        perf-stat.i.metric.M/sec
      0.61           -0.0        0.59        perf-stat.overall.branch-miss-rate%
      0.80          -1.8%        0.79        perf-stat.overall.cpi
      1229          +7.5%        1321        perf-stat.overall.instructions-per-iTLB-miss
      1.24          +1.8%        1.27        perf-stat.overall.ipc
    184025          +4.9%      193123        perf-stat.overall.path-length
  72373935          -2.7%    70427711        perf-stat.ps.branch-misses
 2.066e+10          +3.8%   2.144e+10        perf-stat.ps.dTLB-loads
 1.714e+10          +2.2%   1.751e+10        perf-stat.ps.dTLB-stores
  66517376          -5.5%    62888454        perf-stat.ps.iTLB-load-misses
   2073911          +2.4%     2123876        perf-stat.ps.iTLB-loads
 8.175e+10          +1.6%   8.309e+10        perf-stat.ps.instructions
 2.464e+13          +1.6%   2.504e+13        perf-stat.total.instructions
     29.29 ± 2%      -29.3        0.00        perf-profile.calltrace.cycles-pp.futex_wait_setup.futex_wait.do_futex.__x64_sys_futex.do_syscall_64
     12.17 ± 2%      -12.2        0.00        perf-profile.calltrace.cycles-pp.futex_q_lock.futex_wait_setup.futex_wait.do_futex.__x64_sys_futex
      9.21 ± 2%       -9.2        0.00        perf-profile.calltrace.cycles-pp.futex_get_value_locked.futex_wait_setup.futex_wait.do_futex.__x64_sys_futex
      6.61 ± 2%       -6.6        0.00        perf-profile.calltrace.cycles-pp.__get_user_nocheck_4.futex_get_value_locked.futex_wait_setup.futex_wait.do_futex
      2.03 ± 2%       -0.1        1.88 ± 3%   perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
      0.00           +2.0        1.98 ± 4%   perf-profile.calltrace.cycles-pp.get_futex_key.futex_wait_setup.__futex_wait.futex_wait.do_futex
      0.00           +4.0        3.96 ± 3%   perf-profile.calltrace.cycles-pp.futex_q_unlock.futex_wait_setup.__futex_wait.futex_wait.do_futex
      0.00           +4.1        4.09 ± 3%   perf-profile.calltrace.cycles-pp._raw_spin_lock.futex_q_lock.futex_wait_setup.__futex_wait.futex_wait
      0.00           +4.4        4.35 ± 3%   perf-profile.calltrace.cycles-pp.futex_hash.futex_q_lock.futex_wait_setup.__futex_wait.futex_wait
      0.00           +6.1        6.14 ± 3%   perf-profile.calltrace.cycles-pp.__get_user_nocheck_4.futex_get_value_locked.futex_wait_setup.__futex_wait.futex_wait
      0.00           +8.5        8.52 ± 3%   perf-profile.calltrace.cycles-pp.futex_get_value_locked.futex_wait_setup.__futex_wait.futex_wait.do_futex
      0.00          +11.3       11.27 ± 3%   perf-profile.calltrace.cycles-pp.futex_q_lock.futex_wait_setup.__futex_wait.futex_wait.do_futex
      0.00          +27.4       27.44 ± 3%   perf-profile.calltrace.cycles-pp.futex_wait_setup.__futex_wait.futex_wait.do_futex.__x64_sys_futex
      0.00          +31.3       31.33 ± 3%   perf-profile.calltrace.cycles-pp.__futex_wait.futex_wait.do_futex.__x64_sys_futex.do_syscall_64
     29.80 ± 2%       -1.9       27.91 ± 3%   perf-profile.children.cycles-pp.futex_wait_setup
     12.68 ± 2%       -0.9       11.74 ± 3%   perf-profile.children.cycles-pp.futex_q_lock
      7.49 ± 2%       -0.6        6.93 ± 3%   perf-profile.children.cycles-pp.__get_user_nocheck_4
      4.38 ± 2%       -0.4        3.96 ± 3%   perf-profile.children.cycles-pp.futex_q_unlock
      4.74 ± 2%       -0.4        4.35 ± 3%   perf-profile.children.cycles-pp.futex_hash
      4.62 ± 2%       -0.3        4.33 ± 3%   perf-profile.children.cycles-pp._raw_spin_lock
      0.48 ± 3%       -0.2        0.32 ± 5%   perf-profile.children.cycles-pp.futex_setup_timer
      1.71 ± 2%       -0.1        1.57 ± 4%   perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      1.24 ± 3%       -0.1        1.14 ± 4%   perf-profile.children.cycles-pp.syscall_enter_from_user_mode
      0.52 ± 5%       -0.0        0.47 ± 4%   perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
      0.35 ± 3%       -0.0        0.31 ± 3%   perf-profile.children.cycles-pp.syscall@plt
      0.00          +31.5       31.46 ± 3%   perf-profile.children.cycles-pp.__futex_wait
      7.88 ± 2%       -2.4        5.48 ± 2%   perf-profile.self.cycles-pp.futex_wait
     10.37 ± 3%       -0.9        9.46 ± 3%   perf-profile.self.cycles-pp.syscall
      7.46 ± 2%       -0.6        6.91 ± 3%   perf-profile.self.cycles-pp.__get_user_nocheck_4
      4.20 ± 2%       -0.4        3.78 ± 3%   perf-profile.self.cycles-pp.futex_q_unlock
      4.56 ± 2%       -0.4        4.19 ± 3%   perf-profile.self.cycles-pp.futex_hash
      4.44 ± 2%       -0.3        4.16 ± 3%   perf-profile.self.cycles-pp._raw_spin_lock
      3.54 ± 2%       -0.2        3.29 ± 3%   perf-profile.self.cycles-pp.futex_q_lock
      1.71 ± 2%       -0.1        1.57 ± 4%   perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      0.40 ± 3%       -0.1        0.32 ± 5%   perf-profile.self.cycles-pp.futex_setup_timer
      1.18           -0.1        1.10 ± 3%   perf-profile.self.cycles-pp.do_syscall_64
      1.00           -0.1        0.94 ± 4%   perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      2.14 ± 3%       +0.2        2.31 ± 3%   perf-profile.self.cycles-pp.__x64_sys_futex
      0.00           +3.5        3.50 ± 3%   perf-profile.self.cycles-pp.__futex_wait

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki