Hello,

kernel test robot noticed a -7.1% regression of vm-scalability.throughput on:

commit: d2136d749d76af980b3accd72704eea4eab625bd ("mm: support multi-size THP numa balancing")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[still regression on linus/master 92e5605a199efbaee59fb19e15d6cc2103a04ec2]

testcase: vm-scalability
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
parameters:

	runtime: 300s
	size: 512G
	test: anon-cow-rand-hugetlb
	cpufreq_governor: performance

If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202406201010.a1344783-oliver.sang@xxxxxxxxx


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240620/202406201010.a1344783-oliver.sang@xxxxxxxxx
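To reproduce (a sketch of the usual lkp-tests workflow for these reports; the
job.yaml referenced below is assumed to be the job file shipped in the archive
above, and the steps have not been re-verified against this particular job):

	git clone https://github.com/intel/lkp-tests.git
	cd lkp-tests
	sudo bin/lkp install job.yaml           # assumes the job file from the archive above
	bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
	sudo bin/lkp run generated-yaml-file

	# If any failure blocks the test, remove the ~/.lkp and /lkp
	# directories to rerun from a clean state.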
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/300s/512G/lkp-icl-2sp2/anon-cow-rand-hugetlb/vm-scalability

commit:
  6b0ed7b3c7 ("mm: factor out the numa mapping rebuilding into a new helper")
  d2136d749d ("mm: support multi-size THP numa balancing")

6b0ed7b3c77547d2 d2136d749d76af980b3accd7270
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
     12.02            -1.3       10.72 ±  4%  mpstat.cpu.all.sys%
   1228757            +3.0%    1265679        proc-vmstat.pgfault
   7392513            -7.1%    6865649        vm-scalability.throughput
     17356            +9.4%      18986        vm-scalability.time.user_time
      0.32 ± 22%     -36.9%       0.20 ± 17%  sched_debug.cfs_rq:/.h_nr_running.stddev
     28657 ± 86%     -90.8%       2640 ± 19%  sched_debug.cfs_rq:/.load.stddev
      0.28 ± 35%     -52.1%       0.13 ± 29%  sched_debug.cfs_rq:/.nr_running.stddev
    299.88 ± 27%     -39.6%     181.04 ± 23%  sched_debug.cfs_rq:/.runnable_avg.stddev
    284.88 ± 32%     -44.0%     159.65 ± 27%  sched_debug.cfs_rq:/.util_avg.stddev
      0.32 ± 22%     -37.2%       0.20 ± 17%  sched_debug.cpu.nr_running.stddev
 1.584e+10 ±  2%      -6.9%  1.476e+10 ±  3%  perf-stat.i.branch-instructions
  11673151 ±  3%      -6.3%   10935072 ±  4%  perf-stat.i.branch-misses
      4.90            +3.5%       5.07        perf-stat.i.cpi
    333.40            +7.5%     358.32        perf-stat.i.cycles-between-cache-misses
 6.787e+10 ±  2%      -6.8%  6.324e+10 ±  3%  perf-stat.i.instructions
      0.25            -6.2%       0.24        perf-stat.i.ipc
      4.19            +7.5%       4.51        perf-stat.overall.cpi
    323.02            +7.4%     346.94        perf-stat.overall.cycles-between-cache-misses
      0.24            -7.0%       0.22        perf-stat.overall.ipc
 1.549e+10 ±  2%      -6.8%  1.444e+10 ±  3%  perf-stat.ps.branch-instructions
 6.634e+10 ±  2%      -6.7%  6.186e+10 ±  3%  perf-stat.ps.instructions
     17.33 ± 77%     -10.6        6.72 ±169%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault.do_access
     17.30 ± 77%     -10.6        6.71 ±169%  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.do_access
     17.30 ± 77%     -10.6        6.71 ±169%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
     17.28 ± 77%     -10.6        6.70 ±169%  perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
     17.27 ± 77%     -10.6        6.70 ±169%  perf-profile.calltrace.cycles-pp.hugetlb_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
     13.65 ± 76%      -8.4        5.29 ±168%  perf-profile.calltrace.cycles-pp.hugetlb_wp.hugetlb_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
     13.37 ± 76%      -8.2        5.18 ±168%  perf-profile.calltrace.cycles-pp.copy_user_large_folio.hugetlb_wp.hugetlb_fault.handle_mm_fault.do_user_addr_fault
     13.35 ± 76%      -8.2        5.18 ±168%  perf-profile.calltrace.cycles-pp.copy_subpage.copy_user_large_folio.hugetlb_wp.hugetlb_fault.handle_mm_fault
     13.23 ± 76%      -8.1        5.13 ±168%  perf-profile.calltrace.cycles-pp.copy_mc_enhanced_fast_string.copy_subpage.copy_user_large_folio.hugetlb_wp.hugetlb_fault
      3.59 ± 78%      -2.2        1.39 ±169%  perf-profile.calltrace.cycles-pp.__mutex_lock.hugetlb_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
     17.35 ± 77%     -10.6        6.73 ±169%  perf-profile.children.cycles-pp.asm_exc_page_fault
     17.32 ± 77%     -10.6        6.72 ±168%  perf-profile.children.cycles-pp.do_user_addr_fault
     17.32 ± 77%     -10.6        6.72 ±168%  perf-profile.children.cycles-pp.exc_page_fault
     17.30 ± 77%     -10.6        6.71 ±168%  perf-profile.children.cycles-pp.handle_mm_fault
     17.28 ± 77%     -10.6        6.70 ±169%  perf-profile.children.cycles-pp.hugetlb_fault
     13.65 ± 76%      -8.4        5.29 ±168%  perf-profile.children.cycles-pp.hugetlb_wp
     13.37 ± 76%      -8.2        5.18 ±168%  perf-profile.children.cycles-pp.copy_user_large_folio
     13.35 ± 76%      -8.2        5.18 ±168%  perf-profile.children.cycles-pp.copy_subpage
     13.34 ± 76%      -8.2        5.17 ±168%  perf-profile.children.cycles-pp.copy_mc_enhanced_fast_string
      3.59 ± 78%      -2.2        1.39 ±169%  perf-profile.children.cycles-pp.__mutex_lock
     13.24 ± 76%      -8.1        5.13 ±168%  perf-profile.self.cycles-pp.copy_mc_enhanced_fast_string


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki