On 2024/6/20 10:39, kernel test robot wrote:
> Hello,
>
> kernel test robot noticed a -7.1% regression of vm-scalability.throughput on:
>
> commit: d2136d749d76af980b3accd72704eea4eab625bd ("mm: support multi-size THP numa balancing")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> [still regression on linus/master 92e5605a199efbaee59fb19e15d6cc2103a04ec2]
>
> testcase: vm-scalability
> test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> parameters:
>
> 	runtime: 300s
> 	size: 512G
> 	test: anon-cow-rand-hugetlb
> 	cpufreq_governor: performance
Thanks for reporting. IIUC, NUMA balancing does not scan hugetlb VMAs, so I am not sure how this patch can affect the performance of hugetlb COW, but let me try to reproduce it.
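
For context, that understanding comes from the VMA scanning loop in task_numa_work(); below is a simplified excerpt paraphrased from kernel/sched/fair.c (details may differ slightly in the exact tree under test):

	/*
	 * Paraphrased from task_numa_work() in kernel/sched/fair.c
	 * (simplified): hugetlb VMAs are skipped outright, so no NUMA
	 * hinting faults should ever be installed on the hugetlb
	 * mappings this benchmark touches.
	 */
	for_each_vma(vmi, vma) {
		if (!vma_migratable(vma) || !vma_policy_mof(vma) ||
		    is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_MIXEDMAP))
			continue;
		/* ... change PTEs to PROT_NONE to generate hint faults ... */
	}

If that holds, any throughput delta here would have to be an indirect effect rather than a change in the hugetlb fault path itself.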
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> | Closes: https://lore.kernel.org/oe-lkp/202406201010.a1344783-oliver.sang@xxxxxxxxx
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20240620/202406201010.a1344783-oliver.sang@xxxxxxxxx
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
>   gcc-13/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/300s/512G/lkp-icl-2sp2/anon-cow-rand-hugetlb/vm-scalability
>
> commit:
>   6b0ed7b3c7 ("mm: factor out the numa mapping rebuilding into a new helper")
>   d2136d749d ("mm: support multi-size THP numa balancing")
>
> 6b0ed7b3c77547d2 d2136d749d76af980b3accd7270
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>      12.02            -1.3       10.72 ±  4%  mpstat.cpu.all.sys%
>    1228757            +3.0%    1265679        proc-vmstat.pgfault
>    7392513            -7.1%    6865649        vm-scalability.throughput
>      17356            +9.4%      18986        vm-scalability.time.user_time
>       0.32 ± 22%     -36.9%       0.20 ± 17%  sched_debug.cfs_rq:/.h_nr_running.stddev
>      28657 ± 86%     -90.8%       2640 ± 19%  sched_debug.cfs_rq:/.load.stddev
>       0.28 ± 35%     -52.1%       0.13 ± 29%  sched_debug.cfs_rq:/.nr_running.stddev
>     299.88 ± 27%     -39.6%     181.04 ± 23%  sched_debug.cfs_rq:/.runnable_avg.stddev
>     284.88 ± 32%     -44.0%     159.65 ± 27%  sched_debug.cfs_rq:/.util_avg.stddev
>       0.32 ± 22%     -37.2%       0.20 ± 17%  sched_debug.cpu.nr_running.stddev
>  1.584e+10 ±  2%      -6.9%  1.476e+10 ±  3%  perf-stat.i.branch-instructions
>   11673151 ±  3%      -6.3%   10935072 ±  4%  perf-stat.i.branch-misses
>       4.90            +3.5%       5.07        perf-stat.i.cpi
>     333.40            +7.5%     358.32        perf-stat.i.cycles-between-cache-misses
>  6.787e+10 ±  2%      -6.8%  6.324e+10 ±  3%  perf-stat.i.instructions
>       0.25            -6.2%       0.24        perf-stat.i.ipc
>       4.19            +7.5%       4.51        perf-stat.overall.cpi
>     323.02            +7.4%     346.94        perf-stat.overall.cycles-between-cache-misses
>       0.24            -7.0%       0.22        perf-stat.overall.ipc
>  1.549e+10 ±  2%      -6.8%  1.444e+10 ±  3%  perf-stat.ps.branch-instructions
>  6.634e+10 ±  2%      -6.7%  6.186e+10 ±  3%  perf-stat.ps.instructions
>      17.33 ± 77%      -10.6       6.72 ±169%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault.do_access
>      17.30 ± 77%      -10.6       6.71 ±169%  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.do_access
>      17.30 ± 77%      -10.6       6.71 ±169%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
>      17.28 ± 77%      -10.6       6.70 ±169%  perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
>      17.27 ± 77%      -10.6       6.70 ±169%  perf-profile.calltrace.cycles-pp.hugetlb_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
>      13.65 ± 76%       -8.4       5.29 ±168%  perf-profile.calltrace.cycles-pp.hugetlb_wp.hugetlb_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
>      13.37 ± 76%       -8.2       5.18 ±168%  perf-profile.calltrace.cycles-pp.copy_user_large_folio.hugetlb_wp.hugetlb_fault.handle_mm_fault.do_user_addr_fault
>      13.35 ± 76%       -8.2       5.18 ±168%  perf-profile.calltrace.cycles-pp.copy_subpage.copy_user_large_folio.hugetlb_wp.hugetlb_fault.handle_mm_fault
>      13.23 ± 76%       -8.1       5.13 ±168%  perf-profile.calltrace.cycles-pp.copy_mc_enhanced_fast_string.copy_subpage.copy_user_large_folio.hugetlb_wp.hugetlb_fault
>       3.59 ± 78%       -2.2       1.39 ±169%  perf-profile.calltrace.cycles-pp.__mutex_lock.hugetlb_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
>      17.35 ± 77%      -10.6       6.73 ±169%  perf-profile.children.cycles-pp.asm_exc_page_fault
>      17.32 ± 77%      -10.6       6.72 ±168%  perf-profile.children.cycles-pp.do_user_addr_fault
>      17.32 ± 77%      -10.6       6.72 ±168%  perf-profile.children.cycles-pp.exc_page_fault
>      17.30 ± 77%      -10.6       6.71 ±168%  perf-profile.children.cycles-pp.handle_mm_fault
>      17.28 ± 77%      -10.6       6.70 ±169%  perf-profile.children.cycles-pp.hugetlb_fault
>      13.65 ± 76%       -8.4       5.29 ±168%  perf-profile.children.cycles-pp.hugetlb_wp
>      13.37 ± 76%       -8.2       5.18 ±168%  perf-profile.children.cycles-pp.copy_user_large_folio
>      13.35 ± 76%       -8.2       5.18 ±168%  perf-profile.children.cycles-pp.copy_subpage
>      13.34 ± 76%       -8.2       5.17 ±168%  perf-profile.children.cycles-pp.copy_mc_enhanced_fast_string
>       3.59 ± 78%       -2.2       1.39 ±169%  perf-profile.children.cycles-pp.__mutex_lock
>      13.24 ± 76%       -8.1       5.13 ±168%  perf-profile.self.cycles-pp.copy_mc_enhanced_fast_string
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
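
While I try to reproduce this, a bit of schematic context on what the patch changes may help anyone following along. My paraphrase of the idea from the two commit subjects is below; the helper names come from those commits, but the exact arguments are from memory and may not match the tree, so treat this as a sketch rather than the literal diff:

	/*
	 * Sketch of the tail of do_numa_page() in mm/memory.c after the
	 * two commits above (paraphrased; details may differ).
	 * 6b0ed7b3c7 factored the PTE rebuild into
	 * numa_rebuild_single_mapping(); d2136d749d added a large-folio
	 * path so an exclusively mapped mTHP gets all of its PTEs
	 * rebuilt as one batch instead of being skipped.
	 */
	if (folio && folio_test_anon(folio) && folio_nr_pages(folio) > 1)
		/* batch-rebuild the hint mappings across the whole mTHP */
		numa_rebuild_large_anon_folio(folio, vma, vmf, pte,
					      ignore_writable, pte_write_upgrade);
	else
		/* order-0 page: rebuild only the faulting PTE, as before */
		numa_rebuild_single_mapping(vmf, vma, vmf->address, vmf->pte,
					    writable);

Note that do_numa_page() is never reached for hugetlb faults, which take the hugetlb_fault() path visible in the call traces above; that is why a regression in this hugetlb COW workload is surprising.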