Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> writes:

> On 2024/6/20 10:39, kernel test robot wrote:
>> Hello,
>>
>> kernel test robot noticed a -7.1% regression of
>> vm-scalability.throughput on:
>>
>> commit: d2136d749d76af980b3accd72704eea4eab625bd ("mm: support
>> multi-size THP numa balancing")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>
>> [still regression on linus/master
>> 92e5605a199efbaee59fb19e15d6cc2103a04ec2]
>>
>> testcase: vm-scalability
>> test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
>> parameters:
>>     runtime: 300s
>>     size: 512G
>>     test: anon-cow-rand-hugetlb
>>     cpufreq_governor: performance
>
> Thanks for reporting. IIUC, NUMA balancing will not scan hugetlb VMAs,
> so I'm not sure how this patch affects the performance of hugetlb COW,
> but let me try to reproduce it.
>
>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
>> the same patch/commit), kindly add following tags
>> | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
>> | Closes: https://lore.kernel.org/oe-lkp/202406201010.a1344783-oliver.sang@xxxxxxxxx
>>
>> Details are as below:
>> -------------------------------------------------------------------------------------------------->
>>
>> The kernel config and materials to reproduce are available at:
>> https://download.01.org/0day-ci/archive/20240620/202406201010.a1344783-oliver.sang@xxxxxxxxx
>>
>> =========================================================================================
>> compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
>>   gcc-13/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/300s/512G/lkp-icl-2sp2/anon-cow-rand-hugetlb/vm-scalability
>>
>> commit:
>>   6b0ed7b3c7 ("mm: factor out the numa mapping rebuilding into a new helper")
>>   d2136d749d ("mm: support multi-size THP numa balancing")
>>
>> 6b0ed7b3c77547d2 d2136d749d76af980b3accd7270
>> ---------------- ---------------------------
>>          %stddev     %change         %stddev
>>              \          |                \
>>      12.02            -1.3       10.72 ±  4%  mpstat.cpu.all.sys%
>>    1228757            +3.0%    1265679        proc-vmstat.pgfault

Also from other proc-vmstat stats,

     21770 ±  36%      +6.1%      23098 ±  28%  proc-vmstat.numa_hint_faults
      6168 ± 107%     +48.8%       9180 ±  18%  proc-vmstat.numa_hint_faults_local
    154537 ±  15%     +23.5%     190883 ±  17%  proc-vmstat.numa_pte_updates

After your patch, more hint page faults occur; I think this is
expected. Then, tasks may be moved between sockets because of that, so
that some hugetlb page accesses become remote?
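For reference, the hugetlb skip you mention is (IIRC) the
unsuitable-VMA filter in task_numa_work() in kernel/sched/fair.c, which
bails out on hugetlb mappings via is_vm_hugetlb_page(), so the hugetlb
VMA itself never gets NUMA PTE updates or hint faults. Below is a
minimal user-space sketch of that filtering decision, just to make the
point concrete; the struct and function names (vma_model,
numa_scan_eligible) are made up for illustration and are not kernel
API, and the field list only roughly mirrors the real check:

/*
 * Toy model of the "skip unsuitable VMAs" test done by NUMA-balancing
 * scanning; the real logic lives in task_numa_work().  Hugetlb VMAs
 * are never scanned, so they generate no numa_pte_updates or
 * numa_hint_faults.
 */
#include <stdbool.h>
#include <stdio.h>

struct vma_model {                      /* illustrative only */
        const char *name;
        bool migratable;                /* vma_migratable() */
        bool policy_mof;                /* vma_policy_mof() */
        bool hugetlb;                   /* is_vm_hugetlb_page() */
        bool mixedmap;                  /* VM_MIXEDMAP */
};

static bool numa_scan_eligible(const struct vma_model *vma)
{
        return vma->migratable && vma->policy_mof &&
               !vma->hugetlb && !vma->mixedmap;
}

int main(void)
{
        struct vma_model vmas[] = {
                { "anon (mTHP-backed)", true, true, false, false },
                { "hugetlbfs mapping",  true, true, true,  false },
        };

        for (size_t i = 0; i < sizeof(vmas) / sizeof(vmas[0]); i++)
                printf("%-20s scanned=%d\n", vmas[i].name,
                       numa_scan_eligible(&vmas[i]));
        return 0;
}

So the extra numa_pte_updates/numa_hint_faults above presumably come
from the other anon mappings of the test processes, and the hugetlb
working set would only be affected indirectly, e.g. through the
cross-socket task moves described above.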
>>    7392513            -7.1%    6865649        vm-scalability.throughput
>>      17356            +9.4%      18986        vm-scalability.time.user_time
>>       0.32 ± 22%     -36.9%       0.20 ± 17%  sched_debug.cfs_rq:/.h_nr_running.stddev
>>      28657 ± 86%     -90.8%       2640 ± 19%  sched_debug.cfs_rq:/.load.stddev
>>       0.28 ± 35%     -52.1%       0.13 ± 29%  sched_debug.cfs_rq:/.nr_running.stddev
>>     299.88 ± 27%     -39.6%     181.04 ± 23%  sched_debug.cfs_rq:/.runnable_avg.stddev
>>     284.88 ± 32%     -44.0%     159.65 ± 27%  sched_debug.cfs_rq:/.util_avg.stddev
>>       0.32 ± 22%     -37.2%       0.20 ± 17%  sched_debug.cpu.nr_running.stddev
>>  1.584e+10 ±  2%      -6.9%  1.476e+10 ±  3%  perf-stat.i.branch-instructions
>>   11673151 ±  3%      -6.3%   10935072 ±  4%  perf-stat.i.branch-misses
>>       4.90            +3.5%       5.07        perf-stat.i.cpi
>>     333.40            +7.5%     358.32        perf-stat.i.cycles-between-cache-misses
>>  6.787e+10 ±  2%      -6.8%  6.324e+10 ±  3%  perf-stat.i.instructions
>>       0.25            -6.2%       0.24        perf-stat.i.ipc
>>       4.19            +7.5%       4.51        perf-stat.overall.cpi
>>     323.02            +7.4%     346.94        perf-stat.overall.cycles-between-cache-misses
>>       0.24            -7.0%       0.22        perf-stat.overall.ipc
>>  1.549e+10 ±  2%      -6.8%  1.444e+10 ±  3%  perf-stat.ps.branch-instructions
>>  6.634e+10 ±  2%      -6.7%  6.186e+10 ±  3%  perf-stat.ps.instructions
>>      17.33 ± 77%     -10.6        6.72 ±169%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault.do_access
>>      17.30 ± 77%     -10.6        6.71 ±169%  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.do_access
>>      17.30 ± 77%     -10.6        6.71 ±169%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
>>      17.28 ± 77%     -10.6        6.70 ±169%  perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
>>      17.27 ± 77%     -10.6        6.70 ±169%  perf-profile.calltrace.cycles-pp.hugetlb_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
>>      13.65 ± 76%      -8.4        5.29 ±168%  perf-profile.calltrace.cycles-pp.hugetlb_wp.hugetlb_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
>>      13.37 ± 76%      -8.2        5.18 ±168%  perf-profile.calltrace.cycles-pp.copy_user_large_folio.hugetlb_wp.hugetlb_fault.handle_mm_fault.do_user_addr_fault
>>      13.35 ± 76%      -8.2        5.18 ±168%  perf-profile.calltrace.cycles-pp.copy_subpage.copy_user_large_folio.hugetlb_wp.hugetlb_fault.handle_mm_fault
>>      13.23 ± 76%      -8.1        5.13 ±168%  perf-profile.calltrace.cycles-pp.copy_mc_enhanced_fast_string.copy_subpage.copy_user_large_folio.hugetlb_wp.hugetlb_fault
>>       3.59 ± 78%      -2.2        1.39 ±169%  perf-profile.calltrace.cycles-pp.__mutex_lock.hugetlb_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
>>      17.35 ± 77%     -10.6        6.73 ±169%  perf-profile.children.cycles-pp.asm_exc_page_fault
>>      17.32 ± 77%     -10.6        6.72 ±168%  perf-profile.children.cycles-pp.do_user_addr_fault
>>      17.32 ± 77%     -10.6        6.72 ±168%  perf-profile.children.cycles-pp.exc_page_fault
>>      17.30 ± 77%     -10.6        6.71 ±168%  perf-profile.children.cycles-pp.handle_mm_fault
>>      17.28 ± 77%     -10.6        6.70 ±169%  perf-profile.children.cycles-pp.hugetlb_fault
>>      13.65 ± 76%      -8.4        5.29 ±168%  perf-profile.children.cycles-pp.hugetlb_wp
>>      13.37 ± 76%      -8.2        5.18 ±168%  perf-profile.children.cycles-pp.copy_user_large_folio
>>      13.35 ± 76%      -8.2        5.18 ±168%  perf-profile.children.cycles-pp.copy_subpage
>>      13.34 ± 76%      -8.2        5.17 ±168%  perf-profile.children.cycles-pp.copy_mc_enhanced_fast_string
>>       3.59 ± 78%      -2.2        1.39 ±169%  perf-profile.children.cycles-pp.__mutex_lock
>>      13.24 ± 76%      -8.1        5.13 ±168%  perf-profile.self.cycles-pp.copy_mc_enhanced_fast_string
>>
>> Disclaimer:
>> Results have been estimated based on internal Intel analysis and are provided
>> for informational purposes only. Any difference in system hardware or software
>> design or configuration may affect actual performance.

--
Best Regards,
Huang, Ying