On 3/12/22 16:43, kernel test robot wrote:
> 
> Greeting,
> 
> FYI, we noticed a 30.5% improvement of vm-scalability.throughput due to commit:
> 
> 
> commit: 8212a964ee020471104e34dce7029dec33c218a9 ("Re: [PATCH v2] mm/page_alloc: call check_new_pages() while zone spinlock is not held")
> url: https://github.com/0day-ci/linux/commits/Mel-Gorman/Re-PATCH-v2-mm-page_alloc-call-check_new_pages-while-zone-spinlock-is-not-held/20220309-203504
> patch link: https://lore.kernel.org/lkml/20220309123245.GI15701@xxxxxxxxxxxxxxxxxxx

Heh, that's weird. I would expect some improvement from Eric's patch, but
per the github url above this seems to be actually about Mel's
"mm/page_alloc: check high-order pages for corruption during PCP
operations" applied directly on 5.17-rc7. If anything, that patch was
expected to make performance worse, so maybe the improvement is due to
some unexpected side effect of different inlining decisions or cache
alignment...

> in testcase: vm-scalability
> on test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz with 128G memory
> with following parameters:
> 
> 	runtime: 300s
> 	size: 512G
> 	test: anon-w-rand-hugetlb
> 	cpufreq_governor: performance
> 	ucode: 0xd000331
> 
> test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
> test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
> 
> 
> Details are as below:
> -------------------------------------------------------------------------------------------------->
> 
> 
> To reproduce:
> 
>         git clone https://github.com/intel/lkp-tests.git
>         cd lkp-tests
>         sudo bin/lkp install job.yaml           # job file is attached in this email
>         bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
>         sudo bin/lkp run generated-yaml-file
> 
>         # if come across any failure that blocks the test,
>         # please remove ~/.lkp and /lkp dir to run from a clean state.
> 
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase/ucode:
>   gcc-9/performance/x86_64-rhel-8.3/debian-10.4-x86_64-20200603.cgz/300s/512G/lkp-icl-2sp5/anon-w-rand-hugetlb/vm-scalability/0xd000331
> 
> commit:
>   v5.17-rc7
>   8212a964ee ("mm/page_alloc: call check_new_pages() while zone spinlock is not held")
> 
>        v5.17-rc7  8212a964ee020471104e34dce70
> ----------------  ---------------------------
>        %stddev      %change        %stddev
>            \           |               \
>       0.00 ±  5%      -7.4%       0.00 ±  4%  vm-scalability.free_time
>      47190 ±  2%     +25.5%      59208 ±  2%  vm-scalability.median
>    6352467 ±  2%     +30.5%    8293110 ±  2%  vm-scalability.throughput
>     218.97 ±  2%     -18.7%     177.98 ±  3%  vm-scalability.time.elapsed_time
>     218.97 ±  2%     -18.7%     177.98 ±  3%  vm-scalability.time.elapsed_time.max
>     121357 ±  7%     -24.9%      91162 ± 10%  vm-scalability.time.involuntary_context_switches
>      11226            -5.2%      10641        vm-scalability.time.percent_of_cpu_this_job_got
>       2311 ±  3%     -35.2%       1496 ±  6%  vm-scalability.time.system_time
>      22275 ±  2%     -21.7%      17443 ±  3%  vm-scalability.time.user_time
>       9358 ±  3%     -13.1%       8130        vm-scalability.time.voluntary_context_switches
>     255.23           -16.1%     214.10 ±  2%  uptime.boot
>       2593            +6.8%       2771 ±  5%  vmstat.system.cs
>      11.51 ±  7%      +4.5       16.05 ±  8%  mpstat.cpu.all.idle%
>       8.48 ±  2%      -1.6        6.84 ±  3%  mpstat.cpu.all.sys%
>     727581 ± 12%     -17.2%     602238 ±  6%  numa-numastat.node1.local_node
>     798037 ±  8%     -13.3%     691955 ±  6%  numa-numastat.node1.numa_hit
>    5806206 ± 17%     +26.7%    7356010 ± 10%  turbostat.C1E
>       9.55 ± 26%      +5.9       15.48 ±  9%  turbostat.C1E%
>   59854751 ±  2%     -17.8%   49202950 ±  3%  turbostat.IRQ
>      42804 ±  6%     -54.9%      19301 ± 21%  meminfo.Active
>      41832 ±  7%     -56.2%      18325 ± 23%  meminfo.Active(anon)
>      63386 ±  6%     -26.6%      46542 ±  3%  meminfo.Mapped
>     137758           -25.5%     102591 ±  3%  meminfo.Shmem
>      36980 ±  5%     -62.6%      13823 ± 29%  numa-meminfo.node1.Active
>      36495 ±  5%     -63.9%      13173 ± 30%  numa-meminfo.node1.Active(anon)
>      19454 ± 26%     -57.7%       8233 ± 33%  numa-meminfo.node1.Mapped
>      65896 ± 38%     -67.8%      21189 ± 13%  numa-meminfo.node1.Shmem
>       9185 ±  6%     -64.7%       3246 ± 31%  numa-vmstat.node1.nr_active_anon
>       4769 ± 26%     -54.5%       2171 ± 32%  numa-vmstat.node1.nr_mapped
>      16462 ± 37%     -68.1%       5258 ± 14%  numa-vmstat.node1.nr_shmem
>       9185 ±  6%     -64.7%       3246 ± 31%  numa-vmstat.node1.nr_zone_active_anon
>      10436 ±  5%     -56.2%       4570 ± 23%  proc-vmstat.nr_active_anon
>      69290            +1.3%      70203        proc-vmstat.nr_anon_pages
>    1717695            +4.5%    1794462        proc-vmstat.nr_dirty_background_threshold
>    3439592            +4.5%    3593312        proc-vmstat.nr_dirty_threshold
>     640952            -1.4%     632171        proc-vmstat.nr_file_pages
>   17356030            +4.4%   18125242        proc-vmstat.nr_free_pages
>      93258            -2.4%      91059        proc-vmstat.nr_inactive_anon
>      16187 ±  5%     -26.4%      11911 ±  2%  proc-vmstat.nr_mapped
>      34477 ±  2%     -25.6%      25663 ±  4%  proc-vmstat.nr_shmem
>      10436 ±  5%     -56.2%       4570 ± 23%  proc-vmstat.nr_zone_active_anon
>      93258            -2.4%      91059        proc-vmstat.nr_zone_inactive_anon
>      32151 ± 16%     -61.0%      12542 ± 13%  proc-vmstat.numa_hint_faults
>      21214 ± 22%     -86.0%       2964 ± 45%  proc-vmstat.numa_hint_faults_local
>    1598135           -10.9%    1423466        proc-vmstat.numa_hit
>    1481881           -11.8%    1307551        proc-vmstat.numa_local
>     117279            -1.2%     115916        proc-vmstat.numa_other
>     555445 ± 16%     -53.2%     260178 ± 53%  proc-vmstat.numa_pte_updates
>      93889 ±  4%     -74.3%      24113 ±  7%  proc-vmstat.pgactivate
>    1599893           -11.0%    1424527        proc-vmstat.pgalloc_normal
>    1594626           -14.2%    1368920        proc-vmstat.pgfault
>    1609987           -20.8%    1275284        proc-vmstat.pgfree
>      49893           -14.8%      42496 ±  5%  proc-vmstat.pgreuse
>      15.23 ±  2%      -7.8%      14.04        perf-stat.i.MPKI
>  1.348e+10           +22.0%  1.645e+10 ±  3%  perf-stat.i.branch-instructions
>  6.957e+08 ±  2%     +22.4%  8.517e+08 ±  3%  perf-stat.i.cache-misses
>  7.117e+08 ±  2%     +22.4%   8.71e+08 ±  3%  perf-stat.i.cache-references
>       7.86 ±  2%     -29.0%       5.58 ±  6%  perf-stat.i.cpi
>  3.739e+11            -5.1%  3.549e+11        perf-stat.i.cpu-cycles
>     550.18 ±  3%     -22.2%     427.87 ±  5%  perf-stat.i.cycles-between-cache-misses
>  1.605e+10           +22.1%  1.959e+10 ±  3%  perf-stat.i.dTLB-loads
>       0.02 ±  3%      -0.0        0.01 ±  4%  perf-stat.i.dTLB-store-miss-rate%
>     921125 ±  2%      -4.6%     878569        perf-stat.i.dTLB-store-misses
>  5.803e+09           +22.0%  7.078e+09 ±  3%  perf-stat.i.dTLB-stores
>  5.665e+10           +22.0%  6.911e+10 ±  3%  perf-stat.i.instructions
>       0.16 ±  3%     +26.1%       0.20 ±  3%  perf-stat.i.ipc
>       2.92            -5.1%       2.77        perf-stat.i.metric.GHz
>     123.32 ± 16%    +158.4%     318.61 ± 22%  perf-stat.i.metric.K/sec
>     286.92           +21.8%     349.59 ±  3%  perf-stat.i.metric.M/sec
>       6641            +4.8%       6957 ±  2%  perf-stat.i.minor-faults
>     586608 ± 12%     +36.4%     800024 ±  7%  perf-stat.i.node-loads
>      26.79 ±  4%     -10.5       16.31 ± 12%  perf-stat.i.node-store-miss-rate%
>  1.785e+08 ±  2%     -27.7%  1.291e+08 ±  7%  perf-stat.i.node-store-misses
>  5.131e+08 ±  3%     +39.8%  7.172e+08 ±  5%  perf-stat.i.node-stores
>       6643            +4.8%       6959 ±  2%  perf-stat.i.page-faults
>       0.02 ± 18%      -0.0        0.01 ±  4%  perf-stat.overall.branch-miss-rate%
>       6.66 ±  2%     -22.5%       5.16 ±  3%  perf-stat.overall.cpi
>     539.35 ±  2%     -22.7%     416.69 ±  3%  perf-stat.overall.cycles-between-cache-misses
>       0.02 ±  3%      -0.0        0.01 ±  3%  perf-stat.overall.dTLB-store-miss-rate%
>       0.15 ±  2%     +29.1%       0.19 ±  3%  perf-stat.overall.ipc
>      25.88 ±  4%     -10.6       15.28 ± 10%  perf-stat.overall.node-store-miss-rate%
>  1.325e+10 ±  2%     +22.3%  1.622e+10 ±  3%  perf-stat.ps.branch-instructions
>   6.88e+08 ±  2%     +22.7%  8.444e+08 ±  3%  perf-stat.ps.cache-misses
>  7.043e+08 ±  2%     +22.7%  8.638e+08 ±  3%  perf-stat.ps.cache-references
>  3.708e+11            -5.2%  3.515e+11        perf-stat.ps.cpu-cycles
>  1.577e+10 ±  2%     +22.4%  1.931e+10 ±  3%  perf-stat.ps.dTLB-loads
>     910623 ±  2%      -4.6%     868700        perf-stat.ps.dTLB-store-misses
>  5.701e+09 ±  2%     +22.3%  6.975e+09 ±  3%  perf-stat.ps.dTLB-stores
>  5.569e+10 ±  2%     +22.3%  6.813e+10 ±  3%  perf-stat.ps.instructions
>       6716            +4.8%       7038        perf-stat.ps.minor-faults
>     595302 ± 11%     +37.2%     816710 ±  8%  perf-stat.ps.node-loads
>  1.769e+08 ±  2%     -27.8%  1.277e+08 ±  7%  perf-stat.ps.node-store-misses
>  5.071e+08 ±  3%     +40.3%  7.113e+08 ±  5%  perf-stat.ps.node-stores
>       6717            +4.8%       7039        perf-stat.ps.page-faults
>       0.00            +0.8        0.80 ±  8%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.rmqueue_bulk.get_page_from_freelist.__alloc_pages
>       0.00            +0.8        0.80 ±  8%  perf-profile.calltrace.cycles-pp._raw_spin_lock.rmqueue_bulk.get_page_from_freelist.__alloc_pages.alloc_buddy_huge_page
>       0.00            +0.8        0.83 ±  8%  perf-profile.calltrace.cycles-pp.rmqueue_bulk.get_page_from_freelist.__alloc_pages.alloc_buddy_huge_page.alloc_fresh_huge_page
>       0.00            +0.8        0.84 ±  8%  perf-profile.calltrace.cycles-pp.__alloc_pages.alloc_buddy_huge_page.alloc_fresh_huge_page.alloc_surplus_huge_page.hugetlb_acct_memory
>       0.00            +0.8        0.84 ±  8%  perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages.alloc_buddy_huge_page.alloc_fresh_huge_page.alloc_surplus_huge_page
>       0.00            +0.8        0.84 ±  8%  perf-profile.calltrace.cycles-pp.alloc_buddy_huge_page.alloc_fresh_huge_page.alloc_surplus_huge_page.hugetlb_acct_memory.hugetlb_reserve_pages
>       0.00            +0.9        0.85 ±  8%  perf-profile.calltrace.cycles-pp.alloc_fresh_huge_page.alloc_surplus_huge_page.hugetlb_acct_memory.hugetlb_reserve_pages.hugetlbfs_file_mmap
>       0.00            +0.9        0.88 ±  8%  perf-profile.calltrace.cycles-pp.alloc_surplus_huge_page.hugetlb_acct_memory.hugetlb_reserve_pages.hugetlbfs_file_mmap.mmap_region
>       0.00            +0.9        0.88 ±  8%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__mmap
>       0.00            +0.9        0.88 ±  8%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
>       0.00            +0.9        0.88 ±  8%  perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
>       0.00            +0.9        0.88 ±  8%  perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.00            +0.9        0.88 ±  8%  perf-profile.calltrace.cycles-pp.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
>       0.00            +0.9        0.88 ±  8%  perf-profile.calltrace.cycles-pp.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
>       0.00            +0.9        0.88 ±  8%  perf-profile.calltrace.cycles-pp.__mmap
>       0.00            +0.9        0.88 ±  8%  perf-profile.calltrace.cycles-pp.hugetlbfs_file_mmap.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
>       0.00            +0.9        0.88 ±  8%  perf-profile.calltrace.cycles-pp.hugetlb_reserve_pages.hugetlbfs_file_mmap.mmap_region.do_mmap.vm_mmap_pgoff
>       0.00            +0.9        0.88 ±  8%  perf-profile.calltrace.cycles-pp.hugetlb_acct_memory.hugetlb_reserve_pages.hugetlbfs_file_mmap.mmap_region.do_mmap
>      60.28 ±  5%      +4.7       64.98 ±  2%  perf-profile.calltrace.cycles-pp.do_rw_once
>       0.09 ±  8%      +0.0        0.11 ±  9%  perf-profile.children.cycles-pp.task_tick_fair
>       0.14 ±  7%      +0.0        0.17 ±  5%  perf-profile.children.cycles-pp.scheduler_tick
>       0.20 ±  9%      +0.0        0.24 ±  3%  perf-profile.children.cycles-pp.tick_sched_timer
>       0.19 ±  9%      +0.0        0.24 ±  4%  perf-profile.children.cycles-pp.tick_sched_handle
>       0.19 ±  9%      +0.0        0.23 ±  4%  perf-profile.children.cycles-pp.update_process_times
>       0.24 ±  8%      +0.0        0.29 ±  3%  perf-profile.children.cycles-pp.__hrtimer_run_queues
>       0.40 ±  8%      +0.1        0.45 ±  3%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
>       0.39 ±  7%      +0.1        0.45 ±  3%  perf-profile.children.cycles-pp.hrtimer_interrupt
>       0.26 ± 71%      +0.6        0.86 ±  8%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
>       0.28 ± 71%      +0.6        0.88 ±  8%  perf-profile.children.cycles-pp.__mmap
>       0.28 ± 71%      +0.6        0.88 ±  8%  perf-profile.children.cycles-pp.ksys_mmap_pgoff
>       0.27 ± 71%      +0.6        0.88 ±  8%  perf-profile.children.cycles-pp.hugetlbfs_file_mmap
>       0.27 ± 71%      +0.6        0.88 ±  8%  perf-profile.children.cycles-pp.hugetlb_reserve_pages
>       0.27 ± 71%      +0.6        0.88 ±  8%  perf-profile.children.cycles-pp.hugetlb_acct_memory
>       0.27 ± 71%      +0.6        0.88 ±  8%  perf-profile.children.cycles-pp.alloc_surplus_huge_page
>       0.28 ± 71%      +0.6        0.88 ±  8%  perf-profile.children.cycles-pp.vm_mmap_pgoff
>       0.28 ± 71%      +0.6        0.88 ±  8%  perf-profile.children.cycles-pp.do_mmap
>       0.28 ± 71%      +0.6        0.88 ±  8%  perf-profile.children.cycles-pp.mmap_region
>       0.55 ± 44%      +0.6        1.16 ±  9%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>       0.55 ± 44%      +0.6        1.16 ±  9%  perf-profile.children.cycles-pp.do_syscall_64
>       0.12 ± 71%      +0.7        0.85 ±  8%  perf-profile.children.cycles-pp.alloc_fresh_huge_page
>       0.03 ± 70%      +0.8        0.84 ±  8%  perf-profile.children.cycles-pp.alloc_buddy_huge_page
>       0.04 ± 71%      +0.8        0.84 ±  8%  perf-profile.children.cycles-pp.get_page_from_freelist
>       0.04 ± 71%      +0.8        0.84 ±  8%  perf-profile.children.cycles-pp.__alloc_pages
>       0.00            +0.8        0.82 ±  8%  perf-profile.children.cycles-pp._raw_spin_lock
>       0.00            +0.8        0.83 ±  8%  perf-profile.children.cycles-pp.rmqueue_bulk
>       0.26 ± 71%      +0.6        0.86 ±  8%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
> 
> 
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
> 
> 
> ---
> 0-DAY CI Kernel Test Service
> https://lists.01.org/hyperkitty/list/lkp@xxxxxxxxxxxx
> 
> Thanks,
> Oliver Sang
> 