Hello,

kernel test robot noticed a 34.9% improvement of vm-scalability.throughput on:

commit: f77171d241e379ea93448a53d58104191e02135c ("mm: allow non-hugetlb large folios to be batch processed")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: vm-scalability
test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
parameters:

    runtime: 300s
    test: truncate
    cpufreq_governor: performance

Details are as below:

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240331/202403312219.c62301c9-yujie.liu@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/300s/lkp-cpl-4sp2/truncate/vm-scalability

commit:
  31b2ff82ae ("mm: handle large folios in free_unref_folios()")
  f77171d241 ("mm: allow non-hugetlb large folios to be batch processed")

31b2ff82aefb33ce f77171d241e379ea93448a53d58
---------------- ---------------------------
         %stddev    %change           %stddev
             \         |                 \
 7.397e+08 ±  6%     +34.9%   9.978e+08 ±  3%  vm-scalability.median
 7.397e+08 ±  6%     +34.9%   9.978e+08 ±  3%  vm-scalability.throughput
    193.12 ±  7%     -16.4%      161.38 ±  3%  vm-scalability.time.percent_of_cpu_this_job_got
     84.58 ±  8%     -16.5%       70.62 ±  3%  vm-scalability.time.system_time
    154795 ± 85%    +168.7%      415963 ± 28%  numa-meminfo.node0.Inactive(anon)
  41174935 ± 36%     -81.1%     7801569 ± 30%  proc-vmstat.pgfree
     38644 ± 85%    +169.0%      103935 ± 28%  numa-vmstat.node0.nr_inactive_anon
     38644 ± 85%    +169.0%      103937 ± 28%  numa-vmstat.node0.nr_zone_inactive_anon
     18.05 ± 12%     -18.1         0.00        perf-profile.calltrace.cycles-pp.__folio_put_large.folios_put_refs.truncate_inode_pages_range.evict.do_unlinkat
     18.02 ± 12%     -18.0         0.00        perf-profile.calltrace.cycles-pp.__page_cache_release.__folio_put_large.folios_put_refs.truncate_inode_pages_range.evict
     17.68 ± 12%     -17.7         0.00        perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.__folio_put_large.folios_put_refs.truncate_inode_pages_range
     17.63 ± 12%     -17.6         0.00        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.__folio_put_large.folios_put_refs
     17.57 ± 12%     -17.6         0.00        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.__folio_put_large
     22.14 ± 12%      -5.9        16.22 ±  8%  perf-profile.calltrace.cycles-pp.truncate_inode_pages_range.evict.do_unlinkat.__x64_sys_unlinkat.do_syscall_64
     22.15 ± 12%      -5.9        16.23 ±  8%  perf-profile.calltrace.cycles-pp.evict.do_unlinkat.__x64_sys_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe
     22.16 ± 12%      -5.9        16.24 ±  8%  perf-profile.calltrace.cycles-pp.__x64_sys_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlinkat
     22.16 ± 12%      -5.9        16.24 ±  8%  perf-profile.calltrace.cycles-pp.do_unlinkat.__x64_sys_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlinkat
     22.16 ± 12%      -5.9        16.24 ±  8%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlinkat
     22.16 ± 12%      -5.9        16.24 ±  8%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.unlinkat
     22.16 ± 12%      -5.9        16.24 ±  8%  perf-profile.calltrace.cycles-pp.unlinkat
     21.78 ± 12%      -5.7        16.05 ±  8%  perf-profile.calltrace.cycles-pp.folios_put_refs.truncate_inode_pages_range.evict.do_unlinkat.__x64_sys_unlinkat
      1.14 ±  9%      +0.1         1.29 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_trylock.rebalance_domains.__do_softirq.irq_exit_rcu.sysvec_apic_timer_interrupt
      1.98 ±  3%      +0.2         2.17 ±  2%  perf-profile.calltrace.cycles-pp.rebalance_domains.__do_softirq.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
      2.24 ±  3%      +0.2         2.44 ±  4%  perf-profile.calltrace.cycles-pp.memcpy_toio.drm_fb_memcpy.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm
      2.27 ±  3%      +0.2         2.48 ±  4%  perf-profile.calltrace.cycles-pp.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail.commit_tail.drm_atomic_helper_commit
      2.27 ±  3%      +0.2         2.48 ±  4%  perf-profile.calltrace.cycles-pp.ast_mode_config_helper_atomic_commit_tail.commit_tail.drm_atomic_helper_commit.drm_atomic_commit.drm_atomic_helper_dirtyfb
      2.27 ±  3%      +0.2         2.48 ±  4%  perf-profile.calltrace.cycles-pp.commit_tail.drm_atomic_helper_commit.drm_atomic_commit.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty
      2.27 ±  3%      +0.2         2.48 ±  4%  perf-profile.calltrace.cycles-pp.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail.commit_tail.drm_atomic_helper_commit.drm_atomic_commit
      2.27 ±  3%      +0.2         2.48 ±  4%  perf-profile.calltrace.cycles-pp.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work.process_one_work.worker_thread
      2.27 ±  3%      +0.2         2.48 ±  4%  perf-profile.calltrace.cycles-pp.drm_atomic_commit.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work.process_one_work
      2.27 ±  3%      +0.2         2.48 ±  4%  perf-profile.calltrace.cycles-pp.drm_atomic_helper_commit.drm_atomic_commit.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work
      2.27 ±  3%      +0.2         2.48 ±  4%  perf-profile.calltrace.cycles-pp.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail.commit_tail
      2.27 ±  3%      +0.2         2.48 ±  4%  perf-profile.calltrace.cycles-pp.drm_fb_memcpy.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail
      2.34 ±  3%      +0.2         2.56 ±  4%  perf-profile.calltrace.cycles-pp.drm_fb_helper_damage_work.process_one_work.worker_thread.kthread.ret_from_fork
      2.34 ±  3%      +0.2         2.56 ±  4%  perf-profile.calltrace.cycles-pp.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work.process_one_work.worker_thread.kthread
      2.41 ±  3%      +0.2         2.64 ±  4%  perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      2.38 ±  3%      +0.2         2.61 ±  4%  perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      2.84 ±  4%      +0.2         3.09 ±  2%  perf-profile.calltrace.cycles-pp.__do_softirq.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt
      6.56 ±  2%      +0.6         7.18 ±  3%  perf-profile.calltrace.cycles-pp.rep_movs_alternative._copy_to_iter.copy_page_to_iter.filemap_read.xfs_file_buffered_read
      6.90 ±  2%      +0.7         7.55 ±  3%  perf-profile.calltrace.cycles-pp._copy_to_iter.copy_page_to_iter.filemap_read.xfs_file_buffered_read.xfs_file_read_iter
      6.98 ±  2%      +0.7         7.64 ±  3%  perf-profile.calltrace.cycles-pp.copy_page_to_iter.filemap_read.xfs_file_buffered_read.xfs_file_read_iter.vfs_read
     14.15 ±  3%      +1.3        15.48        perf-profile.calltrace.cycles-pp.memset_orig.zero_user_segments.iomap_readpage_iter.iomap_readahead.read_pages
     14.19 ±  3%      +1.3        15.53        perf-profile.calltrace.cycles-pp.zero_user_segments.iomap_readpage_iter.iomap_readahead.read_pages.page_cache_ra_order
     14.30 ±  3%      +1.3        15.64        perf-profile.calltrace.cycles-pp.iomap_readpage_iter.iomap_readahead.read_pages.page_cache_ra_order.filemap_get_pages
     14.36 ±  3%      +1.4        15.72        perf-profile.calltrace.cycles-pp.iomap_readahead.read_pages.page_cache_ra_order.filemap_get_pages.filemap_read
     14.37 ±  3%      +1.4        15.73        perf-profile.calltrace.cycles-pp.read_pages.page_cache_ra_order.filemap_get_pages.filemap_read.xfs_file_buffered_read
     14.81 ±  3%      +1.4        16.22        perf-profile.calltrace.cycles-pp.page_cache_ra_order.filemap_get_pages.filemap_read.xfs_file_buffered_read.xfs_file_read_iter
     14.86 ±  3%      +1.4        16.28        perf-profile.calltrace.cycles-pp.filemap_get_pages.filemap_read.xfs_file_buffered_read.xfs_file_read_iter.vfs_read
     21.90 ±  3%      +2.1        23.98        perf-profile.calltrace.cycles-pp.filemap_read.xfs_file_buffered_read.xfs_file_read_iter.vfs_read.ksys_read
     21.92 ±  3%      +2.1        24.01        perf-profile.calltrace.cycles-pp.xfs_file_buffered_read.xfs_file_read_iter.vfs_read.ksys_read.do_syscall_64
     21.94 ±  3%      +2.1        24.02        perf-profile.calltrace.cycles-pp.xfs_file_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
     22.08 ±  3%      +2.1        24.18        perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
     22.09 ±  3%      +2.1        24.20        perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
     22.11 ±  3%      +2.1        24.22        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
     22.15 ±  3%      +2.1        24.27        perf-profile.calltrace.cycles-pp.read
     22.11 ±  3%      +2.1        24.23        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
     45.34 ±  3%      +4.1        49.45 ±  2%  perf-profile.calltrace.cycles-pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
     45.76 ±  3%      +4.1        49.89 ±  2%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
     45.87 ±  3%      +4.1        50.00 ±  2%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
     66.58 ±  3%      +5.8        72.37        perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
      0.00           +15.2        15.18 ±  8%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.free_one_page.free_unref_folios.folios_put_refs
      0.00           +15.3        15.26 ±  8%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.free_one_page.free_unref_folios.folios_put_refs.truncate_inode_pages_range
      0.00           +15.4        15.40 ±  8%  perf-profile.calltrace.cycles-pp.free_one_page.free_unref_folios.folios_put_refs.truncate_inode_pages_range.evict
      0.00           +15.8        15.85 ±  8%  perf-profile.calltrace.cycles-pp.free_unref_folios.folios_put_refs.truncate_inode_pages_range.evict.do_unlinkat
     18.06 ± 12%     -18.1         0.00        perf-profile.children.cycles-pp.__folio_put_large
     18.09 ± 12%     -17.9         0.16 ± 25%  perf-profile.children.cycles-pp.__page_cache_release
     17.78 ± 12%     -17.7         0.04 ±151%  perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
     22.15 ± 12%      -5.9        16.23 ±  8%  perf-profile.children.cycles-pp.evict
     22.14 ± 12%      -5.9        16.22 ±  8%  perf-profile.children.cycles-pp.truncate_inode_pages_range
     22.16 ± 12%      -5.9        16.24 ±  8%  perf-profile.children.cycles-pp.__x64_sys_unlinkat
     22.16 ± 12%      -5.9        16.24 ±  8%  perf-profile.children.cycles-pp.do_unlinkat
     22.16 ± 12%      -5.9        16.24 ±  8%  perf-profile.children.cycles-pp.unlinkat
     21.85 ± 12%      -5.8        16.07 ±  8%  perf-profile.children.cycles-pp.folios_put_refs
      0.26 ± 10%      -0.2         0.07 ± 12%  perf-profile.children.cycles-pp.__lruvec_stat_mod_folio
      0.25 ± 13%      -0.2         0.06 ±  8%  perf-profile.children.cycles-pp.delete_from_page_cache_batch
      0.17 ±  6%      -0.1         0.08 ±  8%  perf-profile.children.cycles-pp.__mod_lruvec_state
      0.16 ±  7%      -0.1         0.08 ±  9%  perf-profile.children.cycles-pp.__mod_node_page_state
      0.07 ±  9%      -0.0         0.03 ± 77%  perf-profile.children.cycles-pp.begin_new_exec
      0.14 ±  7%      -0.0         0.11 ±  4%  perf-profile.children.cycles-pp.__mmput
      0.14 ±  6%      -0.0         0.10 ±  4%  perf-profile.children.cycles-pp.exit_mmap
      0.07 ±  8%      -0.0         0.03 ± 78%  perf-profile.children.cycles-pp.folio_batch_move_lru
      0.14 ±  5%      -0.0         0.12 ±  5%  perf-profile.children.cycles-pp.load_elf_binary
      0.14 ±  3%      -0.0         0.12 ±  5%  perf-profile.children.cycles-pp.exec_binprm
      0.14 ±  3%      -0.0         0.12 ±  5%  perf-profile.children.cycles-pp.search_binary_handler
      0.17 ±  4%      -0.0         0.14 ±  4%  perf-profile.children.cycles-pp.bprm_execve
      0.09 ±  7%      +0.0         0.11 ±  9%  perf-profile.children.cycles-pp.__filemap_add_folio
      0.13 ±  8%      +0.0         0.16 ±  7%  perf-profile.children.cycles-pp.filemap_add_folio
      0.32 ±  3%      +0.0         0.35        perf-profile.children.cycles-pp.read_tsc
      0.27 ±  3%      +0.0         0.31 ±  4%  perf-profile.children.cycles-pp.rcu_core
      0.52 ±  3%      +0.0         0.56 ±  3%  perf-profile.children.cycles-pp.update_sg_lb_stats
      0.35 ±  5%      +0.0         0.39 ±  5%  perf-profile.children.cycles-pp.run_rebalance_domains
      0.00            +0.1         0.07 ±  9%  perf-profile.children.cycles-pp.free_tail_page_prepare
      1.20 ±  9%      +0.2         1.35 ±  2%  perf-profile.children.cycles-pp._raw_spin_trylock
      2.12 ±  2%      +0.2         2.32 ±  2%  perf-profile.children.cycles-pp.rebalance_domains
      2.27 ±  3%      +0.2         2.48 ±  4%  perf-profile.children.cycles-pp.drm_atomic_helper_commit_planes
      2.27 ±  3%      +0.2         2.48 ±  4%  perf-profile.children.cycles-pp.ast_mode_config_helper_atomic_commit_tail
      2.27 ±  3%      +0.2         2.48 ±  4%  perf-profile.children.cycles-pp.commit_tail
      2.27 ±  3%      +0.2         2.48 ±  4%  perf-profile.children.cycles-pp.drm_atomic_helper_commit_tail_rpm
      2.27 ±  3%      +0.2         2.48 ±  4%  perf-profile.children.cycles-pp.drm_atomic_helper_dirtyfb
      2.27 ±  3%      +0.2         2.48 ±  4%  perf-profile.children.cycles-pp.drm_atomic_commit
      2.27 ±  3%      +0.2         2.48 ±  4%  perf-profile.children.cycles-pp.drm_atomic_helper_commit
      2.27 ±  3%      +0.2         2.48 ±  4%  perf-profile.children.cycles-pp.ast_primary_plane_helper_atomic_update
      2.27 ±  3%      +0.2         2.48 ±  4%  perf-profile.children.cycles-pp.drm_fb_memcpy
      2.27 ±  3%      +0.2         2.48 ±  4%  perf-profile.children.cycles-pp.memcpy_toio
      2.34 ±  3%      +0.2         2.56 ±  4%  perf-profile.children.cycles-pp.drm_fb_helper_damage_work
      2.34 ±  3%      +0.2         2.56 ±  4%  perf-profile.children.cycles-pp.drm_fbdev_generic_helper_fb_dirty
      2.41 ±  3%      +0.2         2.64 ±  4%  perf-profile.children.cycles-pp.worker_thread
      2.38 ±  3%      +0.2         2.61 ±  4%  perf-profile.children.cycles-pp.process_one_work
      3.12 ±  4%      +0.3         3.39 ±  2%  perf-profile.children.cycles-pp.__do_softirq
      3.50 ±  4%      +0.3         3.83 ±  4%  perf-profile.children.cycles-pp.irq_exit_rcu
      0.00            +0.4         0.38 ±  7%  perf-profile.children.cycles-pp.free_unref_page_prepare
      6.59 ±  2%      +0.6         7.21 ±  3%  perf-profile.children.cycles-pp.rep_movs_alternative
      6.94 ±  2%      +0.7         7.60 ±  3%  perf-profile.children.cycles-pp._copy_to_iter
      6.99 ±  2%      +0.7         7.65 ±  3%  perf-profile.children.cycles-pp.copy_page_to_iter
     14.17 ±  3%      +1.3        15.51        perf-profile.children.cycles-pp.memset_orig
     14.19 ±  3%      +1.3        15.53        perf-profile.children.cycles-pp.zero_user_segments
     14.30 ±  3%      +1.3        15.64        perf-profile.children.cycles-pp.iomap_readpage_iter
     14.36 ±  3%      +1.4        15.72        perf-profile.children.cycles-pp.iomap_readahead
     14.37 ±  3%      +1.4        15.73        perf-profile.children.cycles-pp.read_pages
     14.81 ±  3%      +1.4        16.22        perf-profile.children.cycles-pp.page_cache_ra_order
     14.86 ±  3%      +1.4        16.28        perf-profile.children.cycles-pp.filemap_get_pages
     21.90 ±  3%      +2.1        23.99        perf-profile.children.cycles-pp.filemap_read
     21.92 ±  3%      +2.1        24.01        perf-profile.children.cycles-pp.xfs_file_buffered_read
     21.94 ±  3%      +2.1        24.03        perf-profile.children.cycles-pp.xfs_file_read_iter
     22.09 ±  3%      +2.1        24.20        perf-profile.children.cycles-pp.vfs_read
     22.11 ±  3%      +2.1        24.22        perf-profile.children.cycles-pp.ksys_read
     22.18 ±  3%      +2.1        24.30        perf-profile.children.cycles-pp.read
     42.82 ±  3%      +3.8        46.65 ±  2%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
     45.46 ±  3%      +4.1        49.57 ±  2%  perf-profile.children.cycles-pp.acpi_safe_halt
     45.57 ±  3%      +4.1        49.68 ±  2%  perf-profile.children.cycles-pp.acpi_idle_enter
     45.99 ±  3%      +4.1        50.12 ±  2%  perf-profile.children.cycles-pp.cpuidle_enter_state
     46.09 ±  3%      +4.1        50.22 ±  2%  perf-profile.children.cycles-pp.cpuidle_enter
      1.12 ± 19%     +14.3        15.41 ±  8%  perf-profile.children.cycles-pp.free_one_page
      0.00           +15.9        15.86 ±  8%  perf-profile.children.cycles-pp.free_unref_folios
      0.16 ±  7%      -0.1         0.08 ±  9%  perf-profile.self.cycles-pp.__mod_node_page_state
      0.46 ±  3%      -0.0         0.42 ±  2%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.31 ±  4%      +0.0         0.35 ±  2%  perf-profile.self.cycles-pp.read_tsc
      0.40 ±  4%      +0.0         0.44 ±  5%  perf-profile.self.cycles-pp._copy_to_iter
      0.38 ±  2%      +0.0         0.43 ±  2%  perf-profile.self.cycles-pp.menu_select
      0.00            +0.1         0.05 ±  6%  perf-profile.self.cycles-pp.free_tail_page_prepare
      1.19 ±  9%      +0.2         1.34 ±  2%  perf-profile.self.cycles-pp._raw_spin_trylock
      2.26 ±  3%      +0.2         2.47 ±  4%  perf-profile.self.cycles-pp.memcpy_toio
      0.00            +0.3         0.33 ±  7%  perf-profile.self.cycles-pp.free_unref_page_prepare
      6.50 ±  2%      +0.6         7.11 ±  3%  perf-profile.self.cycles-pp.rep_movs_alternative
     14.09 ±  3%      +1.3        15.42        perf-profile.self.cycles-pp.memset_orig
     26.19 ±  4%      +2.1        28.29 ±  2%  perf-profile.self.cycles-pp.acpi_safe_halt

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki