[linus:master] [mm] f77171d241: vm-scalability.throughput 34.9% improvement

Hello,

kernel test robot noticed a 34.9% improvement of vm-scalability.throughput on:

commit: f77171d241e379ea93448a53d58104191e02135c ("mm: allow non-hugetlb large folios to be batch processed")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: vm-scalability
test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
parameters:

	runtime: 300s
	test: truncate
	cpufreq_governor: performance

Details are below:

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240331/202403312219.c62301c9-yujie.liu@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/300s/lkp-cpl-4sp2/truncate/vm-scalability

commit: 
  31b2ff82ae ("mm: handle large folios in free_unref_folios()")
  f77171d241 ("mm: allow non-hugetlb large folios to be batch processed")

31b2ff82aefb33ce f77171d241e379ea93448a53d58 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
 7.397e+08 ±  6%     +34.9%  9.978e+08 ±  3%  vm-scalability.median
 7.397e+08 ±  6%     +34.9%  9.978e+08 ±  3%  vm-scalability.throughput
    193.12 ±  7%     -16.4%     161.38 ±  3%  vm-scalability.time.percent_of_cpu_this_job_got
     84.58 ±  8%     -16.5%      70.62 ±  3%  vm-scalability.time.system_time
    154795 ± 85%    +168.7%     415963 ± 28%  numa-meminfo.node0.Inactive(anon)
  41174935 ± 36%     -81.1%    7801569 ± 30%  proc-vmstat.pgfree
     38644 ± 85%    +169.0%     103935 ± 28%  numa-vmstat.node0.nr_inactive_anon
     38644 ± 85%    +169.0%     103937 ± 28%  numa-vmstat.node0.nr_zone_inactive_anon
     18.05 ± 12%     -18.1        0.00        perf-profile.calltrace.cycles-pp.__folio_put_large.folios_put_refs.truncate_inode_pages_range.evict.do_unlinkat
     18.02 ± 12%     -18.0        0.00        perf-profile.calltrace.cycles-pp.__page_cache_release.__folio_put_large.folios_put_refs.truncate_inode_pages_range.evict
     17.68 ± 12%     -17.7        0.00        perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.__folio_put_large.folios_put_refs.truncate_inode_pages_range
     17.63 ± 12%     -17.6        0.00        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.__folio_put_large.folios_put_refs
     17.57 ± 12%     -17.6        0.00        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.__folio_put_large
     22.14 ± 12%      -5.9       16.22 ±  8%  perf-profile.calltrace.cycles-pp.truncate_inode_pages_range.evict.do_unlinkat.__x64_sys_unlinkat.do_syscall_64
     22.15 ± 12%      -5.9       16.23 ±  8%  perf-profile.calltrace.cycles-pp.evict.do_unlinkat.__x64_sys_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe
     22.16 ± 12%      -5.9       16.24 ±  8%  perf-profile.calltrace.cycles-pp.__x64_sys_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlinkat
     22.16 ± 12%      -5.9       16.24 ±  8%  perf-profile.calltrace.cycles-pp.do_unlinkat.__x64_sys_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlinkat
     22.16 ± 12%      -5.9       16.24 ±  8%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlinkat
     22.16 ± 12%      -5.9       16.24 ±  8%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.unlinkat
     22.16 ± 12%      -5.9       16.24 ±  8%  perf-profile.calltrace.cycles-pp.unlinkat
     21.78 ± 12%      -5.7       16.05 ±  8%  perf-profile.calltrace.cycles-pp.folios_put_refs.truncate_inode_pages_range.evict.do_unlinkat.__x64_sys_unlinkat
      1.14 ±  9%      +0.1        1.29 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_trylock.rebalance_domains.__do_softirq.irq_exit_rcu.sysvec_apic_timer_interrupt
      1.98 ±  3%      +0.2        2.17 ±  2%  perf-profile.calltrace.cycles-pp.rebalance_domains.__do_softirq.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
      2.24 ±  3%      +0.2        2.44 ±  4%  perf-profile.calltrace.cycles-pp.memcpy_toio.drm_fb_memcpy.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm
      2.27 ±  3%      +0.2        2.48 ±  4%  perf-profile.calltrace.cycles-pp.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail.commit_tail.drm_atomic_helper_commit
      2.27 ±  3%      +0.2        2.48 ±  4%  perf-profile.calltrace.cycles-pp.ast_mode_config_helper_atomic_commit_tail.commit_tail.drm_atomic_helper_commit.drm_atomic_commit.drm_atomic_helper_dirtyfb
      2.27 ±  3%      +0.2        2.48 ±  4%  perf-profile.calltrace.cycles-pp.commit_tail.drm_atomic_helper_commit.drm_atomic_commit.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty
      2.27 ±  3%      +0.2        2.48 ±  4%  perf-profile.calltrace.cycles-pp.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail.commit_tail.drm_atomic_helper_commit.drm_atomic_commit
      2.27 ±  3%      +0.2        2.48 ±  4%  perf-profile.calltrace.cycles-pp.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work.process_one_work.worker_thread
      2.27 ±  3%      +0.2        2.48 ±  4%  perf-profile.calltrace.cycles-pp.drm_atomic_commit.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work.process_one_work
      2.27 ±  3%      +0.2        2.48 ±  4%  perf-profile.calltrace.cycles-pp.drm_atomic_helper_commit.drm_atomic_commit.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work
      2.27 ±  3%      +0.2        2.48 ±  4%  perf-profile.calltrace.cycles-pp.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail.commit_tail
      2.27 ±  3%      +0.2        2.48 ±  4%  perf-profile.calltrace.cycles-pp.drm_fb_memcpy.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail
      2.34 ±  3%      +0.2        2.56 ±  4%  perf-profile.calltrace.cycles-pp.drm_fb_helper_damage_work.process_one_work.worker_thread.kthread.ret_from_fork
      2.34 ±  3%      +0.2        2.56 ±  4%  perf-profile.calltrace.cycles-pp.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work.process_one_work.worker_thread.kthread
      2.41 ±  3%      +0.2        2.64 ±  4%  perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      2.38 ±  3%      +0.2        2.61 ±  4%  perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      2.84 ±  4%      +0.2        3.09 ±  2%  perf-profile.calltrace.cycles-pp.__do_softirq.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt
      6.56 ±  2%      +0.6        7.18 ±  3%  perf-profile.calltrace.cycles-pp.rep_movs_alternative._copy_to_iter.copy_page_to_iter.filemap_read.xfs_file_buffered_read
      6.90 ±  2%      +0.7        7.55 ±  3%  perf-profile.calltrace.cycles-pp._copy_to_iter.copy_page_to_iter.filemap_read.xfs_file_buffered_read.xfs_file_read_iter
      6.98 ±  2%      +0.7        7.64 ±  3%  perf-profile.calltrace.cycles-pp.copy_page_to_iter.filemap_read.xfs_file_buffered_read.xfs_file_read_iter.vfs_read
     14.15 ±  3%      +1.3       15.48        perf-profile.calltrace.cycles-pp.memset_orig.zero_user_segments.iomap_readpage_iter.iomap_readahead.read_pages
     14.19 ±  3%      +1.3       15.53        perf-profile.calltrace.cycles-pp.zero_user_segments.iomap_readpage_iter.iomap_readahead.read_pages.page_cache_ra_order
     14.30 ±  3%      +1.3       15.64        perf-profile.calltrace.cycles-pp.iomap_readpage_iter.iomap_readahead.read_pages.page_cache_ra_order.filemap_get_pages
     14.36 ±  3%      +1.4       15.72        perf-profile.calltrace.cycles-pp.iomap_readahead.read_pages.page_cache_ra_order.filemap_get_pages.filemap_read
     14.37 ±  3%      +1.4       15.73        perf-profile.calltrace.cycles-pp.read_pages.page_cache_ra_order.filemap_get_pages.filemap_read.xfs_file_buffered_read
     14.81 ±  3%      +1.4       16.22        perf-profile.calltrace.cycles-pp.page_cache_ra_order.filemap_get_pages.filemap_read.xfs_file_buffered_read.xfs_file_read_iter
     14.86 ±  3%      +1.4       16.28        perf-profile.calltrace.cycles-pp.filemap_get_pages.filemap_read.xfs_file_buffered_read.xfs_file_read_iter.vfs_read
     21.90 ±  3%      +2.1       23.98        perf-profile.calltrace.cycles-pp.filemap_read.xfs_file_buffered_read.xfs_file_read_iter.vfs_read.ksys_read
     21.92 ±  3%      +2.1       24.01        perf-profile.calltrace.cycles-pp.xfs_file_buffered_read.xfs_file_read_iter.vfs_read.ksys_read.do_syscall_64
     21.94 ±  3%      +2.1       24.02        perf-profile.calltrace.cycles-pp.xfs_file_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
     22.08 ±  3%      +2.1       24.18        perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
     22.09 ±  3%      +2.1       24.20        perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
     22.11 ±  3%      +2.1       24.22        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
     22.15 ±  3%      +2.1       24.27        perf-profile.calltrace.cycles-pp.read
     22.11 ±  3%      +2.1       24.23        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
     45.34 ±  3%      +4.1       49.45 ±  2%  perf-profile.calltrace.cycles-pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
     45.76 ±  3%      +4.1       49.89 ±  2%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
     45.87 ±  3%      +4.1       50.00 ±  2%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
     66.58 ±  3%      +5.8       72.37        perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
      0.00           +15.2       15.18 ±  8%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.free_one_page.free_unref_folios.folios_put_refs
      0.00           +15.3       15.26 ±  8%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.free_one_page.free_unref_folios.folios_put_refs.truncate_inode_pages_range
      0.00           +15.4       15.40 ±  8%  perf-profile.calltrace.cycles-pp.free_one_page.free_unref_folios.folios_put_refs.truncate_inode_pages_range.evict
      0.00           +15.8       15.85 ±  8%  perf-profile.calltrace.cycles-pp.free_unref_folios.folios_put_refs.truncate_inode_pages_range.evict.do_unlinkat
     18.06 ± 12%     -18.1        0.00        perf-profile.children.cycles-pp.__folio_put_large
     18.09 ± 12%     -17.9        0.16 ± 25%  perf-profile.children.cycles-pp.__page_cache_release
     17.78 ± 12%     -17.7        0.04 ±151%  perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
     22.15 ± 12%      -5.9       16.23 ±  8%  perf-profile.children.cycles-pp.evict
     22.14 ± 12%      -5.9       16.22 ±  8%  perf-profile.children.cycles-pp.truncate_inode_pages_range
     22.16 ± 12%      -5.9       16.24 ±  8%  perf-profile.children.cycles-pp.__x64_sys_unlinkat
     22.16 ± 12%      -5.9       16.24 ±  8%  perf-profile.children.cycles-pp.do_unlinkat
     22.16 ± 12%      -5.9       16.24 ±  8%  perf-profile.children.cycles-pp.unlinkat
     21.85 ± 12%      -5.8       16.07 ±  8%  perf-profile.children.cycles-pp.folios_put_refs
      0.26 ± 10%      -0.2        0.07 ± 12%  perf-profile.children.cycles-pp.__lruvec_stat_mod_folio
      0.25 ± 13%      -0.2        0.06 ±  8%  perf-profile.children.cycles-pp.delete_from_page_cache_batch
      0.17 ±  6%      -0.1        0.08 ±  8%  perf-profile.children.cycles-pp.__mod_lruvec_state
      0.16 ±  7%      -0.1        0.08 ±  9%  perf-profile.children.cycles-pp.__mod_node_page_state
      0.07 ±  9%      -0.0        0.03 ± 77%  perf-profile.children.cycles-pp.begin_new_exec
      0.14 ±  7%      -0.0        0.11 ±  4%  perf-profile.children.cycles-pp.__mmput
      0.14 ±  6%      -0.0        0.10 ±  4%  perf-profile.children.cycles-pp.exit_mmap
      0.07 ±  8%      -0.0        0.03 ± 78%  perf-profile.children.cycles-pp.folio_batch_move_lru
      0.14 ±  5%      -0.0        0.12 ±  5%  perf-profile.children.cycles-pp.load_elf_binary
      0.14 ±  3%      -0.0        0.12 ±  5%  perf-profile.children.cycles-pp.exec_binprm
      0.14 ±  3%      -0.0        0.12 ±  5%  perf-profile.children.cycles-pp.search_binary_handler
      0.17 ±  4%      -0.0        0.14 ±  4%  perf-profile.children.cycles-pp.bprm_execve
      0.09 ±  7%      +0.0        0.11 ±  9%  perf-profile.children.cycles-pp.__filemap_add_folio
      0.13 ±  8%      +0.0        0.16 ±  7%  perf-profile.children.cycles-pp.filemap_add_folio
      0.32 ±  3%      +0.0        0.35        perf-profile.children.cycles-pp.read_tsc
      0.27 ±  3%      +0.0        0.31 ±  4%  perf-profile.children.cycles-pp.rcu_core
      0.52 ±  3%      +0.0        0.56 ±  3%  perf-profile.children.cycles-pp.update_sg_lb_stats
      0.35 ±  5%      +0.0        0.39 ±  5%  perf-profile.children.cycles-pp.run_rebalance_domains
      0.00            +0.1        0.07 ±  9%  perf-profile.children.cycles-pp.free_tail_page_prepare
      1.20 ±  9%      +0.2        1.35 ±  2%  perf-profile.children.cycles-pp._raw_spin_trylock
      2.12 ±  2%      +0.2        2.32 ±  2%  perf-profile.children.cycles-pp.rebalance_domains
      2.27 ±  3%      +0.2        2.48 ±  4%  perf-profile.children.cycles-pp.drm_atomic_helper_commit_planes
      2.27 ±  3%      +0.2        2.48 ±  4%  perf-profile.children.cycles-pp.ast_mode_config_helper_atomic_commit_tail
      2.27 ±  3%      +0.2        2.48 ±  4%  perf-profile.children.cycles-pp.commit_tail
      2.27 ±  3%      +0.2        2.48 ±  4%  perf-profile.children.cycles-pp.drm_atomic_helper_commit_tail_rpm
      2.27 ±  3%      +0.2        2.48 ±  4%  perf-profile.children.cycles-pp.drm_atomic_helper_dirtyfb
      2.27 ±  3%      +0.2        2.48 ±  4%  perf-profile.children.cycles-pp.drm_atomic_commit
      2.27 ±  3%      +0.2        2.48 ±  4%  perf-profile.children.cycles-pp.drm_atomic_helper_commit
      2.27 ±  3%      +0.2        2.48 ±  4%  perf-profile.children.cycles-pp.ast_primary_plane_helper_atomic_update
      2.27 ±  3%      +0.2        2.48 ±  4%  perf-profile.children.cycles-pp.drm_fb_memcpy
      2.27 ±  3%      +0.2        2.48 ±  4%  perf-profile.children.cycles-pp.memcpy_toio
      2.34 ±  3%      +0.2        2.56 ±  4%  perf-profile.children.cycles-pp.drm_fb_helper_damage_work
      2.34 ±  3%      +0.2        2.56 ±  4%  perf-profile.children.cycles-pp.drm_fbdev_generic_helper_fb_dirty
      2.41 ±  3%      +0.2        2.64 ±  4%  perf-profile.children.cycles-pp.worker_thread
      2.38 ±  3%      +0.2        2.61 ±  4%  perf-profile.children.cycles-pp.process_one_work
      3.12 ±  4%      +0.3        3.39 ±  2%  perf-profile.children.cycles-pp.__do_softirq
      3.50 ±  4%      +0.3        3.83 ±  4%  perf-profile.children.cycles-pp.irq_exit_rcu
      0.00            +0.4        0.38 ±  7%  perf-profile.children.cycles-pp.free_unref_page_prepare
      6.59 ±  2%      +0.6        7.21 ±  3%  perf-profile.children.cycles-pp.rep_movs_alternative
      6.94 ±  2%      +0.7        7.60 ±  3%  perf-profile.children.cycles-pp._copy_to_iter
      6.99 ±  2%      +0.7        7.65 ±  3%  perf-profile.children.cycles-pp.copy_page_to_iter
     14.17 ±  3%      +1.3       15.51        perf-profile.children.cycles-pp.memset_orig
     14.19 ±  3%      +1.3       15.53        perf-profile.children.cycles-pp.zero_user_segments
     14.30 ±  3%      +1.3       15.64        perf-profile.children.cycles-pp.iomap_readpage_iter
     14.36 ±  3%      +1.4       15.72        perf-profile.children.cycles-pp.iomap_readahead
     14.37 ±  3%      +1.4       15.73        perf-profile.children.cycles-pp.read_pages
     14.81 ±  3%      +1.4       16.22        perf-profile.children.cycles-pp.page_cache_ra_order
     14.86 ±  3%      +1.4       16.28        perf-profile.children.cycles-pp.filemap_get_pages
     21.90 ±  3%      +2.1       23.99        perf-profile.children.cycles-pp.filemap_read
     21.92 ±  3%      +2.1       24.01        perf-profile.children.cycles-pp.xfs_file_buffered_read
     21.94 ±  3%      +2.1       24.03        perf-profile.children.cycles-pp.xfs_file_read_iter
     22.09 ±  3%      +2.1       24.20        perf-profile.children.cycles-pp.vfs_read
     22.11 ±  3%      +2.1       24.22        perf-profile.children.cycles-pp.ksys_read
     22.18 ±  3%      +2.1       24.30        perf-profile.children.cycles-pp.read
     42.82 ±  3%      +3.8       46.65 ±  2%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
     45.46 ±  3%      +4.1       49.57 ±  2%  perf-profile.children.cycles-pp.acpi_safe_halt
     45.57 ±  3%      +4.1       49.68 ±  2%  perf-profile.children.cycles-pp.acpi_idle_enter
     45.99 ±  3%      +4.1       50.12 ±  2%  perf-profile.children.cycles-pp.cpuidle_enter_state
     46.09 ±  3%      +4.1       50.22 ±  2%  perf-profile.children.cycles-pp.cpuidle_enter
      1.12 ± 19%     +14.3       15.41 ±  8%  perf-profile.children.cycles-pp.free_one_page
      0.00           +15.9       15.86 ±  8%  perf-profile.children.cycles-pp.free_unref_folios
      0.16 ±  7%      -0.1        0.08 ±  9%  perf-profile.self.cycles-pp.__mod_node_page_state
      0.46 ±  3%      -0.0        0.42 ±  2%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.31 ±  4%      +0.0        0.35 ±  2%  perf-profile.self.cycles-pp.read_tsc
      0.40 ±  4%      +0.0        0.44 ±  5%  perf-profile.self.cycles-pp._copy_to_iter
      0.38 ±  2%      +0.0        0.43 ±  2%  perf-profile.self.cycles-pp.menu_select
      0.00            +0.1        0.05 ±  6%  perf-profile.self.cycles-pp.free_tail_page_prepare
      1.19 ±  9%      +0.2        1.34 ±  2%  perf-profile.self.cycles-pp._raw_spin_trylock
      2.26 ±  3%      +0.2        2.47 ±  4%  perf-profile.self.cycles-pp.memcpy_toio
      0.00            +0.3        0.33 ±  7%  perf-profile.self.cycles-pp.free_unref_page_prepare
      6.50 ±  2%      +0.6        7.11 ±  3%  perf-profile.self.cycles-pp.rep_movs_alternative
     14.09 ±  3%      +1.3       15.42        perf-profile.self.cycles-pp.memset_orig
     26.19 ±  4%      +2.1       28.29 ±  2%  perf-profile.self.cycles-pp.acpi_safe_halt
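A note on reading the table above: the %change column appears to use two conventions. For counters such as vm-scalability.throughput it is a relative change in percent, while for the perf-profile rows (whose values are already percentages of cycles) it is an absolute difference in percentage points. The sketch below reproduces two rows under that assumption; the helper names are illustrative and are not part of lkp-tests.

```python
def percent_change(base: float, new: float) -> float:
    """Relative change from base to new, expressed in percent."""
    return (new - base) / base * 100.0


def point_delta(base_pct: float, new_pct: float) -> float:
    """Absolute difference between two values that are already percentages."""
    return new_pct - base_pct


if __name__ == "__main__":
    # vm-scalability.throughput: 7.397e+08 -> 9.978e+08
    print(f"throughput: {percent_change(7.397e08, 9.978e08):+.1f}%")

    # perf-profile.children.cycles-pp.truncate_inode_pages_range: 22.14 -> 16.22
    print(f"truncate_inode_pages_range: {point_delta(22.14, 16.22):+.1f}")
```

Under this reading, the first line reproduces the headline +34.9% and the second the -5.9 reported for truncate_inode_pages_range.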


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki




