[linus:master] [mm] 7c33b8c422: will-it-scale.per_thread_ops 11.4% improvement


Hello,

kernel test robot noticed an 11.4% improvement of will-it-scale.per_thread_ops on:


commit: 7c33b8c4229af19797c78de48827ca70228c1f47 ("mm: remove use of folio list from folios_put()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: will-it-scale
test machine: 104 threads 2 sockets (Skylake) with 192G memory
parameters:

	nr_task: 16
	mode: thread
	test: page_fault2
	cpufreq_governor: performance
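
For readers unfamiliar with the workload: judging by the call traces further
down (asm_exc_page_fault, do_fault, copy_page, shmem_fault, __munmap), each
task appears to repeatedly map a file privately, write to every page, and
unmap it again. The following is only a rough Python sketch of that access
pattern, not the will-it-scale source; the function name, the mapping size
and the iteration count are assumptions made for illustration.

```python
import mmap
import tempfile


def touch_private_mapping(size, iterations=1):
    """Map a temporary file privately, write one byte per page, unmap.

    Returns the number of page-sized writes performed. Each first write to
    a page of a MAP_PRIVATE file mapping is expected to trigger a
    copy-on-write page fault in the kernel.
    """
    page = mmap.PAGESIZE
    touched = 0
    with tempfile.TemporaryFile() as f:
        f.truncate(size)  # back the mapping with a file of the right length
        for _ in range(iterations):
            m = mmap.mmap(
                f.fileno(),
                size,
                flags=mmap.MAP_PRIVATE,
                prot=mmap.PROT_READ | mmap.PROT_WRITE,
            )
            for off in range(0, size, page):
                m[off] = 1  # first write to this page faults it in
                touched += 1
            m.close()  # unmapping frees the private copies
    return touched


if __name__ == "__main__":
    # 64 pages, 4 map/touch/unmap rounds (both values are arbitrary).
    print(touch_private_mapping(64 * mmap.PAGESIZE, iterations=4))
```

The unmap step at the end of every round is where freed pages are handed back
in batches, which is the path the commit under test touches.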






Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240325/202403251657.f8cbfe2a-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/thread/16/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/page_fault2/will-it-scale

commit: 
  4882c80975 ("memcg: add mem_cgroup_uncharge_folios()")
  7c33b8c422 ("mm: remove use of folio list from folios_put()")

4882c80975e2bf72 7c33b8c4229af19797c78de4882 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      9.19            +1.1       10.32        mpstat.cpu.all.sys%
      2.00            +0.2        2.23        mpstat.cpu.all.usr%
 1.171e+09           +11.0%    1.3e+09        numa-numastat.node0.local_node
 1.173e+09           +10.9%  1.301e+09        numa-numastat.node0.numa_hit
     12.80           +10.6%      14.16        vmstat.procs.r
      2179            +2.7%       2239        vmstat.system.cs
     28134           +10.1%      30977        vmstat.system.in
   3858794           +11.4%    4296995        will-it-scale.16.threads
     88.59            -1.6%      87.20        will-it-scale.16.threads_idle
    241174           +11.4%     268562        will-it-scale.per_thread_ops
   3858794           +11.4%    4296995        will-it-scale.workload
   4597109 ± 16%     -30.5%    3195509 ± 34%  numa-meminfo.node0.FilePages
     75985 ± 12%     -35.5%      48981 ± 34%  numa-meminfo.node0.KReclaimable
     75985 ± 12%     -35.5%      48981 ± 34%  numa-meminfo.node0.SReclaimable
   2516178 ± 30%     -55.8%    1113291 ± 99%  numa-meminfo.node0.Unevictable
      6475 ±125%    +199.9%      19417 ± 59%  numa-meminfo.node1.Mapped
   1149260 ± 16%     -30.6%     797998 ± 34%  numa-vmstat.node0.nr_file_pages
     18996 ± 12%     -35.5%      12243 ± 34%  numa-vmstat.node0.nr_slab_reclaimable
    629044 ± 30%     -55.8%     278322 ± 99%  numa-vmstat.node0.nr_unevictable
    629044 ± 30%     -55.8%     278322 ± 99%  numa-vmstat.node0.nr_zone_unevictable
 1.173e+09           +10.9%  1.301e+09        numa-vmstat.node0.numa_hit
 1.171e+09           +11.1%    1.3e+09        numa-vmstat.node0.numa_local
      1665 ±124%    +199.4%       4986 ± 59%  numa-vmstat.node1.nr_mapped
   2254868 ±  8%     +15.4%    2602210 ± 11%  sched_debug.cfs_rq:/.avg_vruntime.max
    671734 ± 10%     +19.6%     803578 ± 11%  sched_debug.cfs_rq:/.avg_vruntime.stddev
   2254868 ±  8%     +15.4%    2602210 ± 11%  sched_debug.cfs_rq:/.min_vruntime.max
    671734 ± 10%     +19.6%     803578 ± 11%  sched_debug.cfs_rq:/.min_vruntime.stddev
     95.28 ±  3%     +12.6%     107.30 ±  6%  sched_debug.cfs_rq:/.util_est.avg
    530.36 ±  4%     +12.6%     597.03 ±  4%  sched_debug.cpu.curr->pid.avg
      0.13 ±  5%     +12.9%       0.15 ±  4%  sched_debug.cpu.nr_running.avg
      0.03 ± 14%     -39.9%       0.02 ± 29%  sched_debug.cpu.nr_uninterruptible.avg
     25765            +2.9%      26515        proc-vmstat.nr_active_anon
    463751            -4.3%     443812        proc-vmstat.nr_anon_pages
    988326            -2.0%     968357        proc-vmstat.nr_inactive_anon
     25765            +2.9%      26515        proc-vmstat.nr_zone_active_anon
    988327            -2.0%     968357        proc-vmstat.nr_zone_inactive_anon
 1.173e+09           +10.9%  1.302e+09        proc-vmstat.numa_hit
 1.171e+09           +11.1%  1.301e+09        proc-vmstat.numa_local
     21662            +5.2%      22790        proc-vmstat.pgactivate
 1.168e+09           +11.2%  1.299e+09        proc-vmstat.pgalloc_normal
 1.164e+09           +11.2%  1.295e+09        proc-vmstat.pgfault
 1.168e+09           +11.2%  1.298e+09        proc-vmstat.pgfree
 3.587e+09           +10.5%  3.964e+09        perf-stat.i.branch-instructions
  23777908            +7.3%   25506059        perf-stat.i.branch-misses
 3.201e+08           +11.4%  3.567e+08        perf-stat.i.cache-misses
   4.1e+08           +13.2%  4.641e+08        perf-stat.i.cache-references
      2139            +3.3%       2210        perf-stat.i.context-switches
 3.885e+10           +10.8%  4.305e+10        perf-stat.i.cpu-cycles
 1.844e+10           +10.1%  2.031e+10        perf-stat.i.instructions
     74.11           +11.2%      82.38        perf-stat.i.metric.K/sec
   3853671           +11.2%    4283769        perf-stat.i.minor-faults
   3853671           +11.2%    4283769        perf-stat.i.page-faults
      0.66            -0.1        0.54 ± 44%  perf-stat.overall.branch-miss-rate%
   1442267           -17.6%    1188877 ± 44%  perf-stat.overall.path-length
      2.16            -0.6        1.52        perf-profile.calltrace.cycles-pp.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
      5.70            -0.5        5.18 ±  2%  perf-profile.calltrace.cycles-pp.__munmap
      5.70            -0.5        5.18 ±  2%  perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      5.70            -0.5        5.18 ±  2%  perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      5.70            -0.5        5.18 ±  2%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      5.70            -0.5        5.18 ±  2%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
      5.69            -0.5        5.17 ±  2%  perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
      5.69            -0.5        5.17 ±  2%  perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
      5.69            -0.5        5.17 ±  2%  perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      5.27            -0.5        4.75 ±  2%  perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
      5.28            -0.5        4.76 ±  2%  perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
      5.28            -0.5        4.76 ±  2%  perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
      5.28            -0.5        4.76 ±  2%  perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap
      4.92            -0.4        4.54        perf-profile.calltrace.cycles-pp.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     31.58            -0.3       31.25        perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
      1.80 ±  2%      -0.2        1.65 ±  4%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
      1.66 ±  2%      -0.1        1.52 ±  4%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
      1.12 ±  2%      -0.1        1.03 ±  3%  perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
      1.10 ±  2%      -0.1        1.01 ±  3%  perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
      0.86 ±  3%      -0.1        0.80 ±  2%  perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
      2.13            +0.1        2.20        perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault
      0.60 ±  2%      +0.1        0.69 ±  2%  perf-profile.calltrace.cycles-pp.folio_remove_rmap_ptes.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range
      2.26            +0.1        2.35        perf-profile.calltrace.cycles-pp.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      0.59 ±  4%      +0.1        0.72 ± 13%  perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folios_put_refs.release_pages.__tlb_batch_free_encoded_pages.tlb_flush_mmu
      0.58 ±  4%      +0.1        0.72 ± 13%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folios_put_refs.release_pages.__tlb_batch_free_encoded_pages
      0.81 ±  4%      +0.2        0.96 ± 10%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range
      0.82 ±  4%      +0.2        0.98 ± 10%  perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault
      0.71 ±  4%      +0.2        0.87 ± 11%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma
      2.85            +0.2        3.05 ±  5%  perf-profile.calltrace.cycles-pp.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault.do_fault
      3.06            +0.2        3.27 ±  4%  perf-profile.calltrace.cycles-pp.folio_add_lru_vma.set_pte_range.finish_fault.do_fault.__handle_mm_fault
      0.45 ± 44%      +0.2        0.67 ± 14%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folios_put_refs.release_pages
      9.85            +0.3       10.17        perf-profile.calltrace.cycles-pp.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     14.80            +0.3       15.15        perf-profile.calltrace.cycles-pp.copy_page.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     32.43            +0.4       32.82        perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
     33.75            +0.4       34.18        perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
     35.10            +0.5       35.56        perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
     38.62            +0.6       39.21        perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
     38.89            +0.6       39.48        perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
      0.00            +0.7        0.68        perf-profile.calltrace.cycles-pp.free_unref_folios.folios_put_refs.release_pages.__tlb_batch_free_encoded_pages.tlb_flush_mmu
     54.53            +0.8       55.31        perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase
     60.07            +0.9       60.95        perf-profile.calltrace.cycles-pp.testcase
      1.06 ±  2%      -0.8        0.22 ±  7%  perf-profile.children.cycles-pp._compound_head
      2.20            -0.6        1.57        perf-profile.children.cycles-pp.zap_present_ptes
      5.85            -0.5        5.32 ±  2%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      5.70            -0.5        5.18 ±  2%  perf-profile.children.cycles-pp.__munmap
      5.70            -0.5        5.18 ±  2%  perf-profile.children.cycles-pp.do_vmi_munmap
      5.84            -0.5        5.32 ±  2%  perf-profile.children.cycles-pp.do_syscall_64
      5.69            -0.5        5.17 ±  2%  perf-profile.children.cycles-pp.unmap_region
      5.70            -0.5        5.18 ±  2%  perf-profile.children.cycles-pp.__vm_munmap
      5.70            -0.5        5.18 ±  2%  perf-profile.children.cycles-pp.__x64_sys_munmap
      5.70            -0.5        5.18 ±  2%  perf-profile.children.cycles-pp.do_vmi_align_munmap
      5.28            -0.5        4.77 ±  2%  perf-profile.children.cycles-pp.unmap_vmas
      5.28            -0.5        4.76 ±  2%  perf-profile.children.cycles-pp.zap_pte_range
      5.28            -0.5        4.77 ±  2%  perf-profile.children.cycles-pp.unmap_page_range
      5.28            -0.5        4.77 ±  2%  perf-profile.children.cycles-pp.zap_pmd_range
      0.85 ±  6%      -0.4        0.46 ±  5%  perf-profile.children.cycles-pp.__folio_throttle_swaprate
      4.98            -0.4        4.60        perf-profile.children.cycles-pp.folio_prealloc
     31.58            -0.3       31.25        perf-profile.children.cycles-pp.cpu_startup_entry
     31.50            -0.3       31.18        perf-profile.children.cycles-pp.cpuidle_idle_call
     31.58            -0.3       31.25        perf-profile.children.cycles-pp.secondary_startup_64_no_verify
     31.58            -0.3       31.25        perf-profile.children.cycles-pp.do_idle
     31.04            -0.3       30.75        perf-profile.children.cycles-pp.cpuidle_enter_state
     31.06            -0.3       30.77        perf-profile.children.cycles-pp.cpuidle_enter
      2.26            -0.1        2.14 ±  3%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
     28.90            -0.1       28.79        perf-profile.children.cycles-pp.intel_idle_ibrs
      2.06            -0.1        1.95 ±  3%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      1.45            -0.0        1.40 ±  2%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.20 ±  3%      -0.0        0.17 ±  4%  perf-profile.children.cycles-pp.__rmqueue_pcplist
      0.33 ±  2%      -0.0        0.30 ±  4%  perf-profile.children.cycles-pp.menu_select
      0.18 ±  4%      -0.0        0.16 ±  3%  perf-profile.children.cycles-pp.free_swap_cache
      0.20 ±  2%      -0.0        0.18        perf-profile.children.cycles-pp.free_pages_and_swap_cache
      0.22 ±  4%      +0.0        0.24 ±  5%  perf-profile.children.cycles-pp.cgroup_rstat_updated
      0.06 ±  7%      +0.0        0.09 ±  6%  perf-profile.children.cycles-pp.__tlb_remove_folio_pages_size
      0.49 ±  2%      +0.0        0.52 ±  2%  perf-profile.children.cycles-pp.__mod_node_page_state
      0.14 ±  4%      +0.0        0.18 ±  2%  perf-profile.children.cycles-pp.get_pfnblock_flags_mask
      0.74            +0.1        0.79        perf-profile.children.cycles-pp.free_unref_folios
      2.44            +0.1        2.50        perf-profile.children.cycles-pp.native_irq_return_iret
      0.28            +0.1        0.35 ±  2%  perf-profile.children.cycles-pp.free_unref_page_prepare
      2.14            +0.1        2.21        perf-profile.children.cycles-pp.shmem_fault
      0.60 ±  2%      +0.1        0.69 ±  2%  perf-profile.children.cycles-pp.folio_remove_rmap_ptes
      2.27            +0.1        2.36        perf-profile.children.cycles-pp.__do_fault
      0.80 ±  2%      +0.1        0.92 ±  2%  perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
      1.34 ±  5%      +0.1        1.48 ±  4%  perf-profile.children.cycles-pp.__lruvec_stat_mod_folio
      2.87 ±  2%      +0.2        3.06 ±  5%  perf-profile.children.cycles-pp.folio_batch_move_lru
      3.07            +0.2        3.28 ±  4%  perf-profile.children.cycles-pp.folio_add_lru_vma
      1.32 ±  4%      +0.3        1.62 ± 11%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
      1.50 ±  4%      +0.3        1.80 ± 10%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      1.48 ±  4%      +0.3        1.79 ± 10%  perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
      9.88            +0.3       10.20        perf-profile.children.cycles-pp.finish_fault
      0.00            +0.3        0.34 ±  2%  perf-profile.children.cycles-pp.__mem_cgroup_uncharge_folios
     14.84            +0.3       15.18        perf-profile.children.cycles-pp.copy_page
     32.45            +0.4       32.85        perf-profile.children.cycles-pp.do_fault
     33.78            +0.4       34.22        perf-profile.children.cycles-pp.__handle_mm_fault
     35.16            +0.4       35.60        perf-profile.children.cycles-pp.handle_mm_fault
     38.66            +0.6       39.25        perf-profile.children.cycles-pp.do_user_addr_fault
     38.93            +0.6       39.52        perf-profile.children.cycles-pp.exc_page_fault
     49.87            +0.7       50.59        perf-profile.children.cycles-pp.asm_exc_page_fault
     62.47            +0.9       63.33        perf-profile.children.cycles-pp.testcase
      1.04 ±  2%      -0.8        0.19 ±  8%  perf-profile.self.cycles-pp._compound_head
      0.56 ±  7%      -0.4        0.18 ±  8%  perf-profile.self.cycles-pp.__folio_throttle_swaprate
     28.90            -0.1       28.78        perf-profile.self.cycles-pp.intel_idle_ibrs
      0.27 ±  2%      -0.0        0.24 ±  3%  perf-profile.self.cycles-pp.folio_remove_rmap_ptes
      0.18 ±  2%      -0.0        0.15 ±  2%  perf-profile.self.cycles-pp.free_swap_cache
      0.20 ±  4%      -0.0        0.18 ±  3%  perf-profile.self.cycles-pp.cpuidle_enter_state
      0.08 ±  5%      +0.0        0.10 ±  5%  perf-profile.self.cycles-pp.__do_fault
      0.06 ±  6%      +0.0        0.09 ±  5%  perf-profile.self.cycles-pp.__tlb_remove_folio_pages_size
      0.14 ±  4%      +0.0        0.18 ±  4%  perf-profile.self.cycles-pp.get_pfnblock_flags_mask
      2.44            +0.1        2.49        perf-profile.self.cycles-pp.native_irq_return_iret
      0.30 ±  4%      +0.1        0.36 ±  4%  perf-profile.self.cycles-pp.shmem_fault
      0.15 ±  3%      +0.1        0.21 ±  2%  perf-profile.self.cycles-pp.zap_pte_range
      0.66 ±  2%      +0.1        0.76 ±  3%  perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
      0.45 ±  3%      +0.1        0.56 ±  3%  perf-profile.self.cycles-pp.zap_present_ptes
      1.32 ±  4%      +0.3        1.62 ± 11%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
     14.74            +0.3       15.08        perf-profile.self.cycles-pp.copy_page
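
As a reading aid for the table above: the %change column appears to hold a
relative change for plain counters (shown with a % sign) and an absolute
difference in percentage points for rows that are already percentages (the
mpstat and perf-profile rows, shown without a % sign). per_thread_ops is also
consistent with workload divided by nr_task. The sketch below checks a few
rows under those assumptions; the helper names are hypothetical and not part
of lkp-tests.

```python
def rel_change_pct(base, new):
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


def point_delta(base, new):
    """Absolute difference between two values that are already percentages."""
    return new - base


NR_TASK = 16  # from the "parameters" section of the report

# will-it-scale.workload: 3858794 -> 4296995, table shows +11.4%
workload_change = rel_change_pct(3858794, 4296995)

# will-it-scale.per_thread_ops: 241174 -> 268562, table shows +11.4%
per_thread_change = rel_change_pct(241174, 268562)

# per_thread_ops compared with workload / nr_task (integer part)
per_thread_base = 3858794 // NR_TASK
per_thread_new = 4296995 // NR_TASK

# mpstat.cpu.all.sys%: 9.19 -> 10.32, table shows +1.1
sys_delta = point_delta(9.19, 10.32)

# perf-profile zap_present_ptes calltrace: 2.16 -> 1.52, table shows -0.6
zap_delta = point_delta(2.16, 1.52)

if __name__ == "__main__":
    print(f"workload change:       {workload_change:+.1f}%")
    print(f"per_thread_ops change: {per_thread_change:+.1f}%")
    print(f"per-thread ops:        {per_thread_base} -> {per_thread_new}")
    print(f"sys% delta:            {sys_delta:+.1f}")
    print(f"zap_present_ptes:      {zap_delta:+.1f}")
```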




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki




