Hello,

kernel test robot noticed an 11.4% improvement of will-it-scale.per_thread_ops on:

commit: 7c33b8c4229af19797c78de48827ca70228c1f47 ("mm: remove use of folio list from folios_put()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: will-it-scale
test machine: 104 threads 2 sockets (Skylake) with 192G memory
parameters:

  nr_task: 16
  mode: thread
  test: page_fault2
  cpufreq_governor: performance

Details are as below:
-------------------------------------------------------------------------------------------------->

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240325/202403251657.f8cbfe2a-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/thread/16/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/page_fault2/will-it-scale

commit:
  4882c80975 ("memcg: add mem_cgroup_uncharge_folios()")
  7c33b8c422 ("mm: remove use of folio list from folios_put()")

4882c80975e2bf72 7c33b8c4229af19797c78de4882
---------------- ---------------------------
         %stddev   %change          %stddev
            \         |                \
      9.19            +1.1      10.32        mpstat.cpu.all.sys%
      2.00            +0.2       2.23        mpstat.cpu.all.usr%
 1.171e+09          +11.0%    1.3e+09        numa-numastat.node0.local_node
 1.173e+09          +10.9%  1.301e+09        numa-numastat.node0.numa_hit
     12.80          +10.6%      14.16        vmstat.procs.r
      2179           +2.7%       2239        vmstat.system.cs
     28134          +10.1%      30977        vmstat.system.in
   3858794          +11.4%    4296995        will-it-scale.16.threads
     88.59           -1.6%      87.20        will-it-scale.16.threads_idle
    241174          +11.4%     268562        will-it-scale.per_thread_ops
   3858794          +11.4%    4296995        will-it-scale.workload
   4597109 ± 16%    -30.5%    3195509 ± 34%  numa-meminfo.node0.FilePages
     75985 ± 12%    -35.5%      48981 ± 34%  numa-meminfo.node0.KReclaimable
     75985 ± 12%    -35.5%      48981 ± 34%  numa-meminfo.node0.SReclaimable
   2516178 ± 30%    -55.8%    1113291 ± 99%  numa-meminfo.node0.Unevictable
      6475 ±125%   +199.9%      19417 ± 59%  numa-meminfo.node1.Mapped
   1149260 ± 16%    -30.6%     797998 ± 34%  numa-vmstat.node0.nr_file_pages
     18996 ± 12%    -35.5%      12243 ± 34%  numa-vmstat.node0.nr_slab_reclaimable
    629044 ± 30%    -55.8%     278322 ± 99%  numa-vmstat.node0.nr_unevictable
    629044 ± 30%    -55.8%     278322 ± 99%  numa-vmstat.node0.nr_zone_unevictable
 1.173e+09          +10.9%  1.301e+09        numa-vmstat.node0.numa_hit
 1.171e+09          +11.1%    1.3e+09        numa-vmstat.node0.numa_local
      1665 ±124%   +199.4%       4986 ± 59%  numa-vmstat.node1.nr_mapped
   2254868 ±  8%    +15.4%    2602210 ± 11%  sched_debug.cfs_rq:/.avg_vruntime.max
    671734 ± 10%    +19.6%     803578 ± 11%  sched_debug.cfs_rq:/.avg_vruntime.stddev
   2254868 ±  8%    +15.4%    2602210 ± 11%  sched_debug.cfs_rq:/.min_vruntime.max
    671734 ± 10%    +19.6%     803578 ± 11%  sched_debug.cfs_rq:/.min_vruntime.stddev
     95.28 ±  3%    +12.6%     107.30 ±  6%  sched_debug.cfs_rq:/.util_est.avg
    530.36 ±  4%    +12.6%     597.03 ±  4%  sched_debug.cpu.curr->pid.avg
      0.13 ±  5%    +12.9%       0.15 ±  4%  sched_debug.cpu.nr_running.avg
      0.03 ± 14%    -39.9%       0.02 ± 29%  sched_debug.cpu.nr_uninterruptible.avg
     25765           +2.9%      26515        proc-vmstat.nr_active_anon
    463751           -4.3%     443812        proc-vmstat.nr_anon_pages
    988326           -2.0%     968357        proc-vmstat.nr_inactive_anon
     25765           +2.9%      26515        proc-vmstat.nr_zone_active_anon
    988327           -2.0%     968357        proc-vmstat.nr_zone_inactive_anon
 1.173e+09          +10.9%  1.302e+09        proc-vmstat.numa_hit
 1.171e+09          +11.1%  1.301e+09        proc-vmstat.numa_local
     21662           +5.2%      22790        proc-vmstat.pgactivate
 1.168e+09          +11.2%  1.299e+09        proc-vmstat.pgalloc_normal
 1.164e+09          +11.2%  1.295e+09        proc-vmstat.pgfault
 1.168e+09          +11.2%  1.298e+09        proc-vmstat.pgfree
 3.587e+09          +10.5%  3.964e+09        perf-stat.i.branch-instructions
  23777908           +7.3%   25506059        perf-stat.i.branch-misses
 3.201e+08          +11.4%  3.567e+08        perf-stat.i.cache-misses
   4.1e+08          +13.2%  4.641e+08        perf-stat.i.cache-references
      2139           +3.3%       2210        perf-stat.i.context-switches
 3.885e+10          +10.8%  4.305e+10        perf-stat.i.cpu-cycles
 1.844e+10          +10.1%  2.031e+10        perf-stat.i.instructions
     74.11          +11.2%      82.38        perf-stat.i.metric.K/sec
   3853671          +11.2%    4283769        perf-stat.i.minor-faults
   3853671          +11.2%    4283769        perf-stat.i.page-faults
      0.66            -0.1       0.54 ± 44%  perf-stat.overall.branch-miss-rate%
   1442267          -17.6%    1188877 ± 44%  perf-stat.overall.path-length
      2.16            -0.6       1.52        perf-profile.calltrace.cycles-pp.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
      5.70            -0.5       5.18 ±  2%  perf-profile.calltrace.cycles-pp.__munmap
      5.70            -0.5       5.18 ±  2%  perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      5.70            -0.5       5.18 ±  2%  perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      5.70            -0.5       5.18 ±  2%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      5.70            -0.5       5.18 ±  2%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
      5.69            -0.5       5.17 ±  2%  perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
      5.69            -0.5       5.17 ±  2%  perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
      5.69            -0.5       5.17 ±  2%  perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      5.27            -0.5       4.75 ±  2%  perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
      5.28            -0.5       4.76 ±  2%  perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
      5.28            -0.5       4.76 ±  2%  perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
      5.28            -0.5       4.76 ±  2%  perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap
      4.92            -0.4       4.54        perf-profile.calltrace.cycles-pp.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     31.58            -0.3      31.25        perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
      1.80 ±  2%      -0.2       1.65 ±  4%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
      1.66 ±  2%      -0.1       1.52 ±  4%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
      1.12 ±  2%      -0.1       1.03 ±  3%  perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
      1.10 ±  2%      -0.1       1.01 ±  3%  perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
      0.86 ±  3%      -0.1       0.80 ±  2%  perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
      2.13            +0.1       2.20        perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault
      0.60 ±  2%      +0.1       0.69 ±  2%  perf-profile.calltrace.cycles-pp.folio_remove_rmap_ptes.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range
      2.26            +0.1       2.35        perf-profile.calltrace.cycles-pp.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      0.59 ±  4%      +0.1       0.72 ± 13%  perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folios_put_refs.release_pages.__tlb_batch_free_encoded_pages.tlb_flush_mmu
      0.58 ±  4%      +0.1       0.72 ± 13%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folios_put_refs.release_pages.__tlb_batch_free_encoded_pages
      0.81 ±  4%      +0.2       0.96 ± 10%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range
      0.82 ±  4%      +0.2       0.98 ± 10%  perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault
      0.71 ±  4%      +0.2       0.87 ± 11%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma
      2.85            +0.2       3.05 ±  5%  perf-profile.calltrace.cycles-pp.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault.do_fault
      3.06            +0.2       3.27 ±  4%  perf-profile.calltrace.cycles-pp.folio_add_lru_vma.set_pte_range.finish_fault.do_fault.__handle_mm_fault
      0.45 ± 44%      +0.2       0.67 ± 14%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folios_put_refs.release_pages
      9.85            +0.3      10.17        perf-profile.calltrace.cycles-pp.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     14.80            +0.3      15.15        perf-profile.calltrace.cycles-pp.copy_page.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     32.43            +0.4      32.82        perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
     33.75            +0.4      34.18        perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
     35.10            +0.5      35.56        perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
     38.62            +0.6      39.21        perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
     38.89            +0.6      39.48        perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
      0.00            +0.7       0.68        perf-profile.calltrace.cycles-pp.free_unref_folios.folios_put_refs.release_pages.__tlb_batch_free_encoded_pages.tlb_flush_mmu
     54.53            +0.8      55.31        perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase
     60.07            +0.9      60.95        perf-profile.calltrace.cycles-pp.testcase
      1.06 ±  2%      -0.8       0.22 ±  7%  perf-profile.children.cycles-pp._compound_head
      2.20            -0.6       1.57        perf-profile.children.cycles-pp.zap_present_ptes
      5.85            -0.5       5.32 ±  2%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      5.70            -0.5       5.18 ±  2%  perf-profile.children.cycles-pp.__munmap
      5.70            -0.5       5.18 ±  2%  perf-profile.children.cycles-pp.do_vmi_munmap
      5.84            -0.5       5.32 ±  2%  perf-profile.children.cycles-pp.do_syscall_64
      5.69            -0.5       5.17 ±  2%  perf-profile.children.cycles-pp.unmap_region
      5.70            -0.5       5.18 ±  2%  perf-profile.children.cycles-pp.__vm_munmap
      5.70            -0.5       5.18 ±  2%  perf-profile.children.cycles-pp.__x64_sys_munmap
      5.70            -0.5       5.18 ±  2%  perf-profile.children.cycles-pp.do_vmi_align_munmap
      5.28            -0.5       4.77 ±  2%  perf-profile.children.cycles-pp.unmap_vmas
      5.28            -0.5       4.76 ±  2%  perf-profile.children.cycles-pp.zap_pte_range
      5.28            -0.5       4.77 ±  2%  perf-profile.children.cycles-pp.unmap_page_range
      5.28            -0.5       4.77 ±  2%  perf-profile.children.cycles-pp.zap_pmd_range
      0.85 ±  6%      -0.4       0.46 ±  5%  perf-profile.children.cycles-pp.__folio_throttle_swaprate
      4.98            -0.4       4.60        perf-profile.children.cycles-pp.folio_prealloc
     31.58            -0.3      31.25        perf-profile.children.cycles-pp.cpu_startup_entry
     31.50            -0.3      31.18        perf-profile.children.cycles-pp.cpuidle_idle_call
     31.58            -0.3      31.25        perf-profile.children.cycles-pp.secondary_startup_64_no_verify
     31.58            -0.3      31.25        perf-profile.children.cycles-pp.do_idle
     31.04            -0.3      30.75        perf-profile.children.cycles-pp.cpuidle_enter_state
     31.06            -0.3      30.77        perf-profile.children.cycles-pp.cpuidle_enter
      2.26            -0.1       2.14 ±  3%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
     28.90            -0.1      28.79        perf-profile.children.cycles-pp.intel_idle_ibrs
      2.06            -0.1       1.95 ±  3%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      1.45            -0.0       1.40 ±  2%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.20 ±  3%      -0.0       0.17 ±  4%  perf-profile.children.cycles-pp.__rmqueue_pcplist
      0.33 ±  2%      -0.0       0.30 ±  4%  perf-profile.children.cycles-pp.menu_select
      0.18 ±  4%      -0.0       0.16 ±  3%  perf-profile.children.cycles-pp.free_swap_cache
      0.20 ±  2%      -0.0       0.18        perf-profile.children.cycles-pp.free_pages_and_swap_cache
      0.22 ±  4%      +0.0       0.24 ±  5%  perf-profile.children.cycles-pp.cgroup_rstat_updated
      0.06 ±  7%      +0.0       0.09 ±  6%  perf-profile.children.cycles-pp.__tlb_remove_folio_pages_size
      0.49 ±  2%      +0.0       0.52 ±  2%  perf-profile.children.cycles-pp.__mod_node_page_state
      0.14 ±  4%      +0.0       0.18 ±  2%  perf-profile.children.cycles-pp.get_pfnblock_flags_mask
      0.74            +0.1       0.79        perf-profile.children.cycles-pp.free_unref_folios
      2.44            +0.1       2.50        perf-profile.children.cycles-pp.native_irq_return_iret
      0.28            +0.1       0.35 ±  2%  perf-profile.children.cycles-pp.free_unref_page_prepare
      2.14            +0.1       2.21        perf-profile.children.cycles-pp.shmem_fault
      0.60 ±  2%      +0.1       0.69 ±  2%  perf-profile.children.cycles-pp.folio_remove_rmap_ptes
      2.27            +0.1       2.36        perf-profile.children.cycles-pp.__do_fault
      0.80 ±  2%      +0.1       0.92 ±  2%  perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
      1.34 ±  5%      +0.1       1.48 ±  4%  perf-profile.children.cycles-pp.__lruvec_stat_mod_folio
      2.87 ±  2%      +0.2       3.06 ±  5%  perf-profile.children.cycles-pp.folio_batch_move_lru
      3.07            +0.2       3.28 ±  4%  perf-profile.children.cycles-pp.folio_add_lru_vma
      1.32 ±  4%      +0.3       1.62 ± 11%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
      1.50 ±  4%      +0.3       1.80 ± 10%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      1.48 ±  4%      +0.3       1.79 ± 10%  perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
      9.88            +0.3      10.20        perf-profile.children.cycles-pp.finish_fault
      0.00            +0.3       0.34 ±  2%  perf-profile.children.cycles-pp.__mem_cgroup_uncharge_folios
     14.84            +0.3      15.18        perf-profile.children.cycles-pp.copy_page
     32.45            +0.4      32.85        perf-profile.children.cycles-pp.do_fault
     33.78            +0.4      34.22        perf-profile.children.cycles-pp.__handle_mm_fault
     35.16            +0.4      35.60        perf-profile.children.cycles-pp.handle_mm_fault
     38.66            +0.6      39.25        perf-profile.children.cycles-pp.do_user_addr_fault
     38.93            +0.6      39.52        perf-profile.children.cycles-pp.exc_page_fault
     49.87            +0.7      50.59        perf-profile.children.cycles-pp.asm_exc_page_fault
     62.47            +0.9      63.33        perf-profile.children.cycles-pp.testcase
      1.04 ±  2%      -0.8       0.19 ±  8%  perf-profile.self.cycles-pp._compound_head
      0.56 ±  7%      -0.4       0.18 ±  8%  perf-profile.self.cycles-pp.__folio_throttle_swaprate
     28.90            -0.1      28.78        perf-profile.self.cycles-pp.intel_idle_ibrs
      0.27 ±  2%      -0.0       0.24 ±  3%  perf-profile.self.cycles-pp.folio_remove_rmap_ptes
      0.18 ±  2%      -0.0       0.15 ±  2%  perf-profile.self.cycles-pp.free_swap_cache
      0.20 ±  4%      -0.0       0.18 ±  3%  perf-profile.self.cycles-pp.cpuidle_enter_state
      0.08 ±  5%      +0.0       0.10 ±  5%  perf-profile.self.cycles-pp.__do_fault
      0.06 ±  6%      +0.0       0.09 ±  5%  perf-profile.self.cycles-pp.__tlb_remove_folio_pages_size
      0.14 ±  4%      +0.0       0.18 ±  4%  perf-profile.self.cycles-pp.get_pfnblock_flags_mask
      2.44            +0.1       2.49        perf-profile.self.cycles-pp.native_irq_return_iret
      0.30 ±  4%      +0.1       0.36 ±  4%  perf-profile.self.cycles-pp.shmem_fault
      0.15 ±  3%      +0.1       0.21 ±  2%  perf-profile.self.cycles-pp.zap_pte_range
      0.66 ±  2%      +0.1       0.76 ±  3%  perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
      0.45 ±  3%      +0.1       0.56 ±  3%  perf-profile.self.cycles-pp.zap_present_ptes
      1.32 ±  4%      +0.3       1.62 ± 11%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
     14.74            +0.3      15.08        perf-profile.self.cycles-pp.copy_page


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
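As a quick arithmetic check of the headline 11.4% figure, the short Python sketch below recomputes the change column for a few rows using values copied from the comparison table. It assumes, judging only from the rows themselves and not from the lkp-tests source, that count-style metrics report a relative change ((new - old) / old), while metrics that are already percentages (names ending in "%", and the perf-profile rows) report an absolute difference in percentage points. The helper names pct_change and abs_delta are hypothetical and are not part of lkp-tests.

```python
# Sketch: recompute the change column for a few rows of the comparison table.
# Assumption (inferred from the rows, not from lkp-tests source): count-style
# metrics show a relative change, while metrics that are already percentages
# (names ending in "%", perf-profile.*) show an absolute delta in points.

NR_TASK = 16  # from "parameters: nr_task: 16"


def pct_change(old: float, new: float) -> float:
    """Relative change in percent, e.g. for will-it-scale.per_thread_ops."""
    return (new - old) / old * 100.0


def abs_delta(old: float, new: float) -> float:
    """Absolute change in percentage points, e.g. for mpstat.cpu.all.sys%."""
    return new - old


# (metric, value on 4882c80975, value on 7c33b8c422, kind of change)
ROWS = [
    ("will-it-scale.per_thread_ops", 241174, 268562, "relative"),
    ("will-it-scale.workload", 3858794, 4296995, "relative"),
    ("mpstat.cpu.all.sys%", 9.19, 10.32, "absolute"),
    ("perf-profile.children.cycles-pp._compound_head", 1.06, 0.22, "absolute"),
]

for name, old, new, kind in ROWS:
    if kind == "relative":
        print(f"{name}: {pct_change(old, new):+.1f}%")
    else:
        print(f"{name}: {abs_delta(old, new):+.1f}")

# per_thread_ops agrees with workload / nr_task to within one op.
for total, per_thread in ((3858794, 241174), (4296995, 268562)):
    print(f"{total} / {NR_TASK} = {total / NR_TASK:.1f} (reported {per_thread})")
```

The same split explains why the _compound_head row reads -0.8 rather than a percentage: its share of cycles fell from about 1.06% to 0.22%, a drop of roughly 0.8 percentage points.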