Hello,

kernel test robot noticed a 122.2% improvement of will-it-scale.per_thread_ops on:

commit: 4ed4379881aa62588aba6442a9f362a8cf7624e6 ("mm: handle shared faults under the VMA lock")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: will-it-scale
test machine: 104 threads 2 sockets (Skylake) with 192G memory
parameters:

    nr_task: 16
    mode: thread
    test: page_fault3
    cpufreq_governor: performance

In addition to that, the commit also has significant impact on the following tests:

+------------------+-------------------------------------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_process_ops 7.8% improvement                                   |
| test machine     | 104 threads 2 sockets (Skylake) with 192G memory                                                |
| test parameters  | cpufreq_governor=performance                                                                    |
|                  | mode=process                                                                                    |
|                  | nr_task=16                                                                                      |
|                  | test=page_fault3                                                                                |
+------------------+-------------------------------------------------------------------------------------------------+

Details are as below:
-------------------------------------------------------------------------------------------------->

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231115/202311151610.43a1e565-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/thread/16/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/page_fault3/will-it-scale

commit:
  164b06f238 ("mm: call wp_page_copy() under the VMA lock")
  4ed4379881 ("mm: handle shared faults under the VMA lock")

164b06f238b98631 4ed4379881aa62588aba6442a9f
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
    150146 ± 13%     +19.7%     179652 ±  7%  numa-meminfo.node0.Slab
     32317 ±  5%      -5.6%      30493        uptime.idle
     11.42 ±  2%       -1.2      10.22 ±  2%  mpstat.cpu.all.sys%
      1.86 ±  3%       +2.3       4.14        mpstat.cpu.all.usr%
   2648761           +89.3%    5013662        numa-numastat.node0.local_node
   2696913 ±  2%     +88.3%    5077488        numa-numastat.node0.numa_hit
   2696747 ±  2%     +88.3%    5077459        numa-vmstat.node0.numa_hit
   2648596           +89.3%    5013633        numa-vmstat.node0.numa_local
    107.33 ± 14%     +72.8%     185.50 ± 11%  perf-c2c.DRAM.local
      2752 ± 12%     -39.0%       1678 ± 15%  perf-c2c.HITM.local
      6748 ±  2%     -59.5%       2730        vmstat.system.cs
    175383           +76.1%     308919        vmstat.system.in
   3301453          +122.2%    7336336        will-it-scale.16.threads
     84.29            -1.1%      83.34        will-it-scale.16.threads_idle
    206340          +122.2%     458520        will-it-scale.per_thread_ops
   3301453          +122.2%    7336336        will-it-scale.workload
    263502 ±  2%      +4.7%     275788        proc-vmstat.nr_mapped
   3322833           +72.3%    5723978        proc-vmstat.numa_hit
   3215175           +74.7%    5616311        proc-vmstat.numa_local
   3408340           +70.6%    5814446        proc-vmstat.pgalloc_normal
 9.943e+08          +122.1%  2.208e+09        proc-vmstat.pgfault
   3359696           +71.6%    5765300        proc-vmstat.pgfree
   1102135 ±  9%     +19.8%    1320904 ±  6%  sched_debug.cfs_rq:/.avg_vruntime.max
   1102135 ±  9%     +19.8%    1320904 ±  6%  sched_debug.cfs_rq:/.min_vruntime.max
    862.58 ±  3%     +13.3%     976.92 ±  6%  sched_debug.cfs_rq:/.runnable_avg.max
    862.44 ±  3%     +13.2%     976.50 ±  6%  sched_debug.cfs_rq:/.util_avg.max
    286.93 ±  6%     +12.1%     321.76 ±  4%  sched_debug.cfs_rq:/.util_est_enqueued.stddev
    202058 ±  8%     -35.9%     129538 ±  3%  sched_debug.cpu.avg_idle.stddev
     11549 ±  6%     -49.0%       5886        sched_debug.cpu.nr_switches.avg
     13882 ±  9%     -36.2%       8854 ± 11%  sched_debug.cpu.nr_switches.stddev
    450048 ±  4%     -92.1%      35653 ±  4%  turbostat.C1
      0.40 ±  5%       -0.4       0.02 ± 28%  turbostat.C1%
    986845           -75.4%     242529 ±  6%  turbostat.C1E
      1.07 ±  4%       -0.8       0.30 ± 12%  turbostat.C1E%
      0.08 ±  5%     +62.0%       0.14 ±  3%  turbostat.IPC
  76389757          +106.4%  1.577e+08        turbostat.IRQ
    218.21            +6.4%     232.07        turbostat.PkgWatt
     20.26            +2.6%      20.79        turbostat.RAMWatt
      0.01 ±145%     -94.2%       0.00 ±111%  perf-sched.sch_delay.avg.ms.__cond_resched.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault
      0.00 ±  9%     -78.3%       0.00 ± 82%
perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault 0.00 ± 9% -73.9% 0.00 ± 57% perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt 0.00 ± 14% -87.5% 0.00 ± 99% perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi 0.02 ±164% -100.0% 0.00 perf-sched.sch_delay.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm 0.01 ± 58% -100.0% 0.00 perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma 167.10 ±223% +200.2% 501.66 ± 99% perf-sched.sch_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64 0.03 ±160% -100.0% 0.00 perf-sched.sch_delay.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm 12.57 ± 58% -100.0% 0.00 perf-sched.sch_delay.max.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma 0.01 ± 23% -54.1% 0.01 ± 67% perf-sched.sch_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff 47.80 ± 2% +123.9% 107.01 ± 3% perf-sched.total_wait_and_delay.average.ms 18466 ± 2% -55.7% 8181 ± 3% perf-sched.total_wait_and_delay.count.ms 47.77 ± 2% +123.6% 106.82 ± 3% perf-sched.total_wait_time.average.ms 2.79 ± 10% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm 0.69 ± 9% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma 0.82 ± 10% +125.0% 1.85 ± 14% perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap 0.12 ± 33% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff 143.67 ± 5% -100.0% 0.00 
perf-sched.wait_and_delay.count.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm 11285 ± 3% -100.0% 0.00 perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma 971.50 ± 2% +117.6% 2114 ± 3% perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap 279.17 ± 4% -100.0% 0.00 perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff 4.85 ± 6% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm 13.27 ± 47% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma 5.25 ± 25% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff 33.35 ± 14% -85.5% 4.84 ± 7% perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread 2.77 ± 11% -100.0% 0.00 perf-sched.wait_time.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm 0.68 ± 9% -100.0% 0.00 perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma 0.81 ± 9% +127.0% 1.84 ± 14% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap 4.84 ± 6% -100.0% 0.00 perf-sched.wait_time.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm 4.27 -100.0% 0.00 perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma 32.32 ± 18% -85.1% 4.83 ± 7% perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread 2.627e+09 ± 3% +73.7% 4.562e+09 perf-stat.i.branch-instructions 1.17 ± 29% -0.3 0.89 ± 20% perf-stat.i.branch-miss-rate% 23408547 ± 2% +49.2% 34925457 perf-stat.i.branch-misses 5.77 ± 22% +3.2 9.00 ± 17% 
perf-stat.i.cache-miss-rate% 8222112 ± 2% +64.8% 13546123 perf-stat.i.cache-misses 6727 ± 2% -60.5% 2660 perf-stat.i.context-switches 3.50 ± 5% -38.4% 2.15 ± 5% perf-stat.i.cpi 4.18e+10 ± 2% +7.6% 4.497e+10 perf-stat.i.cpu-cycles 5076 ± 2% -34.6% 3320 perf-stat.i.cycles-between-cache-misses 7634922 +121.9% 16944094 perf-stat.i.dTLB-load-misses 3.135e+09 ± 3% +78.2% 5.587e+09 perf-stat.i.dTLB-loads 5.14 ± 2% +1.4 6.57 perf-stat.i.dTLB-store-miss-rate% 94623040 ± 3% +128.1% 2.158e+08 perf-stat.i.dTLB-store-misses 1.713e+09 ± 3% +77.4% 3.039e+09 perf-stat.i.dTLB-stores 82.69 +4.3 87.04 perf-stat.i.iTLB-load-miss-rate% 4833304 ± 3% +121.4% 10702487 ± 3% perf-stat.i.iTLB-load-misses 996162 +57.1% 1564855 perf-stat.i.iTLB-loads 1.251e+10 ± 3% +73.2% 2.167e+10 perf-stat.i.instructions 2571 ± 2% -21.0% 2030 ± 3% perf-stat.i.instructions-per-iTLB-miss 0.30 ± 2% +62.0% 0.48 perf-stat.i.ipc 0.40 ± 2% +7.6% 0.43 perf-stat.i.metric.GHz 1068 ± 2% -75.6% 260.15 ± 10% perf-stat.i.metric.K/sec 73.25 ± 3% +77.9% 130.33 perf-stat.i.metric.M/sec 3223417 ± 3% +124.8% 7247446 perf-stat.i.minor-faults 26.42 ± 7% -10.4 16.06 ± 3% perf-stat.i.node-load-miss-rate% 282161 ± 5% +98.6% 560500 ± 7% perf-stat.i.node-loads 3248470 ± 3% +124.2% 7282664 perf-stat.i.node-stores 3223417 ± 3% +124.8% 7247446 perf-stat.i.page-faults 0.66 -4.9% 0.62 perf-stat.overall.MPKI 0.89 -0.1 0.77 perf-stat.overall.branch-miss-rate% 5.72 ± 22% +3.2 8.95 ± 17% perf-stat.overall.cache-miss-rate% 3.34 -37.9% 2.07 perf-stat.overall.cpi 5085 ± 2% -34.7% 3319 perf-stat.overall.cycles-between-cache-misses 0.24 +0.1 0.30 perf-stat.overall.dTLB-load-miss-rate% 5.24 +1.4 6.63 perf-stat.overall.dTLB-store-miss-rate% 82.90 +4.3 87.23 perf-stat.overall.iTLB-load-miss-rate% 2589 -21.7% 2027 ± 3% perf-stat.overall.instructions-per-iTLB-miss 0.30 +61.0% 0.48 perf-stat.overall.ipc 25.70 ± 4% -10.0 15.74 ± 7% perf-stat.overall.node-load-miss-rate% 0.63 ± 13% -0.3 0.30 ± 7% perf-stat.overall.node-store-miss-rate% 1168062 -23.0% 
899539 perf-stat.overall.path-length 2.618e+09 ± 3% +73.6% 4.547e+09 perf-stat.ps.branch-instructions 23334434 ± 2% +49.2% 34805568 perf-stat.ps.branch-misses 8195263 ± 2% +64.8% 13501705 perf-stat.ps.cache-misses 6706 ± 2% -60.5% 2651 perf-stat.ps.context-switches 4.167e+10 ± 2% +7.6% 4.482e+10 perf-stat.ps.cpu-cycles 7610946 +121.9% 16890018 perf-stat.ps.dTLB-load-misses 3.125e+09 ± 3% +78.2% 5.569e+09 perf-stat.ps.dTLB-loads 94332447 ± 3% +128.1% 2.151e+08 perf-stat.ps.dTLB-store-misses 1.707e+09 ± 3% +77.4% 3.029e+09 perf-stat.ps.dTLB-stores 4818281 ± 3% +121.4% 10668153 ± 3% perf-stat.ps.iTLB-load-misses 993104 +57.1% 1559797 perf-stat.ps.iTLB-loads 1.248e+10 ± 3% +73.2% 2.16e+10 perf-stat.ps.instructions 3213492 ± 3% +124.8% 7224439 perf-stat.ps.minor-faults 281299 ± 5% +98.6% 558721 ± 7% perf-stat.ps.node-loads 3238382 ± 3% +124.2% 7259432 perf-stat.ps.node-stores 3213492 ± 3% +124.8% 7224439 perf-stat.ps.page-faults 3.856e+12 +71.1% 6.599e+12 perf-stat.total.instructions 46.83 ± 2% -24.6 22.23 ± 3% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase 47.07 ± 2% -24.4 22.68 ± 3% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase 16.08 ± 6% -16.1 0.00 perf-profile.calltrace.cycles-pp.lock_mm_and_find_vma.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase 14.24 ± 2% -14.2 0.00 perf-profile.calltrace.cycles-pp.down_read_trylock.lock_mm_and_find_vma.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 62.50 -7.9 54.60 ± 2% perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase 63.71 -6.9 56.82 ± 2% perf-profile.calltrace.cycles-pp.testcase 6.62 -6.6 0.00 perf-profile.calltrace.cycles-pp.up_read.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase 18.65 -6.2 12.49 ± 2% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 20.30 -5.0 15.35 ± 3% 
perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase 1.34 ± 6% -0.2 1.17 ± 4% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle 1.24 ± 6% -0.2 1.07 ± 5% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call 0.84 ± 7% -0.1 0.72 ± 7% perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter 0.82 ± 7% -0.1 0.71 ± 7% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state 0.66 ± 7% -0.1 0.59 ± 6% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt 0.00 +0.5 0.53 ± 2% perf-profile.calltrace.cycles-pp.xas_descend.xas_load.filemap_get_entry.shmem_get_folio_gfp.shmem_fault 0.00 +0.6 0.56 ± 2% perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.page_remove_rmap.tlb_flush_rmaps.zap_pte_range.zap_pmd_range 0.00 +0.6 0.56 ± 7% perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.fault_dirty_shared_page.do_fault.__handle_mm_fault 0.58 ± 4% +0.6 1.22 ± 2% perf-profile.calltrace.cycles-pp.page_remove_rmap.tlb_flush_rmaps.zap_pte_range.zap_pmd_range.unmap_page_range 0.00 +0.6 0.64 ± 5% perf-profile.calltrace.cycles-pp.mtree_range_walk.mas_walk.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault 0.35 ± 70% +0.6 0.99 ± 3% perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.shmem_get_folio_gfp.shmem_fault.__do_fault 0.00 +0.7 0.66 ± 5% perf-profile.calltrace.cycles-pp.file_update_time.fault_dirty_shared_page.do_fault.__handle_mm_fault.handle_mm_fault 0.00 +0.7 0.66 ± 13% 
perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.folio_add_file_rmap_range.set_pte_range.finish_fault.do_fault 0.64 ± 3% +0.7 1.35 ± 3% perf-profile.calltrace.cycles-pp.__perf_sw_event.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase 0.00 +0.7 0.70 ± 3% perf-profile.calltrace.cycles-pp.__pte_offset_map_lock.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault 0.67 ± 4% +0.7 1.39 ± 2% perf-profile.calltrace.cycles-pp.tlb_flush_rmaps.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas 0.35 ± 70% +0.7 1.06 ± 4% perf-profile.calltrace.cycles-pp.___perf_sw_event.__perf_sw_event.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 0.36 ± 71% +0.8 1.15 ± 6% perf-profile.calltrace.cycles-pp.__perf_sw_event.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 0.84 ± 4% +0.8 1.67 ± 2% perf-profile.calltrace.cycles-pp.filemap_get_entry.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault 0.19 ±141% +0.9 1.06 ± 8% perf-profile.calltrace.cycles-pp.folio_add_file_rmap_range.set_pte_range.finish_fault.do_fault.__handle_mm_fault 0.00 +0.9 0.90 ± 3% perf-profile.calltrace.cycles-pp.___perf_sw_event.__perf_sw_event.handle_mm_fault.do_user_addr_fault.exc_page_fault 0.82 ± 11% +1.0 1.80 ± 4% perf-profile.calltrace.cycles-pp.fault_dirty_shared_page.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 1.21 ± 2% +1.2 2.41 ± 2% perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault.__handle_mm_fault 1.32 ± 11% +1.2 2.52 ± 6% perf-profile.calltrace.cycles-pp.set_pte_range.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault 1.48 ± 2% +1.4 2.88 ± 2% perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault 1.41 ± 2% +1.5 2.87 ± 2% perf-profile.calltrace.cycles-pp.sync_regs.asm_exc_page_fault.testcase 1.56 ± 3% +1.5 3.03 ± 2% perf-profile.calltrace.cycles-pp.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 2.65 ± 7% +1.6 4.30 ± 8% 
perf-profile.calltrace.cycles-pp.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase 1.79 ± 8% +1.7 3.50 ± 4% perf-profile.calltrace.cycles-pp.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 2.11 ± 4% +2.3 4.46 ± 3% perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region 2.15 ± 4% +2.4 4.54 ± 3% perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap 2.15 ± 4% +2.4 4.55 ± 3% perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap 2.15 ± 4% +2.4 4.55 ± 3% perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap 2.16 ± 5% +2.4 4.56 ± 3% perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap 2.16 ± 5% +2.4 4.57 ± 3% perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe 2.16 ± 5% +2.4 4.57 ± 3% perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64 2.18 ± 5% +2.4 4.58 ± 3% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap 2.18 ± 5% +2.4 4.58 ± 3% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap 2.18 ± 5% +2.4 4.58 ± 3% perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap 2.18 ± 5% +2.4 4.58 ± 3% perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap 2.18 ± 5% +2.4 4.58 ± 3% perf-profile.calltrace.cycles-pp.__munmap 3.17 ± 2% +3.2 6.36 ± 2% perf-profile.calltrace.cycles-pp.irqentry_exit_to_user_mode.asm_exc_page_fault.testcase 3.62 ± 3% +3.8 7.43 ± 2% perf-profile.calltrace.cycles-pp.error_entry.testcase 4.88 ± 3% +3.9 8.80 ± 2% 
perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault 3.70 ± 3% +4.0 7.66 ± 2% perf-profile.calltrace.cycles-pp.__irqentry_text_end.testcase 46.92 ± 2% -24.6 22.36 ± 3% perf-profile.children.cycles-pp.do_user_addr_fault 47.10 ± 2% -24.4 22.72 ± 3% perf-profile.children.cycles-pp.exc_page_fault 16.11 ± 6% -16.1 0.00 perf-profile.children.cycles-pp.lock_mm_and_find_vma 14.41 ± 2% -14.1 0.31 ± 4% perf-profile.children.cycles-pp.down_read_trylock 57.26 -13.7 43.60 ± 2% perf-profile.children.cycles-pp.asm_exc_page_fault 7.06 -6.8 0.30 ± 3% perf-profile.children.cycles-pp.up_read 18.72 -6.2 12.56 ± 2% perf-profile.children.cycles-pp.__handle_mm_fault 20.36 -4.9 15.45 ± 2% perf-profile.children.cycles-pp.handle_mm_fault 67.14 -3.0 64.11 ± 2% perf-profile.children.cycles-pp.testcase 1.58 ± 5% -0.2 1.39 ± 4% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt 1.43 ± 5% -0.2 1.27 ± 4% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt 0.98 ± 5% -0.1 0.88 ± 5% perf-profile.children.cycles-pp.hrtimer_interrupt 1.00 ± 5% -0.1 0.90 ± 5% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt 0.09 ± 21% -0.1 0.03 ±102% perf-profile.children.cycles-pp.intel_idle 0.20 ± 10% -0.0 0.16 ± 9% perf-profile.children.cycles-pp.__do_softirq 0.15 ± 7% -0.0 0.12 ± 6% perf-profile.children.cycles-pp.access_error 0.06 ± 13% +0.0 0.09 ± 7% perf-profile.children.cycles-pp.irqentry_enter 0.59 ± 2% +0.1 0.64 ± 4% perf-profile.children.cycles-pp.mtree_range_walk 0.04 ± 45% +0.1 0.10 ± 6% perf-profile.children.cycles-pp.perf_swevent_event 0.00 +0.1 0.05 ± 8% perf-profile.children.cycles-pp.__tlb_remove_page_size 0.06 ± 14% +0.1 0.12 ± 8% perf-profile.children.cycles-pp.folio_mapping 0.00 +0.1 0.06 ± 9% perf-profile.children.cycles-pp.llist_add_batch 0.00 +0.1 0.06 ± 9% perf-profile.children.cycles-pp.pte_mkwrite 0.00 +0.1 0.06 ± 13% perf-profile.children.cycles-pp.restore_regs_and_return_to_kernel 0.01 ±223% +0.1 0.08 
± 6% perf-profile.children.cycles-pp.vm_normal_page 0.15 ± 10% +0.1 0.21 ± 7% perf-profile.children.cycles-pp.__pte_offset_map 0.06 ± 7% +0.1 0.13 ± 5% perf-profile.children.cycles-pp.perf_exclude_event 0.09 ± 12% +0.1 0.16 ± 4% perf-profile.children.cycles-pp.error_return 0.00 +0.1 0.08 ± 8% perf-profile.children.cycles-pp.__cond_resched 0.09 ± 9% +0.1 0.18 ± 6% perf-profile.children.cycles-pp.free_swap_cache 0.08 ± 4% +0.1 0.17 ± 10% perf-profile.children.cycles-pp.xas_start 0.10 ± 26% +0.1 0.19 ± 19% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64 0.09 ± 10% +0.1 0.18 ± 12% perf-profile.children.cycles-pp.__count_memcg_events 0.09 ± 6% +0.1 0.19 ± 3% perf-profile.children.cycles-pp.timestamp_truncate 0.10 ± 9% +0.1 0.21 ± 4% perf-profile.children.cycles-pp.free_pages_and_swap_cache 0.03 ±100% +0.1 0.14 ± 6% perf-profile.children.cycles-pp.native_flush_tlb_local 0.06 ± 19% +0.1 0.17 ± 6% perf-profile.children.cycles-pp.default_send_IPI_mask_sequence_phys 0.08 ± 13% +0.1 0.20 ± 16% perf-profile.children.cycles-pp.cgroup_rstat_updated 0.07 ± 10% +0.1 0.19 ± 4% perf-profile.children.cycles-pp.flush_tlb_func 0.14 ± 11% +0.2 0.29 ± 4% perf-profile.children.cycles-pp.release_pages 0.21 ± 2% +0.2 0.37 ± 2% perf-profile.children.cycles-pp._raw_spin_lock 0.15 ± 10% +0.2 0.31 ± 3% perf-profile.children.cycles-pp.folio_unlock 0.11 ± 10% +0.2 0.28 ± 5% perf-profile.children.cycles-pp.__flush_smp_call_function_queue 0.15 ± 5% +0.2 0.32 ± 5% perf-profile.children.cycles-pp._compound_head 0.01 ±223% +0.2 0.20 ± 5% perf-profile.children.cycles-pp.exit_to_user_mode_prepare 0.11 ± 9% +0.2 0.30 ± 3% perf-profile.children.cycles-pp.__sysvec_call_function 0.19 ± 3% +0.2 0.40 ± 3% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state 0.13 ± 11% +0.2 0.34 ± 5% perf-profile.children.cycles-pp.flush_tlb_mm_range 0.12 ± 11% +0.2 0.34 ± 5% perf-profile.children.cycles-pp.smp_call_function_many_cond 0.12 ± 11% +0.2 0.34 ± 5% 
perf-profile.children.cycles-pp.on_each_cpu_cond_mask 0.22 ± 13% +0.2 0.44 ± 3% perf-profile.children.cycles-pp.folio_mark_dirty 0.14 ± 9% +0.2 0.38 ± 2% perf-profile.children.cycles-pp.sysvec_call_function 0.19 ± 19% +0.2 0.44 ± 12% perf-profile.children.cycles-pp.__mod_node_page_state 0.25 ± 6% +0.3 0.50 ± 4% perf-profile.children.cycles-pp.tlb_batch_pages_flush 0.29 ± 7% +0.3 0.55 ± 2% perf-profile.children.cycles-pp.xas_descend 0.28 ± 13% +0.3 0.57 ± 7% perf-profile.children.cycles-pp.inode_needs_update_time 0.26 ± 17% +0.3 0.58 ± 10% perf-profile.children.cycles-pp.__mod_lruvec_state 0.34 ± 10% +0.3 0.67 ± 5% perf-profile.children.cycles-pp.file_update_time 0.27 ± 10% +0.3 0.60 ± 3% perf-profile.children.cycles-pp.noop_dirty_folio 0.36 ± 2% +0.4 0.72 ± 3% perf-profile.children.cycles-pp.__pte_offset_map_lock 0.24 ± 6% +0.4 0.66 ± 4% perf-profile.children.cycles-pp.asm_sysvec_call_function 0.53 ± 5% +0.5 1.04 ± 3% perf-profile.children.cycles-pp.xas_load 0.47 ± 16% +0.6 1.08 ± 8% perf-profile.children.cycles-pp.folio_add_file_rmap_range 0.59 ± 4% +0.6 1.23 ± 2% perf-profile.children.cycles-pp.page_remove_rmap 0.60 ± 11% +0.7 1.30 ± 6% perf-profile.children.cycles-pp.__mod_lruvec_page_state 0.67 ± 4% +0.7 1.40 ± 2% perf-profile.children.cycles-pp.tlb_flush_rmaps 0.84 ± 3% +0.8 1.68 ± 2% perf-profile.children.cycles-pp.filemap_get_entry 0.84 ± 10% +1.0 1.84 ± 4% perf-profile.children.cycles-pp.fault_dirty_shared_page 0.94 ± 2% +1.1 2.00 ± 3% perf-profile.children.cycles-pp.___perf_sw_event 1.22 ± 2% +1.2 2.43 ± 2% perf-profile.children.cycles-pp.shmem_get_folio_gfp 1.33 ± 12% +1.2 2.54 ± 6% perf-profile.children.cycles-pp.set_pte_range 1.18 ± 4% +1.3 2.51 ± 2% perf-profile.children.cycles-pp.__perf_sw_event 1.48 ± 2% +1.4 2.90 ± 2% perf-profile.children.cycles-pp.shmem_fault 1.56 ± 3% +1.5 3.03 ± 2% perf-profile.children.cycles-pp.__do_fault 1.46 ± 2% +1.5 2.98 ± 2% perf-profile.children.cycles-pp.sync_regs 2.66 ± 7% +1.7 4.31 ± 8% 
perf-profile.children.cycles-pp.lock_vma_under_rcu 1.82 ± 8% +1.7 3.56 ± 4% perf-profile.children.cycles-pp.finish_fault 1.98 ± 2% +2.0 4.00 ± 2% perf-profile.children.cycles-pp.native_irq_return_iret 2.29 ± 5% +2.4 4.68 ± 3% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe 2.28 ± 5% +2.4 4.68 ± 3% perf-profile.children.cycles-pp.do_syscall_64 2.15 ± 4% +2.4 4.55 ± 3% perf-profile.children.cycles-pp.unmap_vmas 2.15 ± 4% +2.4 4.55 ± 3% perf-profile.children.cycles-pp.unmap_page_range 2.15 ± 4% +2.4 4.55 ± 3% perf-profile.children.cycles-pp.zap_pmd_range 2.15 ± 4% +2.4 4.55 ± 3% perf-profile.children.cycles-pp.zap_pte_range 2.17 ± 4% +2.4 4.57 ± 3% perf-profile.children.cycles-pp.do_vmi_align_munmap 2.17 ± 4% +2.4 4.57 ± 3% perf-profile.children.cycles-pp.do_vmi_munmap 2.16 ± 5% +2.4 4.56 ± 3% perf-profile.children.cycles-pp.unmap_region 2.18 ± 4% +2.4 4.58 ± 3% perf-profile.children.cycles-pp.__vm_munmap 2.18 ± 5% +2.4 4.58 ± 3% perf-profile.children.cycles-pp.__x64_sys_munmap 2.18 ± 5% +2.4 4.58 ± 3% perf-profile.children.cycles-pp.__munmap 3.23 ± 2% +3.3 6.52 ± 2% perf-profile.children.cycles-pp.irqentry_exit_to_user_mode 3.67 ± 3% +3.8 7.51 ± 2% perf-profile.children.cycles-pp.error_entry 4.93 ± 3% +4.0 8.89 ± 2% perf-profile.children.cycles-pp.do_fault 3.71 ± 3% +4.0 7.66 ± 2% perf-profile.children.cycles-pp.__irqentry_text_end 14.34 ± 2% -14.0 0.31 ± 4% perf-profile.self.cycles-pp.down_read_trylock 13.36 ± 3% -10.0 3.32 ± 5% perf-profile.self.cycles-pp.__handle_mm_fault 7.01 -6.7 0.30 ± 3% perf-profile.self.cycles-pp.up_read 0.09 ± 21% -0.1 0.03 ±102% perf-profile.self.cycles-pp.intel_idle 0.12 ± 5% -0.0 0.09 ± 9% perf-profile.self.cycles-pp.pte_offset_map_nolock 0.14 ± 4% -0.0 0.12 ± 7% perf-profile.self.cycles-pp.access_error 0.04 ± 44% +0.0 0.08 ± 8% perf-profile.self.cycles-pp.file_update_time 0.06 ± 13% +0.0 0.11 ± 6% perf-profile.self.cycles-pp.__count_memcg_events 0.59 ± 2% +0.0 0.63 ± 4% perf-profile.self.cycles-pp.mtree_range_walk 0.03 
±100% +0.1 0.08 ± 13% perf-profile.self.cycles-pp.__do_fault 0.04 ± 45% +0.1 0.10 ± 9% perf-profile.self.cycles-pp.perf_swevent_event 0.00 +0.1 0.05 ± 8% perf-profile.self.cycles-pp.__tlb_remove_page_size 0.00 +0.1 0.05 ± 8% perf-profile.self.cycles-pp.flush_tlb_func 0.05 ± 7% +0.1 0.11 ± 8% perf-profile.self.cycles-pp.perf_exclude_event 0.00 +0.1 0.06 ± 11% perf-profile.self.cycles-pp.restore_regs_and_return_to_kernel 0.00 +0.1 0.06 ± 15% perf-profile.self.cycles-pp.irqentry_enter 0.00 +0.1 0.06 ± 9% perf-profile.self.cycles-pp.llist_add_batch 0.00 +0.1 0.06 ± 9% perf-profile.self.cycles-pp.pte_mkwrite 0.05 ± 45% +0.1 0.11 ± 6% perf-profile.self.cycles-pp.folio_mapping 0.14 ± 10% +0.1 0.20 ± 8% perf-profile.self.cycles-pp.__pte_offset_map 0.06 ± 11% +0.1 0.13 ± 6% perf-profile.self.cycles-pp.error_return 0.00 +0.1 0.07 ± 10% perf-profile.self.cycles-pp.vm_normal_page 0.06 ± 11% +0.1 0.14 ± 8% perf-profile.self.cycles-pp.__mod_lruvec_state 0.08 ± 8% +0.1 0.16 ± 6% perf-profile.self.cycles-pp.free_swap_cache 0.00 +0.1 0.08 ± 8% perf-profile.self.cycles-pp.smp_call_function_many_cond 0.08 ± 6% +0.1 0.16 ± 11% perf-profile.self.cycles-pp.xas_start 0.10 ± 26% +0.1 0.18 ± 20% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64 0.09 ± 9% +0.1 0.18 ± 5% perf-profile.self.cycles-pp.tlb_flush_rmaps 0.08 ± 5% +0.1 0.18 ± 3% perf-profile.self.cycles-pp.timestamp_truncate 0.10 ± 14% +0.1 0.20 ± 4% perf-profile.self.cycles-pp.inode_needs_update_time 0.06 ± 19% +0.1 0.17 ± 6% perf-profile.self.cycles-pp.default_send_IPI_mask_sequence_phys 0.03 ±100% +0.1 0.14 ± 7% perf-profile.self.cycles-pp.native_flush_tlb_local 0.08 ± 12% +0.1 0.19 ± 17% perf-profile.self.cycles-pp.cgroup_rstat_updated 0.10 ± 4% +0.1 0.23 ± 7% perf-profile.self.cycles-pp._compound_head 0.13 ± 7% +0.1 0.26 ± 4% perf-profile.self.cycles-pp.exc_page_fault 0.13 ± 5% +0.1 0.27 ± 3% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state 0.14 ± 10% +0.1 0.28 ± 5% perf-profile.self.cycles-pp.release_pages 0.14 ± 4% 
+0.1 0.29 ± 5% perf-profile.self.cycles-pp.__pte_offset_map_lock 0.12 ± 7% +0.1 0.27 ± 4% perf-profile.self.cycles-pp.finish_fault 0.14 ± 9% +0.2 0.29 ± 2% perf-profile.self.cycles-pp.folio_unlock 0.21 ± 2% +0.2 0.37 ± 3% perf-profile.self.cycles-pp._raw_spin_lock 0.16 ± 14% +0.2 0.32 ± 5% perf-profile.self.cycles-pp.folio_mark_dirty 0.17 ± 9% +0.2 0.34 ± 3% perf-profile.self.cycles-pp.xas_load 0.18 ± 8% +0.2 0.36 ± 5% perf-profile.self.cycles-pp.asm_exc_page_fault 0.15 ± 15% +0.2 0.33 ± 7% perf-profile.self.cycles-pp.__mod_lruvec_page_state 0.27 ± 11% +0.2 0.45 ± 8% perf-profile.self.cycles-pp.do_fault 0.00 +0.2 0.18 ± 5% perf-profile.self.cycles-pp.exit_to_user_mode_prepare 0.27 ± 3% +0.2 0.47 ± 6% perf-profile.self.cycles-pp.shmem_fault 0.16 ± 18% +0.2 0.38 ± 8% perf-profile.self.cycles-pp.fault_dirty_shared_page 0.18 ± 6% +0.2 0.40 ± 5% perf-profile.self.cycles-pp.folio_add_file_rmap_range 0.19 ± 20% +0.2 0.42 ± 14% perf-profile.self.cycles-pp.__mod_node_page_state 0.27 ± 7% +0.2 0.51 ± 3% perf-profile.self.cycles-pp.xas_descend 0.24 ± 12% +0.3 0.51 ± 9% perf-profile.self.cycles-pp.__perf_sw_event 0.34 ± 4% +0.3 0.61 ± 3% perf-profile.self.cycles-pp.do_user_addr_fault 0.25 ± 9% +0.3 0.58 ± 4% perf-profile.self.cycles-pp.noop_dirty_folio 0.28 ± 5% +0.3 0.60 ± 5% perf-profile.self.cycles-pp.set_pte_range 0.28 ± 5% +0.3 0.60 ± 4% perf-profile.self.cycles-pp.page_remove_rmap 0.34 ± 4% +0.3 0.66 ± 3% perf-profile.self.cycles-pp.shmem_get_folio_gfp 0.32 ± 3% +0.3 0.65 ± 3% perf-profile.self.cycles-pp.filemap_get_entry 0.66 ± 5% +0.7 1.39 ± 3% perf-profile.self.cycles-pp.zap_pte_range 0.82 ± 3% +0.9 1.76 ± 3% perf-profile.self.cycles-pp.___perf_sw_event 1.46 ± 2% +1.5 2.98 ± 2% perf-profile.self.cycles-pp.sync_regs 1.97 ± 2% +2.0 3.99 ± 2% perf-profile.self.cycles-pp.native_irq_return_iret 3.20 ± 2% +3.1 6.33 ± 2% perf-profile.self.cycles-pp.irqentry_exit_to_user_mode 3.65 ± 3% +3.8 7.47 ± 2% perf-profile.self.cycles-pp.error_entry 3.71 ± 3% +4.0 7.66 ± 2% 
perf-profile.self.cycles-pp.__irqentry_text_end
      5.82 ±  2%       +6.4      12.22 ±  2%  perf-profile.self.cycles-pp.testcase

***************************************************************************************************
lkp-skl-fpga01: 104 threads 2 sockets (Skylake) with 192G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/process/16/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/page_fault3/will-it-scale

commit:
  164b06f238 ("mm: call wp_page_copy() under the VMA lock")
  4ed4379881 ("mm: handle shared faults under the VMA lock")

164b06f238b98631 4ed4379881aa62588aba6442a9f
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
      3.93 ±  9%       +0.6       4.48 ±  3%  mpstat.cpu.all.usr%
     69443 ± 37%     -43.5%      39209 ± 52%  numa-numastat.node0.other_node
     69443 ± 37%     -43.5%      39209 ± 52%  numa-vmstat.node0.numa_other
    815.58 ±  4%     -13.9%     702.42 ±  6%  sched_debug.cpu.nr_switches.min
      0.17           -11.8%       0.15        turbostat.IPC
   7829312            +7.8%    8442054        will-it-scale.16.processes
    489331            +7.8%     527628        will-it-scale.per_process_ops
   7829312            +7.8%    8442054        will-it-scale.workload
   6054588            +5.6%    6393936        proc-vmstat.numa_hit
   5946949            +5.7%    6286318        proc-vmstat.numa_local
   6138630            +5.6%    6479594        proc-vmstat.pgalloc_normal
 2.356e+09            +7.8%  2.541e+09        proc-vmstat.pgfault
   6094218            +5.6%    6435293        proc-vmstat.pgfree
  33370577 ±  5%      +8.1%   36086535 ±  2%  perf-stat.i.branch-misses
  13591855 ±  6%      +9.3%   14849901 ±  3%  perf-stat.i.cache-misses
  87506837            +2.6%   89773400        perf-stat.i.cache-references
   9248232 ±  6%     +10.3%   10201074 ±  3%  perf-stat.i.dTLB-load-misses
      4.89 ±  8%       +1.4       6.25 ±  3%  perf-stat.i.dTLB-store-miss-rate%
 2.039e+08 ±  9%     +14.0%  2.324e+08 ±  3%  perf-stat.i.dTLB-store-misses
      2165 ±  5%     -18.8%       1758 ±  7%  perf-stat.i.instructions-per-iTLB-miss
   7100967 ±  9%     +13.9%    8087600 ±  3%  perf-stat.i.minor-faults
   7137308 ±  9%     +13.9%    8128127 ±  3%  perf-stat.i.node-stores
   7100971 ±  9%     +13.9%    8087600 ±  3%  perf-stat.i.page-faults
      0.51 ±  3%     +21.1%       0.62        perf-stat.overall.MPKI
      0.60 ±  4%       +0.1       0.72        perf-stat.overall.branch-miss-rate%
      1.64           +16.4%       1.91        perf-stat.overall.cpi
      0.14 ±  3%       +0.0       0.17        perf-stat.overall.dTLB-load-miss-rate%
      5.33             +1.2       6.49        perf-stat.overall.dTLB-store-miss-rate%
      2276 ±  5%     -21.9%       1779 ±  6%  perf-stat.overall.instructions-per-iTLB-miss
      0.61           -14.1%       0.52        perf-stat.overall.ipc
   1126412           -20.9%     890493        perf-stat.overall.path-length
  33278957 ±  4%      +8.1%   35982000 ±  2%  perf-stat.ps.branch-misses
  13557468 ±  6%      +9.2%   14806522 ±  3%  perf-stat.ps.cache-misses
  87230707            +2.6%   89485467        perf-stat.ps.cache-references
   9225940 ±  6%     +10.3%   10171834 ±  3%  perf-stat.ps.dTLB-load-misses
 2.035e+08 ±  9%     +13.9%  2.318e+08 ±  3%  perf-stat.ps.dTLB-store-misses
   7085678 ±  9%     +13.8%    8065196 ±  3%  perf-stat.ps.minor-faults
   7121860 ±  9%     +13.8%    8105529 ±  3%  perf-stat.ps.node-stores
   7085683 ±  9%     +13.8%    8065196 ±  3%  perf-stat.ps.page-faults
 8.819e+12           -14.8%  7.518e+12        perf-stat.total.instructions
     23.48 ±  2%       -3.4      20.08        perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
     22.94 ±  2%       -3.3      19.65        perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
     16.46 ±  2%       -1.6      14.84        perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
     12.98 ±  2%       -1.1      11.92        perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      0.58 ±  2%       +0.0       0.62        perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.page_remove_rmap.tlb_flush_rmaps.zap_pte_range.zap_pmd_range
      0.59 ±  6%       +0.1       0.64 ±  3%  perf-profile.calltrace.cycles-pp.xas_descend.xas_load.filemap_get_entry.shmem_get_folio_gfp.shmem_fault
      0.53 ±  4%       +0.1       0.59 ±  3%  perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.fault_dirty_shared_page.do_fault.__handle_mm_fault
      1.08 ±  2%       +0.1       1.14 ±  3%
perf-profile.calltrace.cycles-pp.___perf_sw_event.__perf_sw_event.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 0.64 ± 4% +0.1 0.71 ± 2% perf-profile.calltrace.cycles-pp.file_update_time.fault_dirty_shared_page.do_fault.__handle_mm_fault.handle_mm_fault 1.05 ± 4% +0.1 1.12 perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.shmem_get_folio_gfp.shmem_fault.__do_fault 0.76 ± 4% +0.1 0.86 ± 3% perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.folio_add_file_rmap_range.set_pte_range.finish_fault.do_fault 0.92 ± 2% +0.1 1.01 perf-profile.calltrace.cycles-pp.___perf_sw_event.__perf_sw_event.handle_mm_fault.do_user_addr_fault.exc_page_fault 1.34 ± 3% +0.1 1.43 ± 3% perf-profile.calltrace.cycles-pp.__perf_sw_event.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase 0.71 ± 2% +0.1 0.81 ± 3% perf-profile.calltrace.cycles-pp.__pte_offset_map_lock.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault 0.99 ± 3% +0.1 1.09 perf-profile.calltrace.cycles-pp.mas_walk.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 1.22 ± 2% +0.1 1.34 perf-profile.calltrace.cycles-pp.page_remove_rmap.tlb_flush_rmaps.zap_pte_range.zap_pmd_range.unmap_page_range 1.24 ± 2% +0.1 1.36 perf-profile.calltrace.cycles-pp.__perf_sw_event.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 1.40 ± 2% +0.1 1.53 perf-profile.calltrace.cycles-pp.tlb_flush_rmaps.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas 1.17 ± 4% +0.1 1.30 ± 3% perf-profile.calltrace.cycles-pp.folio_add_file_rmap_range.set_pte_range.finish_fault.do_fault.__handle_mm_fault 1.76 ± 3% +0.1 1.89 perf-profile.calltrace.cycles-pp.filemap_get_entry.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault 1.93 ± 3% +0.2 2.10 perf-profile.calltrace.cycles-pp.fault_dirty_shared_page.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 2.52 ± 2% +0.2 2.72 perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault.__handle_mm_fault 0.34 ± 70% 
+0.2 0.55 ± 3% perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas 2.30 ± 2% +0.2 2.51 ± 2% perf-profile.calltrace.cycles-pp.set_pte_range.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault 1.86 ± 3% +0.2 2.08 perf-profile.calltrace.cycles-pp.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase 3.02 ± 2% +0.2 3.26 perf-profile.calltrace.cycles-pp.sync_regs.asm_exc_page_fault.testcase 3.31 ± 2% +0.2 3.56 perf-profile.calltrace.cycles-pp.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 3.14 ± 2% +0.2 3.40 perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault 3.32 +0.3 3.62 ± 2% perf-profile.calltrace.cycles-pp.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 4.22 ± 2% +0.4 4.60 perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region 9.50 ± 2% +0.4 9.88 perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault 4.32 ± 2% +0.4 4.71 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap 4.30 ± 2% +0.4 4.70 perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap 4.30 ± 2% +0.4 4.70 perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap 4.30 ± 2% +0.4 4.70 perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap 4.32 ± 2% +0.4 4.71 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap 4.32 ± 2% +0.4 4.71 perf-profile.calltrace.cycles-pp.__munmap 4.32 ± 2% +0.4 4.71 perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap 4.32 ± 2% +0.4 4.71 
perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe 4.32 ± 2% +0.4 4.71 perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64 4.32 ± 2% +0.4 4.71 perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap 4.32 ± 2% +0.4 4.71 perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap 6.61 ± 2% +0.6 7.21 ± 2% perf-profile.calltrace.cycles-pp.irqentry_exit_to_user_mode.asm_exc_page_fault.testcase 7.67 ± 2% +0.7 8.39 perf-profile.calltrace.cycles-pp.error_entry.testcase 7.80 ± 2% +0.8 8.57 perf-profile.calltrace.cycles-pp.__irqentry_text_end.testcase 23.14 ± 2% -3.4 19.72 perf-profile.children.cycles-pp.do_user_addr_fault 23.52 ± 2% -3.4 20.12 perf-profile.children.cycles-pp.exc_page_fault 16.58 ± 2% -1.6 14.99 perf-profile.children.cycles-pp.handle_mm_fault 13.06 ± 2% -1.1 11.97 perf-profile.children.cycles-pp.__handle_mm_fault 1.38 ± 5% -0.6 0.75 perf-profile.children.cycles-pp.mtree_range_walk 1.05 ± 5% -0.6 0.45 ± 10% perf-profile.children.cycles-pp.handle_pte_fault 0.58 ± 5% -0.3 0.25 ± 6% perf-profile.children.cycles-pp.pte_offset_map_nolock 0.45 ± 3% -0.3 0.14 ± 7% perf-profile.children.cycles-pp.access_error 0.65 ± 2% -0.3 0.36 ± 4% perf-profile.children.cycles-pp.down_read_trylock 0.62 ± 5% -0.3 0.34 ± 4% perf-profile.children.cycles-pp.up_read 0.33 ± 4% -0.1 0.24 ± 4% perf-profile.children.cycles-pp.__pte_offset_map 0.20 ± 4% +0.0 0.22 ± 6% perf-profile.children.cycles-pp.exit_to_user_mode_prepare 0.42 ± 3% +0.0 0.45 ± 4% perf-profile.children.cycles-pp.percpu_counter_add_batch 0.16 ± 4% +0.0 0.18 ± 6% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64 0.30 ± 2% +0.0 0.33 ± 2% perf-profile.children.cycles-pp.release_pages 0.38 ± 5% +0.0 0.42 ± 5% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state 0.50 ± 3% +0.0 0.54 ± 3% 
perf-profile.children.cycles-pp.folio_mark_dirty 0.51 ± 3% +0.0 0.56 ± 3% perf-profile.children.cycles-pp.tlb_batch_pages_flush 0.35 ± 4% +0.1 0.40 ± 4% perf-profile.children.cycles-pp._raw_spin_lock 0.60 ± 6% +0.1 0.66 ± 2% perf-profile.children.cycles-pp.xas_descend 0.55 ± 3% +0.1 0.60 ± 3% perf-profile.children.cycles-pp.inode_needs_update_time 0.54 ± 4% +0.1 0.60 ± 3% perf-profile.children.cycles-pp.__mod_node_page_state 0.65 ± 4% +0.1 0.72 ± 2% perf-profile.children.cycles-pp.file_update_time 1.10 ± 4% +0.1 1.17 perf-profile.children.cycles-pp.xas_load 0.69 ± 2% +0.1 0.77 ± 2% perf-profile.children.cycles-pp.__mod_lruvec_state 1.00 ± 3% +0.1 1.09 perf-profile.children.cycles-pp.mas_walk 0.73 ± 2% +0.1 0.83 ± 3% perf-profile.children.cycles-pp.__pte_offset_map_lock 1.24 ± 2% +0.1 1.36 perf-profile.children.cycles-pp.page_remove_rmap 1.41 ± 2% +0.1 1.54 perf-profile.children.cycles-pp.tlb_flush_rmaps 1.18 ± 4% +0.1 1.31 ± 3% perf-profile.children.cycles-pp.folio_add_file_rmap_range 1.77 ± 3% +0.1 1.90 perf-profile.children.cycles-pp.filemap_get_entry 1.43 ± 3% +0.1 1.57 ± 2% perf-profile.children.cycles-pp.__mod_lruvec_page_state 2.03 +0.2 2.19 perf-profile.children.cycles-pp.___perf_sw_event 1.97 ± 3% +0.2 2.14 perf-profile.children.cycles-pp.fault_dirty_shared_page 2.54 ± 2% +0.2 2.74 perf-profile.children.cycles-pp.shmem_get_folio_gfp 2.31 ± 2% +0.2 2.53 ± 2% perf-profile.children.cycles-pp.set_pte_range 2.61 ± 2% +0.2 2.83 perf-profile.children.cycles-pp.__perf_sw_event 1.86 ± 3% +0.2 2.09 perf-profile.children.cycles-pp.lock_vma_under_rcu 3.33 ± 2% +0.2 3.57 perf-profile.children.cycles-pp.__do_fault 3.10 ± 2% +0.3 3.36 ± 2% perf-profile.children.cycles-pp.sync_regs 3.16 ± 2% +0.3 3.41 perf-profile.children.cycles-pp.shmem_fault 3.39 +0.3 3.70 ± 2% perf-profile.children.cycles-pp.finish_fault 9.64 ± 2% +0.4 9.99 perf-profile.children.cycles-pp.do_fault 4.32 ± 2% +0.4 4.71 perf-profile.children.cycles-pp.__munmap 4.32 ± 2% +0.4 4.71 
perf-profile.children.cycles-pp.do_vmi_munmap 4.32 ± 2% +0.4 4.71 perf-profile.children.cycles-pp.do_vmi_align_munmap 4.30 ± 2% +0.4 4.70 perf-profile.children.cycles-pp.unmap_vmas 4.30 ± 2% +0.4 4.70 perf-profile.children.cycles-pp.unmap_page_range 4.30 ± 2% +0.4 4.70 perf-profile.children.cycles-pp.zap_pmd_range 4.30 ± 2% +0.4 4.70 perf-profile.children.cycles-pp.zap_pte_range 4.40 ± 2% +0.4 4.80 perf-profile.children.cycles-pp.do_syscall_64 4.32 ± 2% +0.4 4.71 perf-profile.children.cycles-pp.__vm_munmap 4.32 ± 2% +0.4 4.71 perf-profile.children.cycles-pp.__x64_sys_munmap 4.32 ± 2% +0.4 4.71 perf-profile.children.cycles-pp.unmap_region 4.41 ± 2% +0.4 4.80 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe 4.07 +0.4 4.47 perf-profile.children.cycles-pp.native_irq_return_iret 6.72 ± 2% +0.6 7.32 ± 2% perf-profile.children.cycles-pp.irqentry_exit_to_user_mode 7.75 ± 2% +0.7 8.48 perf-profile.children.cycles-pp.error_entry 7.80 ± 2% +0.8 8.57 perf-profile.children.cycles-pp.__irqentry_text_end 2.35 ± 2% -0.8 1.51 ± 3% perf-profile.self.cycles-pp.__handle_mm_fault 1.37 ± 5% -0.6 0.75 perf-profile.self.cycles-pp.mtree_range_walk 1.93 ± 5% -0.6 1.36 ± 5% perf-profile.self.cycles-pp.handle_mm_fault 0.63 ± 2% -0.3 0.36 ± 3% perf-profile.self.cycles-pp.down_read_trylock 0.47 ± 6% -0.3 0.20 ± 17% perf-profile.self.cycles-pp.handle_pte_fault 0.90 ± 3% -0.3 0.64 ± 3% perf-profile.self.cycles-pp.do_user_addr_fault 0.35 ± 7% -0.2 0.12 ± 11% perf-profile.self.cycles-pp.pte_offset_map_nolock 0.57 ± 5% -0.2 0.34 ± 4% perf-profile.self.cycles-pp.up_read 0.37 ± 4% -0.2 0.14 ± 7% perf-profile.self.cycles-pp.access_error 0.70 ± 3% -0.1 0.58 ± 3% perf-profile.self.cycles-pp.do_fault 0.31 ± 4% -0.1 0.23 ± 4% perf-profile.self.cycles-pp.__pte_offset_map 0.29 +0.0 0.32 perf-profile.self.cycles-pp.release_pages 0.16 ± 4% +0.0 0.18 ± 6% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64 0.26 ± 4% +0.0 0.30 ± 4% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state 0.40 ± 4% 
+0.0 0.44 ± 4% perf-profile.self.cycles-pp.folio_add_file_rmap_range 0.29 ± 3% +0.0 0.33 ± 5% perf-profile.self.cycles-pp.__pte_offset_map_lock 0.42 ± 4% +0.0 0.47 ± 4% perf-profile.self.cycles-pp.fault_dirty_shared_page 0.35 ± 3% +0.1 0.40 ± 3% perf-profile.self.cycles-pp._raw_spin_lock 0.60 ± 3% +0.1 0.66 ± 2% perf-profile.self.cycles-pp.set_pte_range 0.69 ± 2% +0.1 0.75 ± 2% perf-profile.self.cycles-pp.shmem_get_folio_gfp 0.52 ± 3% +0.1 0.58 ± 3% perf-profile.self.cycles-pp.__mod_node_page_state 0.59 ± 5% +0.1 0.66 ± 2% perf-profile.self.cycles-pp.page_remove_rmap 0.54 ± 5% +0.1 0.63 ± 2% perf-profile.self.cycles-pp.lock_vma_under_rcu 1.43 +0.1 1.54 ± 2% perf-profile.self.cycles-pp.zap_pte_range 1.80 +0.1 1.94 perf-profile.self.cycles-pp.___perf_sw_event 3.10 ± 2% +0.3 3.36 ± 2% perf-profile.self.cycles-pp.sync_regs 4.06 +0.4 4.46 perf-profile.self.cycles-pp.native_irq_return_iret 6.52 ± 2% +0.6 7.10 ± 2% perf-profile.self.cycles-pp.irqentry_exit_to_user_mode 7.72 ± 2% +0.7 8.44 perf-profile.self.cycles-pp.error_entry 7.80 ± 2% +0.8 8.57 perf-profile.self.cycles-pp.__irqentry_text_end 12.39 +1.1 13.50 perf-profile.self.cycles-pp.testcase Disclaimer: Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki