Hello,

kernel test robot noticed a 4.4% improvement of stress-ng.pkey.ops_per_sec on:


commit: cc8cb3697a8d8eabe1fb9acb8768b11c1ab607d8 ("mm: refactor vma_merge() into modify-only vma_merge_existing_range()")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

testcase: stress-ng
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: pkey
	cpufreq_governor: performance



Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240916/202409161559.af0a1b99-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/pkey/stress-ng/60s

commit:
  65e0aa64df ("mm: introduce commit_merge(), abstracting final commit of merge")
  cc8cb3697a ("mm: refactor vma_merge() into modify-only vma_merge_existing_range()")

65e0aa64df916861 cc8cb3697a8d8eabe1fb9acb876
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
    159916 ±  5%     +14.9%     183809 ± 10%  meminfo.DirectMap4k
     15.42 ± 23%     +46.5%      22.58 ± 17%  sched_debug.cpu.nr_uninterruptible.max
 2.158e+08            +4.4%  2.253e+08        stress-ng.pkey.ops
   3596484            +4.4%    3755565        stress-ng.pkey.ops_per_sec
    196.30            +4.9%     205.86        stress-ng.time.user_time
  25782400            +3.4%   26666903        proc-vmstat.numa_hit
  25707363            +3.5%   26600006        proc-vmstat.numa_local
  44223158            +3.4%   45721027        proc-vmstat.pgalloc_normal
  39763569            +3.5%   41151044        proc-vmstat.pgfree
 3.568e+10            +1.4%  3.619e+10        perf-stat.i.branch-instructions
  87058419 ±  2%      +3.1%   89795461        perf-stat.i.branch-misses
 1.482e+08            +2.7%  1.521e+08        perf-stat.i.cache-references
      1854            -2.2%       1813        perf-stat.i.cycles-between-cache-misses
  1.68e+11            +1.1%  1.699e+11        perf-stat.i.instructions
      0.64            +1.8%       0.65        perf-stat.overall.MPKI
      1812            -2.5%       1766        perf-stat.overall.cycles-between-cache-misses
 1.045e+08            +2.6%  1.073e+08        perf-stat.ps.cache-misses
 1.446e+08            +2.8%  1.486e+08        perf-stat.ps.cache-references
     25.66 ±116%     -96.6%       0.86 ±168%  perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
      9.35 ± 40%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
      9.63 ± 36%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.vma_merge.constprop.0
      3.87 ± 38%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.vma_prepare.vma_merge.constprop
     10.81 ± 36%     -76.2%       2.57 ±142%  perf-sched.sch_delay.avg.ms.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
      3.74 ± 55%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.khugepaged.kthread.ret_from_fork.ret_from_fork_asm
      2.32 ± 34%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge
      1.32 ±104%     -80.1%       0.26 ±221%  perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
     19.81 ±188%     -99.3%       0.14 ±142%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
      0.39 ± 57%     -81.6%       0.07 ±153%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
    180.87 ±203%     -99.1%       1.55 ±153%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      0.36 ±108%     -96.5%       0.01 ±187%  perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      1.44 ± 19%     -85.8%       0.20 ±171%  perf-sched.sch_delay.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
     40.94 ±115%     -99.8%       0.10 ±143%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
    112.73 ±118%     -98.9%       1.19 ±142%  perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
    335.83 ± 29%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
    301.19 ± 35%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.vma_merge.constprop.0
     22.34 ± 98%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.vma_prepare.vma_merge.constprop
    473.14 ± 21%     -76.2%     112.54 ±144%  perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
      6.84 ± 51%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.khugepaged.kthread.ret_from_fork.ret_from_fork_asm
      7.07 ± 72%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge
      0.42 ±147%     -98.1%       0.01 ±141%  perf-sched.sch_delay.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      2373 ± 40%     -78.6%     507.50 ±152%  perf-sched.sch_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      1.70 ±111%     -96.7%       0.06 ±212%  perf-sched.sch_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      1309 ± 77%     -99.6%       5.06 ±165%  perf-sched.sch_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      2745 ± 25%     -81.5%     507.97 ±152%  perf-sched.total_sch_delay.max.ms
     10044 ±  4%     -74.3%       2576 ±141%  perf-sched.total_wait_and_delay.count.ms
      6234 ± 21%     -77.2%       1421 ±141%  perf-sched.total_wait_and_delay.max.ms
     18.71 ± 40%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
     19.26 ± 36%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.down_write.vma_merge.constprop.0
     21.62 ± 36%     -76.2%       5.15 ±142%  perf-sched.wait_and_delay.avg.ms.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
    885.96 ± 42%     -79.1%     185.28 ±142%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
    144.50 ± 24%     -86.4%      19.67 ±145%  perf-sched.wait_and_delay.count.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.shmem_alloc_folio
    131.83 ±  9%     -73.7%      34.67 ±141%  perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
    159.83 ±  8%     -75.8%      38.67 ±144%  perf-sched.wait_and_delay.count.__cond_resched.change_pmd_range.isra.0.change_pud_range
    227.83 ±  9%     -77.0%      52.33 ±141%  perf-sched.wait_and_delay.count.__cond_resched.change_pud_range.isra.0.change_protection_range
     75.00 ±  8%     -71.6%      21.33 ±143%  perf-sched.wait_and_delay.count.__cond_resched.down_write.__x64_sys_pkey_free.do_syscall_64.entry_SYSCALL_64_after_hwframe
     82.00 ±  9%     -76.4%      19.33 ±141%  perf-sched.wait_and_delay.count.__cond_resched.down_write.anon_vma_clone.__split_vma.vma_modify
    412.67 ±  7%     -82.6%      71.83 ±141%  perf-sched.wait_and_delay.count.__cond_resched.down_write.mprotect_fixup.do_mprotect_pkey.__x64_sys_pkey_mprotect
    125.83 ±  9%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
     86.83 ± 14%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.down_write.vma_merge.constprop.0
    225.33 ±  7%     -76.1%      53.83 ±142%  perf-sched.wait_and_delay.count.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
    314.67 ± 31%     -87.1%      40.50 ±142%  perf-sched.wait_and_delay.count.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
    118.17 ± 12%     -80.0%      23.67 ±143%  perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__split_vma
    206.50 ±  8%     -77.2%      47.17 ±141%  perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vma_modify
     76.33 ± 23%     -90.6%       7.17 ±223%  perf-sched.wait_and_delay.count.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
     45.33 ± 21%     -83.1%       7.67 ±148%  perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    626.00 ± 66%     -92.6%      46.17 ±142%  perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
     10.33 ± 14%     -83.9%       1.67 ±223%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
     54.17 ± 27%     -70.2%      16.17 ±141%  perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      1976 ±  7%     -77.3%     447.67 ±141%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1760 ± 10%     -74.8%     443.33 ±147%  perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    195.50 ±  9%     -75.9%      47.17 ±141%  perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
    671.66 ± 29%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
    602.38 ± 35%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.down_write.vma_merge.constprop.0
    946.28 ± 21%     -76.2%     225.08 ±144%  perf-sched.wait_and_delay.max.ms.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
      4225 ± 39%     -75.8%       1022 ±141%  perf-sched.wait_and_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      2837 ± 31%     -88.2%     334.64 ±223%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
      4535 ± 33%     -74.1%       1173 ±143%  perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     25.66 ±116%     -96.6%       0.86 ±168%  perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
      9.36 ± 40%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
      9.63 ± 36%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.vma_merge.constprop.0
      3.87 ± 38%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.vma_prepare.vma_merge.constprop
     10.81 ± 36%     -76.2%       2.57 ±142%  perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
      2.32 ± 34%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge
    286.97 ±115%     -99.8%       0.71 ±182%  perf-sched.wait_time.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      0.39 ± 57%     -81.6%       0.07 ±153%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
    705.09 ± 56%     -73.9%     183.73 ±142%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
    112.73 ±118%     -98.9%       1.19 ±142%  perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
    335.83 ± 29%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
    301.19 ± 35%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.vma_merge.constprop.0
     22.34 ± 98%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.vma_prepare.vma_merge.constprop
    473.14 ± 21%     -76.2%     112.54 ±144%  perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
      7.07 ± 72%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge
    835.83 ±107%     -99.8%       1.31 ±200%  perf-sched.wait_time.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      1536 ± 83%     -77.9%     339.71 ±141%  perf-sched.wait_time.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
      2836 ± 31%     -88.2%     334.51 ±223%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki