[linux-next:master] [mm] cc8cb3697a: stress-ng.pkey.ops_per_sec 4.4% improvement

Hello,

kernel test robot noticed a 4.4% improvement in stress-ng.pkey.ops_per_sec on:


commit: cc8cb3697a8d8eabe1fb9acb8768b11c1ab607d8 ("mm: refactor vma_merge() into modify-only vma_merge_existing_range()")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

testcase: stress-ng
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: pkey
	cpufreq_governor: performance
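
As a rough illustration of how the parameters above could map onto a manual stress-ng run, here is a minimal sketch. It is an approximation and not the actual LKP job definition: the helper name build_stress_ng_argv is hypothetical, reading "nr_threads: 100%" as one stressor instance per hardware thread (64 on this machine) is an assumption, and the cpufreq governor setting is not handled here. The sketch only builds the command line and does not run it.

```python
import shlex


def build_stress_ng_argv(test, nr_cpus, nr_threads_pct, testtime):
    """Build a stress-ng argv list from report-style parameters (hypothetical helper)."""
    # Assumption: nr_threads is a percentage of the machine's hardware threads.
    instances = max(1, nr_cpus * nr_threads_pct // 100)
    return [
        "stress-ng",
        f"--{test}", str(instances),   # e.g. --pkey 64
        "--timeout", testtime,         # e.g. 60s
        "--metrics-brief",             # print ops and ops/sec summary at exit
    ]


if __name__ == "__main__":
    argv = build_stress_ng_argv("pkey", nr_cpus=64, nr_threads_pct=100, testtime="60s")
    print(shlex.join(argv))
```

The exact reproduction materials are in the archive linked below; the sketch is only meant to make the parameter list easier to read.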






Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240916/202409161559.af0a1b99-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/pkey/stress-ng/60s

commit: 
  65e0aa64df ("mm: introduce commit_merge(), abstracting final commit of merge")
  cc8cb3697a ("mm: refactor vma_merge() into modify-only vma_merge_existing_range()")

65e0aa64df916861 cc8cb3697a8d8eabe1fb9acb876 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    159916 ±  5%     +14.9%     183809 ± 10%  meminfo.DirectMap4k
     15.42 ± 23%     +46.5%      22.58 ± 17%  sched_debug.cpu.nr_uninterruptible.max
 2.158e+08            +4.4%  2.253e+08        stress-ng.pkey.ops
   3596484            +4.4%    3755565        stress-ng.pkey.ops_per_sec
    196.30            +4.9%     205.86        stress-ng.time.user_time
  25782400            +3.4%   26666903        proc-vmstat.numa_hit
  25707363            +3.5%   26600006        proc-vmstat.numa_local
  44223158            +3.4%   45721027        proc-vmstat.pgalloc_normal
  39763569            +3.5%   41151044        proc-vmstat.pgfree
 3.568e+10            +1.4%  3.619e+10        perf-stat.i.branch-instructions
  87058419 ±  2%      +3.1%   89795461        perf-stat.i.branch-misses
 1.482e+08            +2.7%  1.521e+08        perf-stat.i.cache-references
      1854            -2.2%       1813        perf-stat.i.cycles-between-cache-misses
  1.68e+11            +1.1%  1.699e+11        perf-stat.i.instructions
      0.64            +1.8%       0.65        perf-stat.overall.MPKI
      1812            -2.5%       1766        perf-stat.overall.cycles-between-cache-misses
 1.045e+08            +2.6%  1.073e+08        perf-stat.ps.cache-misses
 1.446e+08            +2.8%  1.486e+08        perf-stat.ps.cache-references
     25.66 ±116%     -96.6%       0.86 ±168%  perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
      9.35 ± 40%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
      9.63 ± 36%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.vma_merge.constprop.0
      3.87 ± 38%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.vma_prepare.vma_merge.constprop
     10.81 ± 36%     -76.2%       2.57 ±142%  perf-sched.sch_delay.avg.ms.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
      3.74 ± 55%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.khugepaged.kthread.ret_from_fork.ret_from_fork_asm
      2.32 ± 34%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge
      1.32 ±104%     -80.1%       0.26 ±221%  perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
     19.81 ±188%     -99.3%       0.14 ±142%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
      0.39 ± 57%     -81.6%       0.07 ±153%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
    180.87 ±203%     -99.1%       1.55 ±153%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      0.36 ±108%     -96.5%       0.01 ±187%  perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      1.44 ± 19%     -85.8%       0.20 ±171%  perf-sched.sch_delay.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
     40.94 ±115%     -99.8%       0.10 ±143%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
    112.73 ±118%     -98.9%       1.19 ±142%  perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
    335.83 ± 29%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
    301.19 ± 35%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.vma_merge.constprop.0
     22.34 ± 98%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.vma_prepare.vma_merge.constprop
    473.14 ± 21%     -76.2%     112.54 ±144%  perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
      6.84 ± 51%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.khugepaged.kthread.ret_from_fork.ret_from_fork_asm
      7.07 ± 72%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge
      0.42 ±147%     -98.1%       0.01 ±141%  perf-sched.sch_delay.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      2373 ± 40%     -78.6%     507.50 ±152%  perf-sched.sch_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      1.70 ±111%     -96.7%       0.06 ±212%  perf-sched.sch_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      1309 ± 77%     -99.6%       5.06 ±165%  perf-sched.sch_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      2745 ± 25%     -81.5%     507.97 ±152%  perf-sched.total_sch_delay.max.ms
     10044 ±  4%     -74.3%       2576 ±141%  perf-sched.total_wait_and_delay.count.ms
      6234 ± 21%     -77.2%       1421 ±141%  perf-sched.total_wait_and_delay.max.ms
     18.71 ± 40%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
     19.26 ± 36%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.down_write.vma_merge.constprop.0
     21.62 ± 36%     -76.2%       5.15 ±142%  perf-sched.wait_and_delay.avg.ms.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
    885.96 ± 42%     -79.1%     185.28 ±142%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
    144.50 ± 24%     -86.4%      19.67 ±145%  perf-sched.wait_and_delay.count.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.shmem_alloc_folio
    131.83 ±  9%     -73.7%      34.67 ±141%  perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
    159.83 ±  8%     -75.8%      38.67 ±144%  perf-sched.wait_and_delay.count.__cond_resched.change_pmd_range.isra.0.change_pud_range
    227.83 ±  9%     -77.0%      52.33 ±141%  perf-sched.wait_and_delay.count.__cond_resched.change_pud_range.isra.0.change_protection_range
     75.00 ±  8%     -71.6%      21.33 ±143%  perf-sched.wait_and_delay.count.__cond_resched.down_write.__x64_sys_pkey_free.do_syscall_64.entry_SYSCALL_64_after_hwframe
     82.00 ±  9%     -76.4%      19.33 ±141%  perf-sched.wait_and_delay.count.__cond_resched.down_write.anon_vma_clone.__split_vma.vma_modify
    412.67 ±  7%     -82.6%      71.83 ±141%  perf-sched.wait_and_delay.count.__cond_resched.down_write.mprotect_fixup.do_mprotect_pkey.__x64_sys_pkey_mprotect
    125.83 ±  9%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
     86.83 ± 14%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.down_write.vma_merge.constprop.0
    225.33 ±  7%     -76.1%      53.83 ±142%  perf-sched.wait_and_delay.count.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
    314.67 ± 31%     -87.1%      40.50 ±142%  perf-sched.wait_and_delay.count.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
    118.17 ± 12%     -80.0%      23.67 ±143%  perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__split_vma
    206.50 ±  8%     -77.2%      47.17 ±141%  perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vma_modify
     76.33 ± 23%     -90.6%       7.17 ±223%  perf-sched.wait_and_delay.count.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
     45.33 ± 21%     -83.1%       7.67 ±148%  perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    626.00 ± 66%     -92.6%      46.17 ±142%  perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
     10.33 ± 14%     -83.9%       1.67 ±223%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
     54.17 ± 27%     -70.2%      16.17 ±141%  perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      1976 ±  7%     -77.3%     447.67 ±141%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1760 ± 10%     -74.8%     443.33 ±147%  perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    195.50 ±  9%     -75.9%      47.17 ±141%  perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
    671.66 ± 29%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
    602.38 ± 35%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.down_write.vma_merge.constprop.0
    946.28 ± 21%     -76.2%     225.08 ±144%  perf-sched.wait_and_delay.max.ms.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
      4225 ± 39%     -75.8%       1022 ±141%  perf-sched.wait_and_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      2837 ± 31%     -88.2%     334.64 ±223%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
      4535 ± 33%     -74.1%       1173 ±143%  perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     25.66 ±116%     -96.6%       0.86 ±168%  perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
      9.36 ± 40%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
      9.63 ± 36%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.vma_merge.constprop.0
      3.87 ± 38%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.vma_prepare.vma_merge.constprop
     10.81 ± 36%     -76.2%       2.57 ±142%  perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
      2.32 ± 34%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge
    286.97 ±115%     -99.8%       0.71 ±182%  perf-sched.wait_time.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      0.39 ± 57%     -81.6%       0.07 ±153%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
    705.09 ± 56%     -73.9%     183.73 ±142%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
    112.73 ±118%     -98.9%       1.19 ±142%  perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
    335.83 ± 29%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
    301.19 ± 35%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.vma_merge.constprop.0
     22.34 ± 98%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.vma_prepare.vma_merge.constprop
    473.14 ± 21%     -76.2%     112.54 ±144%  perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
      7.07 ± 72%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge
    835.83 ±107%     -99.8%       1.31 ±200%  perf-sched.wait_time.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      1536 ± 83%     -77.9%     339.71 ±141%  perf-sched.wait_time.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
      2836 ± 31%     -88.2%     334.51 ±223%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
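
For readers who want to sanity-check the table, the %change column can be re-derived from the two mean columns. The sketch below assumes %change = (new - base) / base * 100, which agrees with the rows shown above, and assumes the row layout given in the table header (base mean, optional "± N%" stddev, signed %change, new mean, optional "± N%" stddev, metric name). The names ROW, parse_row and pct_change are hypothetical and are not part of the LKP tooling.

```python
import re

# Assumed row layout, taken from the table header:
#   base mean [± stddev%]   %change   new mean [± stddev%]   metric name
ROW = re.compile(
    r"^\s*(?P<base>[\d.e+]+)"
    r"(?:\s*±\s*(?P<base_sd>\d+)%)?"
    r"\s+(?P<change>[+-][\d.]+)%"
    r"\s+(?P<new>[\d.e+]+)"
    r"(?:\s*±\s*(?P<new_sd>\d+)%)?"
    r"\s+(?P<metric>\S+)\s*$"
)


def parse_row(line):
    """Split one comparison row into its numeric fields; None if it does not match."""
    m = ROW.match(line)
    if m is None:
        return None
    return {
        "metric": m["metric"],
        "base": float(m["base"]),
        "new": float(m["new"]),
        "reported_change": float(m["change"]),
    }


def pct_change(base, new):
    """Relative change of new against base, in percent (base must be non-zero)."""
    return (new - base) / base * 100.0


if __name__ == "__main__":
    sample = [
        "    159916 ±  5%     +14.9%     183809 ± 10%  meminfo.DirectMap4k",
        "   3596484            +4.4%    3755565        stress-ng.pkey.ops_per_sec",
        "      1854            -2.2%       1813        perf-stat.i.cycles-between-cache-misses",
    ]
    for line in sample:
        row = parse_row(line)
        recomputed = pct_change(row["base"], row["new"])
        print(f"{row['metric']}: reported {row['reported_change']:+.1f}%, "
              f"recomputed {recomputed:+.1f}%")
```

Rows whose stddev is large (for example the perf-sched entries with ±140% or more) should be read with care: the recomputed change is arithmetically correct but the underlying means are noisy.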




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki




