Re: [PATCH] [v2] filemap: Move prefaulting out of hot write path

Hello,

kernel test robot noticed a 3.6% improvement of will-it-scale.per_thread_ops on:


commit: 391ab5826c820c58d180534a7a727ff5668d4d61 ("[PATCH] [v2] filemap: Move prefaulting out of hot write path")
url: https://github.com/intel-lab-lkp/linux/commits/Dave-Hansen/filemap-Move-prefaulting-out-of-hot-write-path/20250301-043921
base: https://git.kernel.org/cgit/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/all/20250228203722.CAEB63AC@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
patch subject: [PATCH] [v2] filemap: Move prefaulting out of hot write path

testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 104 threads 2 sockets (Skylake) with 192G memory
parameters:

	nr_task: 100%
	mode: thread
	test: pwrite1
	cpufreq_governor: performance
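
For readers unfamiliar with the workload, the following is a rough Python
sketch of the kind of loop a pwrite microbenchmark such as pwrite1 exercises:
one task repeatedly overwriting the same range at offset 0 of a temporary
file. It is an illustration under stated assumptions, not the will-it-scale
source. The 4096-byte buffer size, the function name pwrite_loop, and the
bounded iteration count are all assumptions made for this example. The
shmem_file_write_iter frames in the profile further down suggest the test
file lives on tmpfs on the test machine; this sketch uses whatever the
default temporary directory is.

```python
import os
import tempfile

BUF_LEN = 4096  # assumed write size; the report does not state it


def pwrite_loop(iterations, buf_len=BUF_LEN):
    """Overwrite the first buf_len bytes of a temp file `iterations` times.

    Returns (completed_writes, final_file_size).
    """
    buf = bytes(buf_len)  # zero-filled buffer
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        done = 0
        for _ in range(iterations):
            # Positional write: always offset 0, so the file never grows
            # beyond buf_len and the same page-cache range is reused.
            written = os.pwrite(fd, buf, 0)
            assert written == buf_len
            done += 1
        size = os.fstat(fd).st_size
    return done, size


if __name__ == "__main__":
    print(pwrite_loop(1000))
```

Because every write targets the same offset, the per-call cost is dominated
by the syscall entry/exit path and the user-to-kernel copy, which are the
areas the profile rows below break down.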






Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250310/202503101621.e0858506-lkp@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-9.4/thread/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/pwrite1/will-it-scale

commit: 
  3dec9c0e67 ("foo")
  391ab5826c ("filemap: Move prefaulting out of hot write path")

3dec9c0e67aaf496 391ab5826c820c58d180534a7a7 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    182266 ±  3%      +9.4%     199333 ±  4%  meminfo.DirectMap4k
    765.67 ±  8%     -22.1%     596.83 ±  9%  perf-sched.wait_and_delay.count.__cond_resched.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
     17510 ±  6%     +19.3%      20889 ±  8%  sched_debug.cpu.nr_switches.max
      3219 ±  5%     +11.2%       3578 ±  2%  sched_debug.cpu.nr_switches.stddev
  54561715            +3.6%   56543708        will-it-scale.104.threads
    524631            +3.6%     543689        will-it-scale.per_thread_ops
  54561715            +3.6%   56543708        will-it-scale.workload
 1.752e+10            -1.2%  1.731e+10        perf-stat.i.branch-instructions
      1.59            +0.0        1.63        perf-stat.i.branch-miss-rate%
      3.25            +1.8%       3.31        perf-stat.i.cpi
 8.828e+10            -1.5%  8.699e+10        perf-stat.i.instructions
      0.31            -1.8%       0.30        perf-stat.i.ipc
      1.58            +0.0        1.62        perf-stat.overall.branch-miss-rate%
      3.25            +1.8%       3.31        perf-stat.overall.cpi
      0.31            -1.7%       0.30        perf-stat.overall.ipc
    487316            -4.8%     464012        perf-stat.overall.path-length
 1.746e+10            -1.2%  1.725e+10        perf-stat.ps.branch-instructions
 8.798e+10            -1.5%   8.67e+10        perf-stat.ps.instructions
 2.659e+13            -1.3%  2.624e+13        perf-stat.total.instructions
     34.45            -5.9       28.57        perf-profile.calltrace.cycles-pp.generic_perform_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64.do_syscall_64
     48.17            -5.3       42.87 ±  2%  perf-profile.calltrace.cycles-pp.vfs_write.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
     42.56            -5.3       37.29 ±  2%  perf-profile.calltrace.cycles-pp.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe
     51.38            -4.9       46.46        perf-profile.calltrace.cycles-pp.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
     54.26            -4.5       49.76        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
     62.52            -3.9       58.65        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_pwrite
     13.30 ±  2%      -2.3       10.96        perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64
     10.17 ±  2%      -2.0        8.18        perf-profile.calltrace.cycles-pp.rep_movs_alternative.copy_page_from_iter_atomic.generic_perform_write.shmem_file_write_iter.vfs_write
      0.59 ±  4%      -0.3        0.26 ±100%  perf-profile.calltrace.cycles-pp.__cond_resched.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
      0.88 ±  3%      -0.1        0.74 ±  4%  perf-profile.calltrace.cycles-pp.folio_mark_accessed.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
      0.67 ±  2%      -0.1        0.56        perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64
     99.51            +0.1       99.56        perf-profile.calltrace.cycles-pp.__libc_pwrite
      0.64 ±  5%      +0.1        0.71 ±  2%  perf-profile.calltrace.cycles-pp.fput.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
      0.93 ±  2%      +0.1        1.02        perf-profile.calltrace.cycles-pp.folio_unlock.shmem_write_end.generic_perform_write.shmem_file_write_iter.vfs_write
      0.72            +0.1        0.82 ±  2%  perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64_mg.current_time.inode_needs_update_time.file_update_time.shmem_file_write_iter
      1.20            +0.2        1.38 ±  2%  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
      0.84 ±  2%      +0.2        1.04 ±  2%  perf-profile.calltrace.cycles-pp.noop_dirty_folio.shmem_write_end.generic_perform_write.shmem_file_write_iter.vfs_write
      1.48            +0.2        1.72 ±  2%  perf-profile.calltrace.cycles-pp.current_time.inode_needs_update_time.file_update_time.shmem_file_write_iter.vfs_write
      1.00 ±  2%      +0.3        1.26 ±  3%  perf-profile.calltrace.cycles-pp.folio_mark_dirty.shmem_write_end.generic_perform_write.shmem_file_write_iter.vfs_write
      1.99 ±  3%      +0.3        2.27 ±  4%  perf-profile.calltrace.cycles-pp.fdget.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
      6.69            +0.3        7.02        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.__libc_pwrite
      2.24            +0.4        2.60 ±  3%  perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64
      2.78 ±  2%      +0.4        3.21 ±  2%  perf-profile.calltrace.cycles-pp.file_update_time.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64.do_syscall_64
      4.34            +0.6        4.92        perf-profile.calltrace.cycles-pp.shmem_write_end.generic_perform_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64
     12.01            +0.9       12.92        perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.__libc_pwrite
      2.89 ±  6%      +1.4        4.26 ±  6%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.__libc_pwrite
     15.09            +1.5       16.61        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__libc_pwrite
     34.58            -5.9       28.70        perf-profile.children.cycles-pp.generic_perform_write
     48.24            -5.3       42.94 ±  2%  perf-profile.children.cycles-pp.vfs_write
     42.95            -5.3       37.66 ±  2%  perf-profile.children.cycles-pp.shmem_file_write_iter
     51.54            -4.9       46.63        perf-profile.children.cycles-pp.__x64_sys_pwrite64
     54.37            -4.5       49.86        perf-profile.children.cycles-pp.do_syscall_64
     62.76            -3.9       58.90        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     10.62 ±  2%      -2.3        8.30        perf-profile.children.cycles-pp.rep_movs_alternative
     13.47 ±  2%      -2.3       11.16        perf-profile.children.cycles-pp.copy_page_from_iter_atomic
      0.90 ±  3%      -0.1        0.76 ±  4%  perf-profile.children.cycles-pp.folio_mark_accessed
      0.69 ±  2%      -0.1        0.60        perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
      0.29            -0.0        0.26 ±  2%  perf-profile.children.cycles-pp.testcase
      0.31 ±  3%      -0.0        0.29 ±  2%  perf-profile.children.cycles-pp.update_process_times
      0.50            -0.0        0.48        perf-profile.children.cycles-pp.rcu_all_qs
     99.67            +0.0       99.70        perf-profile.children.cycles-pp.__libc_pwrite
      0.64 ±  5%      +0.1        0.71 ±  2%  perf-profile.children.cycles-pp.fput
      0.43 ±  3%      +0.1        0.50 ±  2%  perf-profile.children.cycles-pp.folio_mapping
      0.94 ±  2%      +0.1        1.02        perf-profile.children.cycles-pp.folio_unlock
      0.74 ±  2%      +0.1        0.85 ±  2%  perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64_mg
      1.23            +0.2        1.41 ±  2%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      0.89            +0.2        1.10 ±  2%  perf-profile.children.cycles-pp.noop_dirty_folio
      1.54            +0.2        1.79 ±  2%  perf-profile.children.cycles-pp.current_time
      1.99 ±  3%      +0.3        2.27 ±  4%  perf-profile.children.cycles-pp.fdget
      1.08 ±  3%      +0.3        1.36 ±  3%  perf-profile.children.cycles-pp.folio_mark_dirty
      2.30            +0.4        2.66 ±  3%  perf-profile.children.cycles-pp.inode_needs_update_time
      2.86 ±  2%      +0.4        3.30 ±  2%  perf-profile.children.cycles-pp.file_update_time
      4.58            +0.6        5.17        perf-profile.children.cycles-pp.shmem_write_end
      1.73 ±  5%      +0.7        2.43 ±  5%  perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
     12.88            +1.0       13.84        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      6.99            +1.0        7.95        perf-profile.children.cycles-pp.entry_SYSCALL_64
     15.22            +1.5       16.75        perf-profile.children.cycles-pp.syscall_return_via_sysret
     10.43 ±  2%      -2.3        8.08        perf-profile.self.cycles-pp.rep_movs_alternative
      3.26            -0.4        2.86        perf-profile.self.cycles-pp.generic_perform_write
      2.02            -0.2        1.78 ±  2%  perf-profile.self.cycles-pp.shmem_get_folio_gfp
      0.87 ±  3%      -0.1        0.74 ±  4%  perf-profile.self.cycles-pp.folio_mark_accessed
      0.53 ±  2%      -0.1        0.43 ±  2%  perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
      0.25 ±  2%      -0.0        0.22 ±  2%  perf-profile.self.cycles-pp.testcase
      0.54            +0.0        0.59        perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      0.79 ±  4%      +0.0        0.84 ±  2%  perf-profile.self.cycles-pp.__x64_sys_pwrite64
      0.51 ±  2%      +0.1        0.58 ±  2%  perf-profile.self.cycles-pp.fput
      0.38 ±  3%      +0.1        0.45 ±  2%  perf-profile.self.cycles-pp.folio_mapping
      0.74 ±  2%      +0.1        0.82        perf-profile.self.cycles-pp.folio_unlock
      0.73 ±  4%      +0.1        0.82 ±  4%  perf-profile.self.cycles-pp.inode_needs_update_time
      0.54 ±  6%      +0.1        0.64 ±  3%  perf-profile.self.cycles-pp.file_update_time
      0.72 ±  2%      +0.1        0.82 ±  2%  perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64_mg
      0.79 ±  2%      +0.1        0.94 ±  3%  perf-profile.self.cycles-pp.current_time
      0.98 ±  2%      +0.2        1.14 ±  2%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      0.85            +0.2        1.04 ±  2%  perf-profile.self.cycles-pp.noop_dirty_folio
      0.65 ±  3%      +0.2        0.85 ±  4%  perf-profile.self.cycles-pp.folio_mark_dirty
      1.11 ±  6%      +0.2        1.32 ±  2%  perf-profile.self.cycles-pp.do_syscall_64
      1.98 ±  3%      +0.3        2.26 ±  4%  perf-profile.self.cycles-pp.fdget
      2.14 ±  2%      +0.4        2.55 ±  3%  perf-profile.self.cycles-pp.__libc_pwrite
      1.63            +0.6        2.23 ±  3%  perf-profile.self.cycles-pp.shmem_write_begin
      8.54            +0.7        9.20        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      6.09            +0.9        7.03 ±  2%  perf-profile.self.cycles-pp.entry_SYSCALL_64
     12.75            +1.0       13.70        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
     15.20            +1.5       16.72        perf-profile.self.cycles-pp.syscall_return_via_sysret
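
As a sanity check on the table above, the headline percentages can be
recomputed from the raw columns. The short Python sketch below does that for
three rows, using values copied from the table. The helper name pct_change is
made up for this example. The relationships per_thread_ops = workload / 104
and path-length = total instructions / workload are inferred from the numbers
lining up, not taken from lkp documentation, so treat them as assumptions.

```python
def pct_change(base, patched):
    """Relative change from base to patched, in percent."""
    return (patched - base) / base * 100.0


# (base commit, patched commit) values copied from the table above
rows = {
    "will-it-scale.per_thread_ops": (524631, 543689),
    "will-it-scale.workload": (54561715, 56543708),
    "perf-stat.overall.path-length": (487316, 464012),
}

for name, (base, patched) in rows.items():
    print(f"{name}: {pct_change(base, patched):+.1f}%")

# Assumption: per_thread_ops is the workload divided by the 104 threads.
NR_THREADS = 104
print(54561715 // NR_THREADS, 56543708 // NR_THREADS)

# Assumption: path-length is total instructions per operation.
# The inputs are rounded to four significant digits in the table, so this
# only approximately reproduces the 487316 and 464012 shown there.
print(round(2.659e13 / 54561715), round(2.624e13 / 56543708))
```

Read this way, the patched kernel retires fewer instructions per pwrite
(path-length down 4.8%) while CPI rises slightly (+1.8%), which is
consistent with the net throughput gain of 3.6%.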




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki




