[linux-next:master] [vfs] bdf6091183: stress-ng.full.ops_per_sec 633.4% improvement

Hello,

kernel test robot noticed a 633.4% improvement of stress-ng.full.ops_per_sec on:


commit: bdf609118326e7c15f1c7efbc629bd9f7f307231 ("vfs: move d_lockref out of the area used by RCU lookup")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

testcase: stress-ng
test machine: 256 threads 2 sockets GENUINE INTEL(R) XEON(R) (Sierra Forest) with 128G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: full
	cpufreq_governor: performance

Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240627/202406270909.adb09955-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-srf-2sp1/full/stress-ng/60s
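The two lines above are a slash-separated list of parameter names followed by a slash-separated list of their values. A minimal sketch of pairing them up, assuming that no individual value contains a "/" (true for this report, not guaranteed in general); the helper name pair_parameters is hypothetical and not part of any lkp tooling:

```python
# Pair the slash-separated parameter names with their values.
# Assumption: no individual value contains a "/" character.
KEYS = ("compiler/cpufreq_governor/kconfig/nr_threads/rootfs/"
        "tbox_group/test/testcase/testtime:")
VALUES = ("gcc-13/performance/x86_64-rhel-8.3/100%/"
          "debian-12-x86_64-20240206.cgz/lkp-srf-2sp1/full/stress-ng/60s")


def pair_parameters(keys_line, values_line):
    """Return a dict mapping each parameter name to its value."""
    names = keys_line.strip().rstrip(":").split("/")
    vals = values_line.strip().split("/")
    if len(names) != len(vals):
        raise ValueError("name/value count mismatch: %d vs %d"
                         % (len(names), len(vals)))
    return dict(zip(names, vals))


params = pair_parameters(KEYS, VALUES)
for name, value in params.items():
    print(f"{name:>18}: {value}")
```

Read this way, the run used gcc-13, the performance governor, and the "full" stressor of stress-ng for 60s on the lkp-srf-2sp1 test box group.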

commit: 
  d042dae6ad ("lockref: speculatively spin waiting for the lock to be released")
  bdf6091183 ("vfs: move d_lockref out of the area used by RCU lookup")

d042dae6ad74df8a bdf609118326e7c15f1c7efbc62 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      0.24 ± 14%      +0.3        0.51 ±  6%  mpstat.cpu.all.usr%
    783327 ±  4%     +12.4%     880472 ±  4%  numa-numastat.node1.local_node
    516588 ±  9%     +15.0%     594316 ±  6%  vmstat.system.in
      8759 ± 73%    +110.7%      18455 ± 41%  numa-meminfo.node1.PageTables
    841412 ± 11%     +18.1%     993556 ±  7%  numa-meminfo.node1.Shmem
      2183 ± 72%    +111.9%       4626 ± 41%  numa-vmstat.node1.nr_page_table_pages
    210196 ± 11%     +18.2%     248382 ±  6%  numa-vmstat.node1.nr_shmem
    782967 ±  4%     +12.4%     879991 ±  4%  numa-vmstat.node1.numa_local
    244258 ±  5%     +21.1%     295853 ±  9%  sched_debug.cfs_rq:/.avg_vruntime.stddev
    456627 ± 76%     -94.3%      26089 ±  6%  sched_debug.cfs_rq:/.load.max
    244258 ±  5%     +21.1%     295853 ±  9%  sched_debug.cfs_rq:/.min_vruntime.stddev
   7656655 ± 11%    +633.4%   56155706        stress-ng.full.ops
    127609 ± 11%    +633.4%     935926        stress-ng.full.ops_per_sec
     59946            +6.6%      63873 ±  4%  stress-ng.time.involuntary_context_switches
      5.96 ± 11%    +597.3%      41.59        stress-ng.time.user_time
      1558 ±  7%     -86.6%     208.33 ±  6%  perf-c2c.DRAM.local
     15021 ±  4%     +59.5%      23957 ±  3%  perf-c2c.DRAM.remote
     15399 ±  2%    +102.6%      31205 ±  3%  perf-c2c.HITM.local
      9938 ±  3%    +103.4%      20217 ±  4%  perf-c2c.HITM.remote
     25337 ±  2%    +102.9%      51422 ±  3%  perf-c2c.HITM.total
     16172 ± 32%    +162.6%      42464 ± 13%  proc-vmstat.numa_hint_faults
     14655 ± 34%     +82.4%      26726 ± 24%  proc-vmstat.numa_hint_faults_local
   1428439            +5.2%    1502110        proc-vmstat.numa_hit
   1164410            +6.5%    1240512        proc-vmstat.numa_local
    169794 ± 14%     +32.8%     225458 ± 14%  proc-vmstat.numa_pte_updates
    185208            +5.9%     196095 ±  4%  proc-vmstat.pgactivate
   1510415            +4.9%    1584896        proc-vmstat.pgalloc_normal
 7.553e+09 ± 11%     +42.2%  1.074e+10 ±  7%  perf-stat.i.branch-instructions
  20529685 ± 22%     +58.4%   32511073 ± 12%  perf-stat.i.branch-misses
     18.77 ±  9%      +9.6       28.36 ±  6%  perf-stat.i.cache-miss-rate%
   5757124 ± 11%     +71.2%    9853953 ±  8%  perf-stat.i.cache-misses
  27469874 ±  9%     +23.9%   34036598 ±  7%  perf-stat.i.cache-references
      2575 ±  2%      +6.1%       2732 ±  2%  perf-stat.i.context-switches
     16.75 ±  8%     -24.4%      12.66 ±  4%  perf-stat.i.cpi
    335.17 ±  2%      +5.4%     353.20        perf-stat.i.cpu-migrations
    119311 ± 12%     -44.0%      66812 ±  5%  perf-stat.i.cycles-between-cache-misses
 3.106e+10 ± 11%     +49.4%   4.64e+10 ±  7%  perf-stat.i.instructions
      0.19 ±  4%     +15.2%       0.22        perf-stat.overall.MPKI
     21.65 ±  2%      +8.2       29.84 ±  2%  perf-stat.overall.cache-miss-rate%
     18.46           -28.3%      13.23        perf-stat.overall.cpi
     98417 ±  4%     -37.9%      61109        perf-stat.overall.cycles-between-cache-misses
      0.05           +39.5%       0.08        perf-stat.overall.ipc
 7.648e+09 ±  9%     +39.7%  1.069e+10 ±  6%  perf-stat.ps.branch-instructions
  20972501 ± 19%     +52.4%   31965991 ± 10%  perf-stat.ps.branch-misses
   5909643 ±  9%     +69.3%   10006290 ±  7%  perf-stat.ps.cache-misses
  27252734 ±  7%     +23.0%   33515970 ±  6%  perf-stat.ps.cache-references
      2461            +6.3%       2615        perf-stat.ps.context-switches
    323.20            +4.6%     338.19        perf-stat.ps.cpu-migrations
 3.146e+10 ±  9%     +46.7%  4.616e+10 ±  6%  perf-stat.ps.instructions
 2.154e+12           +38.9%  2.992e+12        perf-stat.total.instructions
     24.75           -24.7        0.00        perf-profile.calltrace.cycles-pp.dput.terminate_walk.path_openat.do_filp_open.do_sys_openat2
     24.75           -24.7        0.00        perf-profile.calltrace.cycles-pp.terminate_walk.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
     24.74           -24.7        0.00        perf-profile.calltrace.cycles-pp.__legitimize_path.try_to_unlazy.complete_walk.do_open.path_openat
     24.74           -24.7        0.00        perf-profile.calltrace.cycles-pp.complete_walk.do_open.path_openat.do_filp_open.do_sys_openat2
     24.74           -24.7        0.00        perf-profile.calltrace.cycles-pp.try_to_unlazy.complete_walk.do_open.path_openat.do_filp_open
     24.74           -24.7        0.00        perf-profile.calltrace.cycles-pp.lockref_get_not_dead.__legitimize_path.try_to_unlazy.complete_walk.do_open
     24.73           -24.7        0.00        perf-profile.calltrace.cycles-pp.dput.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
     24.71           -24.7        0.00        perf-profile.calltrace.cycles-pp.lockref_get.do_dentry_open.do_open.path_openat.do_filp_open
     24.84           -24.2        0.65 ±  9%  perf-profile.calltrace.cycles-pp.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
     24.85           -24.2        0.69 ±  8%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
     24.85           -24.2        0.69 ±  8%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__close
     24.84           -24.2        0.68 ±  9%  perf-profile.calltrace.cycles-pp.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
     24.85           -24.1        0.72 ±  8%  perf-profile.calltrace.cycles-pp.__close
     23.68           -23.7        0.00        perf-profile.calltrace.cycles-pp._raw_spin_lock.dput.terminate_walk.path_openat.do_filp_open
     23.67           -23.7        0.00        perf-profile.calltrace.cycles-pp._raw_spin_lock.lockref_get_not_dead.__legitimize_path.try_to_unlazy.complete_walk
     23.67           -23.7        0.00        perf-profile.calltrace.cycles-pp._raw_spin_lock.lockref_get.do_dentry_open.do_open.path_openat
     23.67           -23.7        0.00        perf-profile.calltrace.cycles-pp._raw_spin_lock.dput.__fput.__x64_sys_close.do_syscall_64
     23.63           -23.6        0.00        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.dput.terminate_walk.path_openat
     23.62           -23.6        0.00        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.lockref_get_not_dead.__legitimize_path.try_to_unlazy
     23.62           -23.6        0.00        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.lockref_get.do_dentry_open.do_open
     23.62           -23.6        0.00        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.dput.__fput.__x64_sys_close
     74.50           +23.3       97.82        perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
     74.50           +23.3       97.82        perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
     74.52           +23.3       97.84        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
     74.52           +23.3       97.84        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.open64
     74.41           +23.3       97.74        perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64
     74.41           +23.3       97.75        perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
     74.52           +23.4       97.88        perf-profile.calltrace.cycles-pp.open64
     49.65           +47.5       97.18        perf-profile.calltrace.cycles-pp.do_open.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
     24.83           +72.0       96.82        perf-profile.calltrace.cycles-pp.do_dentry_open.do_open.path_openat.do_filp_open.do_sys_openat2
      0.00           +96.0       95.99        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.chrdev_open.do_dentry_open.do_open
      0.00           +96.2       96.18        perf-profile.calltrace.cycles-pp._raw_spin_lock.chrdev_open.do_dentry_open.do_open.path_openat
      0.00           +96.3       96.34        perf-profile.calltrace.cycles-pp.chrdev_open.do_dentry_open.do_open.path_openat.do_filp_open
     49.48           -48.8        0.65 ± 13%  perf-profile.children.cycles-pp.dput
     24.71           -24.5        0.22 ± 12%  perf-profile.children.cycles-pp.lockref_get
     24.74           -24.4        0.31 ± 10%  perf-profile.children.cycles-pp.lockref_get_not_dead
     24.74           -24.4        0.32 ± 10%  perf-profile.children.cycles-pp.__legitimize_path
     24.74           -24.4        0.32 ± 10%  perf-profile.children.cycles-pp.complete_walk
     24.74           -24.4        0.32 ± 10%  perf-profile.children.cycles-pp.try_to_unlazy
     24.75           -24.4        0.34 ± 12%  perf-profile.children.cycles-pp.terminate_walk
     24.84           -24.2        0.65 ±  9%  perf-profile.children.cycles-pp.__fput
     24.84           -24.2        0.68 ±  9%  perf-profile.children.cycles-pp.__x64_sys_close
     24.85           -24.1        0.73 ±  8%  perf-profile.children.cycles-pp.__close
      2.13 ±  6%      -1.5        0.65 ± 13%  perf-profile.children.cycles-pp.lockref_put_return
     99.79            -0.4       99.40        perf-profile.children.cycles-pp.do_syscall_64
     99.80            -0.4       99.42        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.23 ±  2%      +0.0        0.25        perf-profile.children.cycles-pp.ksys_write
      0.08 ±  5%      +0.0        0.13 ±  2%  perf-profile.children.cycles-pp.apparmor_file_free_security
      0.08 ±  5%      +0.0        0.13 ±  2%  perf-profile.children.cycles-pp.security_file_free
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.stress_full
      0.02 ±141%      +0.1        0.07        perf-profile.children.cycles-pp.__x64_sys_pread64
      0.26            +0.1        0.32 ±  2%  perf-profile.children.cycles-pp.write
      0.02 ± 99%      +0.1        0.09 ±  4%  perf-profile.children.cycles-pp.__do_sys_newfstatat
      0.02 ±141%      +0.1        0.08        perf-profile.children.cycles-pp.ksys_read
      0.08 ±  5%      +0.1        0.15 ±  2%  perf-profile.children.cycles-pp.vfs_read
      0.05            +0.1        0.12 ±  3%  perf-profile.children.cycles-pp.__libc_pread
      0.05            +0.1        0.13 ±  2%  perf-profile.children.cycles-pp.read
      0.05            +0.1        0.13 ±  2%  perf-profile.children.cycles-pp.fstatat64
      0.00            +0.1        0.08 ±  4%  perf-profile.children.cycles-pp.mas_rev_awalk
      0.08 ±  6%      +0.1        0.17 ±  4%  perf-profile.children.cycles-pp.apparmor_file_open
      0.08 ±  6%      +0.1        0.18 ±  4%  perf-profile.children.cycles-pp.security_file_open
      0.00            +0.1        0.10        perf-profile.children.cycles-pp.iov_iter_zero
      0.00            +0.1        0.10 ±  3%  perf-profile.children.cycles-pp.read_iter_zero
      0.00            +0.1        0.11 ±  3%  perf-profile.children.cycles-pp.ioctl
      0.00            +0.1        0.12 ±  4%  perf-profile.children.cycles-pp.mas_empty_area_rev
      0.00            +0.1        0.14        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.00            +0.1        0.15 ±  3%  perf-profile.children.cycles-pp.apparmor_file_alloc_security
      0.00            +0.1        0.15 ±  4%  perf-profile.children.cycles-pp.kobject_get_unless_zero
      0.00            +0.2        0.16 ±  3%  perf-profile.children.cycles-pp.security_file_alloc
      0.00            +0.2        0.16 ±  2%  perf-profile.children.cycles-pp.init_file
      0.00            +0.2        0.17 ±  2%  perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.00            +0.2        0.17 ±  2%  perf-profile.children.cycles-pp.vm_unmapped_area
      0.00            +0.2        0.18 ± 10%  perf-profile.children.cycles-pp.cdev_put
      0.00            +0.2        0.18 ± 10%  perf-profile.children.cycles-pp.kobject_put
      0.00            +0.2        0.19        perf-profile.children.cycles-pp.alloc_empty_file
      0.00            +0.2        0.19        perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown_vmflags
      0.00            +0.2        0.20 ±  2%  perf-profile.children.cycles-pp.thp_get_unmapped_area_vmflags
      0.00            +0.2        0.20        perf-profile.children.cycles-pp.__get_unmapped_area
      0.00            +0.2        0.21 ±  2%  perf-profile.children.cycles-pp.do_mmap
      0.02 ± 99%      +0.3        0.29        perf-profile.children.cycles-pp.vm_mmap_pgoff
      0.02 ± 99%      +0.3        0.31        perf-profile.children.cycles-pp.ksys_mmap_pgoff
      0.06 ±  9%      +0.3        0.40        perf-profile.children.cycles-pp.__mmap
     94.70            +1.5       96.19        perf-profile.children.cycles-pp._raw_spin_lock
     94.51            +1.5       96.01        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     74.50           +23.3       97.82        perf-profile.children.cycles-pp.__x64_sys_openat
     74.50           +23.3       97.82        perf-profile.children.cycles-pp.do_sys_openat2
     74.41           +23.3       97.74        perf-profile.children.cycles-pp.path_openat
     74.41           +23.3       97.75        perf-profile.children.cycles-pp.do_filp_open
     74.52           +23.4       97.89        perf-profile.children.cycles-pp.open64
     49.65           +47.5       97.18        perf-profile.children.cycles-pp.do_open
     24.83           +72.0       96.82        perf-profile.children.cycles-pp.do_dentry_open
      0.00           +96.3       96.34        perf-profile.children.cycles-pp.chrdev_open
      2.12 ±  6%      -1.5        0.64 ± 13%  perf-profile.self.cycles-pp.lockref_put_return
      1.04 ±  7%      -0.8        0.22 ± 12%  perf-profile.self.cycles-pp.lockref_get
      1.06 ±  6%      -0.7        0.31 ± 10%  perf-profile.self.cycles-pp.lockref_get_not_dead
      0.08 ±  5%      +0.0        0.13 ±  2%  perf-profile.self.cycles-pp.apparmor_file_free_security
      0.00            +0.1        0.05        perf-profile.self.cycles-pp.stress_full
      0.00            +0.1        0.07        perf-profile.self.cycles-pp.mas_rev_awalk
      0.00            +0.1        0.08 ±  4%  perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.00            +0.1        0.09        perf-profile.self.cycles-pp.do_dentry_open
      0.08 ±  6%      +0.1        0.17 ±  4%  perf-profile.self.cycles-pp.apparmor_file_open
      0.00            +0.1        0.10        perf-profile.self.cycles-pp.iov_iter_zero
      0.00            +0.1        0.14        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.00            +0.1        0.15 ±  3%  perf-profile.self.cycles-pp.apparmor_file_alloc_security
      0.00            +0.1        0.15 ±  4%  perf-profile.self.cycles-pp.kobject_get_unless_zero
      0.00            +0.2        0.18 ± 10%  perf-profile.self.cycles-pp.kobject_put
     94.04            +1.5       95.52        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
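For reading the %change column: in the rows above, counters such as stress-ng.full.ops_per_sec show a relative change in percent, while rows whose values are already percentages (the perf-profile rows, mpstat.cpu.all.usr%, cache-miss-rate%) show an absolute difference in percentage points. A minimal sketch that recomputes a few rows from the printed means; the function names are hypothetical, and because the printed means are rounded, a recomputed value may differ from the report in the last digit for some rows:

```python
# Recompute a few %change entries from the means printed in the table.
# Assumption: inputs are the rounded means shown in the report, so results
# can differ slightly from values computed on the raw samples.

def percent_change(base, new):
    """Relative change in percent, as shown for counter-style rows."""
    return (new - base) / base * 100.0


def point_change(base, new):
    """Absolute difference, as shown for rows that are already percentages."""
    return new - base


rows = [
    ("stress-ng.full.ops_per_sec", percent_change(127609, 935926), "%"),
    ("stress-ng.full.ops", percent_change(7656655, 56155706), "%"),
    ("perf-stat.overall.cpi", percent_change(18.46, 13.23), "%"),
    ("children.cycles-pp.do_dentry_open", point_change(24.83, 96.82), ""),
    ("children.cycles-pp.dput", point_change(49.48, 0.65), ""),
]

for name, delta, unit in rows:
    print(f"{delta:+8.1f}{unit:1}  {name}")
```

The first two rows come out at +633.4%, matching the headline figure, and the two perf-profile rows come out at +72.0 and -48.8 percentage points of sampled cycles.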
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki