Hello,

kernel test robot noticed a 633.4% improvement of stress-ng.full.ops_per_sec on:


commit: bdf609118326e7c15f1c7efbc629bd9f7f307231 ("vfs: move d_lockref out of the area used by RCU lookup")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

testcase: stress-ng
test machine: 256 threads 2 sockets GENUINE INTEL(R) XEON(R) (Sierra Forest) with 128G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: full
	cpufreq_governor: performance


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240627/202406270909.adb09955-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-srf-2sp1/full/stress-ng/60s

commit:
  d042dae6ad ("lockref: speculatively spin waiting for the lock to be released")
  bdf6091183 ("vfs: move d_lockref out of the area used by RCU lookup")

d042dae6ad74df8a bdf609118326e7c15f1c7efbc62
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
      0.24 ± 14%       +0.3        0.51 ±  6%  mpstat.cpu.all.usr%
    783327 ±  4%     +12.4%      880472 ±  4%  numa-numastat.node1.local_node
    516588 ±  9%     +15.0%      594316 ±  6%  vmstat.system.in
      8759 ± 73%    +110.7%       18455 ± 41%  numa-meminfo.node1.PageTables
    841412 ± 11%     +18.1%      993556 ±  7%  numa-meminfo.node1.Shmem
      2183 ± 72%    +111.9%        4626 ± 41%  numa-vmstat.node1.nr_page_table_pages
    210196 ± 11%     +18.2%      248382 ±  6%  numa-vmstat.node1.nr_shmem
    782967 ±  4%     +12.4%      879991 ±  4%  numa-vmstat.node1.numa_local
    244258 ±  5%     +21.1%      295853 ±  9%  sched_debug.cfs_rq:/.avg_vruntime.stddev
    456627 ± 76%     -94.3%       26089 ±  6%  sched_debug.cfs_rq:/.load.max
    244258 ±  5%     +21.1%      295853 ±  9%  sched_debug.cfs_rq:/.min_vruntime.stddev
   7656655 ± 11%    +633.4%    56155706        stress-ng.full.ops
    127609 ± 11%    +633.4%      935926        stress-ng.full.ops_per_sec
     59946            +6.6%       63873 ±  4%  stress-ng.time.involuntary_context_switches
      5.96 ± 11%    +597.3%       41.59        stress-ng.time.user_time
      1558 ±  7%     -86.6%      208.33 ±  6%  perf-c2c.DRAM.local
     15021 ±  4%     +59.5%       23957 ±  3%  perf-c2c.DRAM.remote
     15399 ±  2%    +102.6%       31205 ±  3%  perf-c2c.HITM.local
      9938 ±  3%    +103.4%       20217 ±  4%  perf-c2c.HITM.remote
     25337 ±  2%    +102.9%       51422 ±  3%  perf-c2c.HITM.total
     16172 ± 32%    +162.6%       42464 ± 13%  proc-vmstat.numa_hint_faults
     14655 ± 34%     +82.4%       26726 ± 24%  proc-vmstat.numa_hint_faults_local
   1428439            +5.2%     1502110        proc-vmstat.numa_hit
   1164410            +6.5%     1240512        proc-vmstat.numa_local
    169794 ± 14%     +32.8%      225458 ± 14%  proc-vmstat.numa_pte_updates
    185208            +5.9%      196095 ±  4%  proc-vmstat.pgactivate
   1510415            +4.9%     1584896        proc-vmstat.pgalloc_normal
 7.553e+09 ± 11%     +42.2%   1.074e+10 ±  7%  perf-stat.i.branch-instructions
  20529685 ± 22%     +58.4%    32511073 ± 12%  perf-stat.i.branch-misses
     18.77 ±  9%       +9.6       28.36 ±  6%  perf-stat.i.cache-miss-rate%
   5757124 ± 11%     +71.2%     9853953 ±  8%  perf-stat.i.cache-misses
  27469874 ±  9%     +23.9%    34036598 ±  7%  perf-stat.i.cache-references
      2575 ±  2%      +6.1%        2732 ±  2%  perf-stat.i.context-switches
     16.75 ±  8%     -24.4%       12.66 ±  4%  perf-stat.i.cpi
    335.17 ±  2%      +5.4%      353.20        perf-stat.i.cpu-migrations
    119311 ± 12%     -44.0%       66812 ±  5%  perf-stat.i.cycles-between-cache-misses
 3.106e+10 ± 11%     +49.4%    4.64e+10 ±  7%  perf-stat.i.instructions
      0.19 ±  4%     +15.2%        0.22        perf-stat.overall.MPKI
     21.65 ±  2%       +8.2       29.84 ±  2%  perf-stat.overall.cache-miss-rate%
     18.46           -28.3%       13.23        perf-stat.overall.cpi
     98417 ±  4%     -37.9%       61109        perf-stat.overall.cycles-between-cache-misses
      0.05           +39.5%        0.08        perf-stat.overall.ipc
 7.648e+09 ±  9%     +39.7%   1.069e+10 ±  6%  perf-stat.ps.branch-instructions
  20972501 ± 19%     +52.4%    31965991 ± 10%  perf-stat.ps.branch-misses
   5909643 ±  9%     +69.3%    10006290 ±  7%  perf-stat.ps.cache-misses
  27252734 ±  7%     +23.0%    33515970 ±  6%  perf-stat.ps.cache-references
      2461            +6.3%        2615        perf-stat.ps.context-switches
    323.20            +4.6%      338.19        perf-stat.ps.cpu-migrations
 3.146e+10 ±  9%     +46.7%   4.616e+10 ±  6%  perf-stat.ps.instructions
 2.154e+12           +38.9%   2.992e+12        perf-stat.total.instructions
     24.75            -24.7        0.00        perf-profile.calltrace.cycles-pp.dput.terminate_walk.path_openat.do_filp_open.do_sys_openat2
     24.75            -24.7        0.00        perf-profile.calltrace.cycles-pp.terminate_walk.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
     24.74            -24.7        0.00        perf-profile.calltrace.cycles-pp.__legitimize_path.try_to_unlazy.complete_walk.do_open.path_openat
     24.74            -24.7        0.00        perf-profile.calltrace.cycles-pp.complete_walk.do_open.path_openat.do_filp_open.do_sys_openat2
     24.74            -24.7        0.00        perf-profile.calltrace.cycles-pp.try_to_unlazy.complete_walk.do_open.path_openat.do_filp_open
     24.74            -24.7        0.00        perf-profile.calltrace.cycles-pp.lockref_get_not_dead.__legitimize_path.try_to_unlazy.complete_walk.do_open
     24.73            -24.7        0.00        perf-profile.calltrace.cycles-pp.dput.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
     24.71            -24.7        0.00        perf-profile.calltrace.cycles-pp.lockref_get.do_dentry_open.do_open.path_openat.do_filp_open
     24.84            -24.2        0.65 ±  9%  perf-profile.calltrace.cycles-pp.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
     24.85            -24.2        0.69 ±  8%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
     24.85            -24.2        0.69 ±  8%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__close
     24.84            -24.2        0.68 ±  9%  perf-profile.calltrace.cycles-pp.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
     24.85            -24.1        0.72 ±  8%  perf-profile.calltrace.cycles-pp.__close
     23.68            -23.7        0.00        perf-profile.calltrace.cycles-pp._raw_spin_lock.dput.terminate_walk.path_openat.do_filp_open
     23.67            -23.7        0.00        perf-profile.calltrace.cycles-pp._raw_spin_lock.lockref_get_not_dead.__legitimize_path.try_to_unlazy.complete_walk
     23.67            -23.7        0.00        perf-profile.calltrace.cycles-pp._raw_spin_lock.lockref_get.do_dentry_open.do_open.path_openat
     23.67            -23.7        0.00        perf-profile.calltrace.cycles-pp._raw_spin_lock.dput.__fput.__x64_sys_close.do_syscall_64
     23.63            -23.6        0.00        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.dput.terminate_walk.path_openat
     23.62            -23.6        0.00        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.lockref_get_not_dead.__legitimize_path.try_to_unlazy
     23.62            -23.6        0.00        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.lockref_get.do_dentry_open.do_open
     23.62            -23.6        0.00        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.dput.__fput.__x64_sys_close
     74.50            +23.3       97.82        perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
     74.50            +23.3       97.82        perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
     74.52            +23.3       97.84        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
     74.52            +23.3       97.84        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.open64
     74.41            +23.3       97.74        perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64
     74.41            +23.3       97.75        perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
     74.52            +23.4       97.88        perf-profile.calltrace.cycles-pp.open64
     49.65            +47.5       97.18        perf-profile.calltrace.cycles-pp.do_open.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
     24.83            +72.0       96.82        perf-profile.calltrace.cycles-pp.do_dentry_open.do_open.path_openat.do_filp_open.do_sys_openat2
      0.00            +96.0       95.99        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.chrdev_open.do_dentry_open.do_open
      0.00            +96.2       96.18        perf-profile.calltrace.cycles-pp._raw_spin_lock.chrdev_open.do_dentry_open.do_open.path_openat
      0.00            +96.3       96.34        perf-profile.calltrace.cycles-pp.chrdev_open.do_dentry_open.do_open.path_openat.do_filp_open
     49.48            -48.8        0.65 ± 13%  perf-profile.children.cycles-pp.dput
     24.71            -24.5        0.22 ± 12%  perf-profile.children.cycles-pp.lockref_get
     24.74            -24.4        0.31 ± 10%  perf-profile.children.cycles-pp.lockref_get_not_dead
     24.74            -24.4        0.32 ± 10%  perf-profile.children.cycles-pp.__legitimize_path
     24.74            -24.4        0.32 ± 10%  perf-profile.children.cycles-pp.complete_walk
     24.74            -24.4        0.32 ± 10%  perf-profile.children.cycles-pp.try_to_unlazy
     24.75            -24.4        0.34 ± 12%  perf-profile.children.cycles-pp.terminate_walk
     24.84            -24.2        0.65 ±  9%  perf-profile.children.cycles-pp.__fput
     24.84            -24.2        0.68 ±  9%  perf-profile.children.cycles-pp.__x64_sys_close
     24.85            -24.1        0.73 ±  8%  perf-profile.children.cycles-pp.__close
      2.13 ±  6%       -1.5        0.65 ± 13%  perf-profile.children.cycles-pp.lockref_put_return
     99.79             -0.4       99.40        perf-profile.children.cycles-pp.do_syscall_64
     99.80             -0.4       99.42        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.23 ±  2%       +0.0        0.25        perf-profile.children.cycles-pp.ksys_write
      0.08 ±  5%       +0.0        0.13 ±  2%  perf-profile.children.cycles-pp.apparmor_file_free_security
      0.08 ±  5%       +0.0        0.13 ±  2%  perf-profile.children.cycles-pp.security_file_free
      0.00             +0.1        0.05        perf-profile.children.cycles-pp.stress_full
      0.02 ±141%       +0.1        0.07        perf-profile.children.cycles-pp.__x64_sys_pread64
      0.26             +0.1        0.32 ±  2%  perf-profile.children.cycles-pp.write
      0.02 ± 99%       +0.1        0.09 ±  4%  perf-profile.children.cycles-pp.__do_sys_newfstatat
      0.02 ±141%       +0.1        0.08        perf-profile.children.cycles-pp.ksys_read
      0.08 ±  5%       +0.1        0.15 ±  2%  perf-profile.children.cycles-pp.vfs_read
      0.05             +0.1        0.12 ±  3%  perf-profile.children.cycles-pp.__libc_pread
      0.05             +0.1        0.13 ±  2%  perf-profile.children.cycles-pp.read
      0.05             +0.1        0.13 ±  2%  perf-profile.children.cycles-pp.fstatat64
      0.00             +0.1        0.08 ±  4%  perf-profile.children.cycles-pp.mas_rev_awalk
      0.08 ±  6%       +0.1        0.17 ±  4%  perf-profile.children.cycles-pp.apparmor_file_open
      0.08 ±  6%       +0.1        0.18 ±  4%  perf-profile.children.cycles-pp.security_file_open
      0.00             +0.1        0.10        perf-profile.children.cycles-pp.iov_iter_zero
      0.00             +0.1        0.10 ±  3%  perf-profile.children.cycles-pp.read_iter_zero
      0.00             +0.1        0.11 ±  3%  perf-profile.children.cycles-pp.ioctl
      0.00             +0.1        0.12 ±  4%  perf-profile.children.cycles-pp.mas_empty_area_rev
      0.00             +0.1        0.14        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.00             +0.1        0.15 ±  3%  perf-profile.children.cycles-pp.apparmor_file_alloc_security
      0.00             +0.1        0.15 ±  4%  perf-profile.children.cycles-pp.kobject_get_unless_zero
      0.00             +0.2        0.16 ±  3%  perf-profile.children.cycles-pp.security_file_alloc
      0.00             +0.2        0.16 ±  2%  perf-profile.children.cycles-pp.init_file
      0.00             +0.2        0.17 ±  2%  perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.00             +0.2        0.17 ±  2%  perf-profile.children.cycles-pp.vm_unmapped_area
      0.00             +0.2        0.18 ± 10%  perf-profile.children.cycles-pp.cdev_put
      0.00             +0.2        0.18 ± 10%  perf-profile.children.cycles-pp.kobject_put
      0.00             +0.2        0.19        perf-profile.children.cycles-pp.alloc_empty_file
      0.00             +0.2        0.19        perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown_vmflags
      0.00             +0.2        0.20 ±  2%  perf-profile.children.cycles-pp.thp_get_unmapped_area_vmflags
      0.00             +0.2        0.20        perf-profile.children.cycles-pp.__get_unmapped_area
      0.00             +0.2        0.21 ±  2%  perf-profile.children.cycles-pp.do_mmap
      0.02 ± 99%       +0.3        0.29        perf-profile.children.cycles-pp.vm_mmap_pgoff
      0.02 ± 99%       +0.3        0.31        perf-profile.children.cycles-pp.ksys_mmap_pgoff
      0.06 ±  9%       +0.3        0.40        perf-profile.children.cycles-pp.__mmap
     94.70             +1.5       96.19        perf-profile.children.cycles-pp._raw_spin_lock
     94.51             +1.5       96.01        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     74.50            +23.3       97.82        perf-profile.children.cycles-pp.__x64_sys_openat
     74.50            +23.3       97.82        perf-profile.children.cycles-pp.do_sys_openat2
     74.41            +23.3       97.74        perf-profile.children.cycles-pp.path_openat
     74.41            +23.3       97.75        perf-profile.children.cycles-pp.do_filp_open
     74.52            +23.4       97.89        perf-profile.children.cycles-pp.open64
     49.65            +47.5       97.18        perf-profile.children.cycles-pp.do_open
     24.83            +72.0       96.82        perf-profile.children.cycles-pp.do_dentry_open
      0.00            +96.3       96.34        perf-profile.children.cycles-pp.chrdev_open
      2.12 ±  6%       -1.5        0.64 ± 13%  perf-profile.self.cycles-pp.lockref_put_return
      1.04 ±  7%       -0.8        0.22 ± 12%  perf-profile.self.cycles-pp.lockref_get
      1.06 ±  6%       -0.7        0.31 ± 10%  perf-profile.self.cycles-pp.lockref_get_not_dead
      0.08 ±  5%       +0.0        0.13 ±  2%  perf-profile.self.cycles-pp.apparmor_file_free_security
      0.00             +0.1        0.05        perf-profile.self.cycles-pp.stress_full
      0.00             +0.1        0.07        perf-profile.self.cycles-pp.mas_rev_awalk
      0.00             +0.1        0.08 ±  4%  perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.00             +0.1        0.09        perf-profile.self.cycles-pp.do_dentry_open
      0.08 ±  6%       +0.1        0.17 ±  4%  perf-profile.self.cycles-pp.apparmor_file_open
      0.00             +0.1        0.10        perf-profile.self.cycles-pp.iov_iter_zero
      0.00             +0.1        0.14        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.00             +0.1        0.15 ±  3%  perf-profile.self.cycles-pp.apparmor_file_alloc_security
      0.00             +0.1        0.15 ±  4%  perf-profile.self.cycles-pp.kobject_get_unless_zero
      0.00             +0.2        0.18 ± 10%  perf-profile.self.cycles-pp.kobject_put
     94.04             +1.5       95.52        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki