Hello,

kernel test robot noticed a 6.2% improvement of stress-ng.fd-fork.ops_per_sec on:

commit: 1fa4ffd8e6f6d001da27f00382af79bad0336091 ("close_files(): don't bother with xchg()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: fd-fork
	cpufreq_governor: performance

Details are as below:
-------------------------------------------------------------------------------------------------->

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250124/202501241646.81b10e21-lkp@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/fd-fork/stress-ng/60s

commit:
  be5498cac2 ("remove pointless includes of <linux/fdtable.h>")
  1fa4ffd8e6 ("close_files(): don't bother with xchg()")

be5498cac2ddb112 1fa4ffd8e6f6d001da27f00382a
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
     38705 ±  5%     +11.1%      42989 ±  2%  sched_debug.cpu.curr->pid.avg
     96837            +6.2%     102865        stress-ng.fd-fork.ops
      1611            +6.2%       1711        stress-ng.fd-fork.ops_per_sec
     10.10            -6.7%       9.42        stress-ng.fd-fork.seconds_to_open_all_file_descriptors
    131663            +5.2%     138573        stress-ng.time.voluntary_context_switches
   4224262 ±  3%      +5.5%    4458103        proc-vmstat.numa_hit
   4158770 ±  3%      +5.6%    4391868 ±  2%  proc-vmstat.numa_local
 1.002e+08            +6.8%   1.07e+08        proc-vmstat.pgalloc_normal
 1.001e+08            +6.8%  1.069e+08        proc-vmstat.pgfree
    200571           +14.3%     229179 ± 16%  proc-vmstat.pgreuse
      1.24 ± 15%     +39.0%       1.72 ± 13%  perf-sched.sch_delay.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
      2.36 ± 21%     -31.4%       1.62 ± 32%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
      1.40 ± 17%     -34.7%       0.92 ± 30%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      8.65 ± 31%     +64.9%      14.26 ± 55%  perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      7.30 ± 70%     -70.6%       2.14 ± 97%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
    227.50 ±  2%     +13.1%     257.36 ± 13%  perf-sched.wait_and_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      7.35 ±  6%     -13.4%       6.36 ±  8%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
     22270 ±  4%     +51.5%      33741        perf-sched.wait_and_delay.count.__cond_resched.__close_range.__x64_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe
     55845 ±  2%     -22.5%      43303        perf-sched.wait_and_delay.count.__cond_resched.put_files_struct.do_exit.do_group_exit.__x64_sys_exit_group
      1051 ±  2%      +8.6%       1141 ±  4%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    226.27 ±  2%     +13.2%     256.09 ± 13%  perf-sched.wait_time.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      4.72 ± 23%     +69.3%       7.99 ± 27%  perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
 2.431e+10            +7.6%  2.617e+10        perf-stat.i.branch-instructions
     12.91            +0.7       13.61        perf-stat.i.cache-miss-rate%
 1.033e+08            +9.5%  1.132e+08        perf-stat.i.cache-misses
  8.11e+08 ±  2%      +3.4%  8.387e+08        perf-stat.i.cache-references
      1.98            -6.4%       1.85        perf-stat.i.cpi
      2168            -8.0%       1993        perf-stat.i.cycles-between-cache-misses
 1.128e+11            +7.3%   1.21e+11        perf-stat.i.instructions
      0.51            +6.7%       0.54        perf-stat.i.ipc
     65995 ±  2%      +6.6%      70332 ±  3%  perf-stat.i.minor-faults
     65995 ±  2%      +6.6%      70332 ±  3%  perf-stat.i.page-faults
      0.91            +2.1%       0.93        perf-stat.overall.MPKI
     12.73            +0.7       13.48        perf-stat.overall.cache-miss-rate%
      1.99            -6.5%       1.86        perf-stat.overall.cpi
      2179            -8.4%       1996        perf-stat.overall.cycles-between-cache-misses
      0.50            +7.0%       0.54        perf-stat.overall.ipc
 2.391e+10            +7.6%  2.574e+10        perf-stat.ps.branch-instructions
 1.015e+08            +9.5%  1.111e+08        perf-stat.ps.cache-misses
 7.975e+08            +3.4%  8.247e+08        perf-stat.ps.cache-references
  1.11e+11            +7.3%  1.191e+11        perf-stat.ps.instructions
     64224 ±  2%      +6.7%      68500 ±  3%  perf-stat.ps.minor-faults
     64224 ±  2%      +6.7%      68501 ±  3%  perf-stat.ps.page-faults
  6.87e+12            +7.4%  7.379e+12        perf-stat.total.instructions
     29.66 ±  2%      -1.3       28.36 ±  2%  perf-profile.calltrace.cycles-pp.put_files_struct.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
     30.58 ±  2%      -1.2       29.42        perf-profile.calltrace.cycles-pp.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
     30.58 ±  2%      -1.2       29.42        perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64
     30.58 ±  2%      -1.2       29.42        perf-profile.calltrace.cycles-pp.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
     30.58 ±  2%      -1.2       29.42        perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
     30.58 ±  2%      -1.2       29.43        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
     30.58 ±  2%      -1.2       29.43        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.53            +0.1        0.60 ±  3%  perf-profile.calltrace.cycles-pp.exit_mmap.__mmput.exit_mm.do_exit.do_group_exit
      0.54            +0.1        0.60 ±  3%  perf-profile.calltrace.cycles-pp.__mmput.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group
      0.54            +0.1        0.60 ±  3%  perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
      0.78            +0.1        0.87 ±  2%  perf-profile.calltrace.cycles-pp.dup_mm.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
      0.76            +0.1        0.84 ±  2%  perf-profile.calltrace.cycles-pp.dup_mmap.dup_mm.copy_process.kernel_clone.__do_sys_clone
      0.34 ± 70%      +0.3        0.65        perf-profile.calltrace.cycles-pp.rcu_all_qs.__cond_resched.put_files_struct.do_exit.do_group_exit
      1.18 ±  3%      +0.4        1.56        perf-profile.calltrace.cycles-pp.__cond_resched.put_files_struct.do_exit.do_group_exit.__x64_sys_exit_group
     21.65            +0.8       22.48        perf-profile.calltrace.cycles-pp.dup_fd.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
     22.48            +0.9       23.40        perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe
     22.50            +0.9       23.43        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
     22.50            +0.9       23.43        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe._Fork
     22.50            +0.9       23.42        perf-profile.calltrace.cycles-pp.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
     22.50            +0.9       23.42        perf-profile.calltrace.cycles-pp.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
     22.52            +0.9       23.45        perf-profile.calltrace.cycles-pp._Fork
      1.47 ±  4%      +1.0        2.49        perf-profile.calltrace.cycles-pp.dnotify_flush.filp_flush.filp_close.put_files_struct.do_exit
      2.19 ±  4%      +1.2        3.38 ±  2%  perf-profile.calltrace.cycles-pp.locks_remove_posix.filp_flush.filp_close.put_files_struct.do_exit
      9.47 ±  2%      +2.7       12.14 ±  2%  perf-profile.calltrace.cycles-pp.fput.filp_close.put_files_struct.do_exit.do_group_exit
     22.10 ±  2%      +3.5       25.60 ±  2%  perf-profile.calltrace.cycles-pp.filp_close.put_files_struct.do_exit.do_group_exit.__x64_sys_exit_group
     30.02 ±  2%      -1.2       28.79        perf-profile.children.cycles-pp.put_files_struct
     30.60 ±  2%      -1.2       29.44        perf-profile.children.cycles-pp.do_exit
     30.60 ±  2%      -1.2       29.44        perf-profile.children.cycles-pp.__x64_sys_exit_group
     30.60 ±  2%      -1.2       29.44        perf-profile.children.cycles-pp.do_group_exit
     30.60 ±  2%      -1.2       29.44        perf-profile.children.cycles-pp.x64_sys_call
      0.09            +0.0        0.10        perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook
      0.11 ±  3%      +0.0        0.12 ±  4%  perf-profile.children.cycles-pp.down_write
      0.13 ±  3%      +0.0        0.15 ±  3%  perf-profile.children.cycles-pp.handle_mm_fault
      0.15 ±  3%      +0.0        0.16        perf-profile.children.cycles-pp.kmem_cache_alloc_noprof
      0.14 ±  2%      +0.0        0.16 ±  3%  perf-profile.children.cycles-pp.kmem_cache_free
      0.16            +0.0        0.18 ±  2%  perf-profile.children.cycles-pp.do_user_addr_fault
      0.16            +0.0        0.18 ±  2%  perf-profile.children.cycles-pp.exc_page_fault
      0.25            +0.0        0.27 ±  2%  perf-profile.children.cycles-pp.anon_vma_clone
      0.24 ±  2%      +0.0        0.27        perf-profile.children.cycles-pp.free_pgtables
      0.31            +0.0        0.34 ±  2%  perf-profile.children.cycles-pp.anon_vma_fork
      0.13 ±  5%      +0.0        0.16 ±  8%  perf-profile.children.cycles-pp.copy_p4d_range
      0.54            +0.1        0.60 ±  3%  perf-profile.children.cycles-pp.__mmput
      0.54            +0.1        0.61 ±  3%  perf-profile.children.cycles-pp.exit_mm
      0.53            +0.1        0.60 ±  3%  perf-profile.children.cycles-pp.exit_mmap
      0.78            +0.1        0.87 ±  2%  perf-profile.children.cycles-pp.dup_mm
      0.76            +0.1        0.85 ±  2%  perf-profile.children.cycles-pp.dup_mmap
      1.53            +0.2        1.75        perf-profile.children.cycles-pp.rcu_all_qs
      3.50            +0.5        4.03        perf-profile.children.cycles-pp.__cond_resched
     21.65            +0.8       22.48        perf-profile.children.cycles-pp.dup_fd
     22.48            +0.9       23.40        perf-profile.children.cycles-pp.copy_process
     22.50            +0.9       23.42        perf-profile.children.cycles-pp.kernel_clone
     22.50            +0.9       23.42        perf-profile.children.cycles-pp.__do_sys_clone
     22.53            +0.9       23.46        perf-profile.children.cycles-pp._Fork
     33.30            +0.9       34.24        perf-profile.children.cycles-pp.filp_flush
      3.75            +1.0        4.78        perf-profile.children.cycles-pp.dnotify_flush
      5.04            +1.2        6.22        perf-profile.children.cycles-pp.locks_remove_posix
     21.42            +2.6       24.05        perf-profile.children.cycles-pp.fput
     56.02            +3.7       59.71        perf-profile.children.cycles-pp.filp_close
      6.16 ±  2%      -5.2        0.91        perf-profile.self.cycles-pp.put_files_struct
     24.86            -1.2       23.65        perf-profile.self.cycles-pp.filp_flush
      1.14            +0.2        1.31        perf-profile.self.cycles-pp.rcu_all_qs
      1.76            +0.2        1.94        perf-profile.self.cycles-pp.filp_close
      1.91            +0.3        2.20        perf-profile.self.cycles-pp.__cond_resched
     21.51            +0.8       22.34        perf-profile.self.cycles-pp.dup_fd
      3.30            +1.0        4.26        perf-profile.self.cycles-pp.dnotify_flush
      4.58            +1.1        5.72        perf-profile.self.cycles-pp.locks_remove_posix
     20.87            +2.6       23.46        perf-profile.self.cycles-pp.fput


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided for
informational purposes only. Any difference in system hardware or software design
or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki