Hello,

kernel test robot noticed a 1.9% improvement of will-it-scale.per_process_ops on:


commit: 077ab1260a52068a62a5fb08fa2c5f1d0dcf2738 ("dcache: back inline names with a struct-wrapped array of unsigned long")
https://git.kernel.org/cgit/linux/kernel/git/viro/vfs.git work.d_revalidate

testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 104 threads 2 sockets (Skylake) with 192G memory
parameters:

	nr_task: 100%
	mode: process
	test: poll2
	cpufreq_governor: performance


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250110/202501101058.cd8beeba-lkp@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-9.4/process/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/poll2/will-it-scale

commit:
  cf0cc84299 ("make sure that DNAME_INLINE_LEN is a multiple of word size")
  077ab1260a ("dcache: back inline names with a struct-wrapped array of unsigned long")

cf0cc842995ca3da 077ab1260a52068a62a5fb08fa2
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
    294.00 ± 10%     +15.2%     338.67 ±  5%  perf-c2c.DRAM.remote
    243.33 ±  9%     +13.7%     276.67 ±  6%  perf-c2c.HITM.remote
     21502 ±  5%    +413.7%     110453 ±117%  sched_debug.cfs_rq:/.load.max
      2543 ±  6%    +336.8%      11109 ±111%  sched_debug.cfs_rq:/.load.stddev
    274.83 ± 19%     +28.8%     353.86 ±  6%  sched_debug.cfs_rq:/.util_est.min
  24387540            +1.9%   24841387        will-it-scale.104.processes
    234495            +1.9%     238859        will-it-scale.per_process_ops
  24387540            +1.9%   24841387        will-it-scale.workload
      0.85 ± 11%     -20.5%       0.68 ± 10%  perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_noprof.do_sys_poll.__x64_sys_poll.do_syscall_64
      1.71 ± 11%     -20.6%       1.36 ± 10%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__kmalloc_noprof.do_sys_poll.__x64_sys_poll.do_syscall_64
     38.41 ±104%     -78.0%       8.46        perf-sched.wait_and_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      3676 ± 13%     -34.3%       2415 ± 21%  perf-sched.wait_and_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.85 ± 11%     -20.5%       0.68 ± 10%  perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_noprof.do_sys_poll.__x64_sys_poll.do_syscall_64
      3676 ± 13%     -34.3%       2415 ± 21%  perf-sched.wait_time.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
 4.591e+10            +1.9%  4.676e+10        perf-stat.i.branch-instructions
 1.367e+08            +1.9%  1.392e+08        perf-stat.i.branch-misses
      1.08            -1.9%       1.06        perf-stat.i.cpi
 2.584e+11            +1.9%  2.632e+11        perf-stat.i.instructions
      0.92            +1.9%       0.94        perf-stat.i.ipc
      1.08            -1.8%       1.06        perf-stat.overall.cpi
      0.93            +1.9%       0.94        perf-stat.overall.ipc
 4.575e+10            +1.9%   4.66e+10        perf-stat.ps.branch-instructions
 1.362e+08            +1.9%  1.388e+08        perf-stat.ps.branch-misses
 2.575e+11            +1.9%  2.623e+11        perf-stat.ps.instructions
 7.785e+13            +1.9%   7.93e+13        perf-stat.total.instructions
     59.17            -1.5       57.63        perf-profile.calltrace.cycles-pp.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
     71.18            -1.4       69.76        perf-profile.calltrace.cycles-pp.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe.__poll
     70.73            -1.4       69.32        perf-profile.calltrace.cycles-pp.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe.__poll
     72.76            -1.3       71.48        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__poll
     76.80            -1.1       75.70        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__poll
     43.66            -1.1       42.61        perf-profile.calltrace.cycles-pp.fdget.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
     94.61            -0.2       94.40        perf-profile.calltrace.cycles-pp.__poll
      0.92            +0.0        0.94        perf-profile.calltrace.cycles-pp.kfree.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.66            +0.1        2.73        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.__poll
      4.90            +0.2        5.10        perf-profile.calltrace.cycles-pp.testcase
      5.81            +0.2        6.04        perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.__poll
      1.98 ±  3%      +0.3        2.26 ±  3%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.__poll
      7.25            +0.3        7.56        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__poll
     59.29            -1.6       57.72        perf-profile.children.cycles-pp.do_poll
     71.24            -1.4       69.83        perf-profile.children.cycles-pp.__x64_sys_poll
     70.82            -1.4       69.41        perf-profile.children.cycles-pp.do_sys_poll
     72.83            -1.3       71.55        perf-profile.children.cycles-pp.do_syscall_64
     76.94            -1.1       75.84        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     43.57            -1.0       42.53        perf-profile.children.cycles-pp.fdget
     95.18            -0.2       94.97        perf-profile.children.cycles-pp.__poll
      1.16 ±  2%      +0.2        1.32 ±  3%  perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      3.50            +0.2        3.69        perf-profile.children.cycles-pp.entry_SYSCALL_64
      4.91            +0.2        5.12        perf-profile.children.cycles-pp.testcase
      6.22            +0.2        6.46        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      7.31            +0.3        7.62        perf-profile.children.cycles-pp.syscall_return_via_sysret
     42.16            -1.0       41.16        perf-profile.self.cycles-pp.fdget
     16.86            -0.6       16.30        perf-profile.self.cycles-pp.do_poll
      0.90            +0.0        0.93        perf-profile.self.cycles-pp.kfree
      0.32 ±  2%      +0.0        0.36 ±  3%  perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      1.20 ±  3%      +0.1        1.32 ±  2%  perf-profile.self.cycles-pp.__poll
      0.76 ±  2%      +0.1        0.89 ±  4%  perf-profile.self.cycles-pp.do_syscall_64
      4.88            +0.1        5.00        perf-profile.self.cycles-pp.do_sys_poll
      3.10            +0.2        3.28        perf-profile.self.cycles-pp.entry_SYSCALL_64
      4.18            +0.2        4.37        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      4.73            +0.2        4.94        perf-profile.self.cycles-pp.testcase
      6.16            +0.2        6.40        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      7.30            +0.3        7.62        perf-profile.self.cycles-pp.syscall_return_via_sysret




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided for
informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
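Note on reading the %change column: the figures in the comparison table are consistent with two conventions. For most rows the change is relative, (new - old) / old, printed with a % sign. For the perf-profile rows, whose values are already percentages of cycles, the column holds the absolute difference in percentage points and has no % sign (59.17 -> 57.63 is shown as -1.5). The snippet below is a minimal sketch of that arithmetic, assuming these two conventions; the helper names pct_change and pp_change are hypothetical and are not part of lkp-tests. Because the table prints rounded means, recomputing a row with small values from the printed numbers will not always reproduce the printed change (0.85 -> 0.68 works out to -20.0%, while the report shows -20.5%).

```python
def pct_change(old: float, new: float) -> float:
    """Relative change of the new mean versus the old mean, in percent."""
    if old == 0:
        raise ValueError("baseline mean is zero; relative change is undefined")
    return (new - old) / old * 100.0


def pp_change(old: float, new: float) -> float:
    """Absolute difference in percentage points, for rows already given in %."""
    return new - old


if __name__ == "__main__":
    # will-it-scale.per_process_ops: 234495 -> 238859
    print(f"{pct_change(234495, 238859):+.1f}%")  # prints +1.9%
    # perf-profile.calltrace.cycles-pp.do_poll...: 59.17 -> 57.63
    print(f"{pp_change(59.17, 57.63):+.1f}")  # prints -1.5
```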