Hello,

kernel test robot noticed a 50.0% regression of filebench.sum_operations/s on:


commit: e2d46f2ec332533816417b60933954173f602121 ("netfs: Change the read result collector to only use one work item")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linus/master      69b54314c975f4dfd3a29d6b9211ab68fff46682]
[test failed on linux-next/master ed58d103e6da15a442ff87567898768dc3a66987]

testcase: filebench
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
parameters:

        disk: 1HDD
        fs: ext4
        fs2: cifs
        test: copyfiles.f
        cpufreq_governor: performance


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202502121520.d583b3c3-lkp@xxxxxxxxx


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250212/202502121520.d583b3c3-lkp@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/disk/fs2/fs/kconfig/rootfs/tbox_group/test/testcase:
  gcc-12/performance/1HDD/cifs/ext4/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/lkp-icl-2sp6/copyfiles.f/filebench

commit:
  eddf51f2bb ("afs: Make {Y,}FS.FetchData an asynchronous operation")
  e2d46f2ec3 ("netfs: Change the read result collector to only use one work item")

eddf51f2bb2c28b0 e2d46f2ec332533816417b60933
---------------- ---------------------------
         %stddev    %change          %stddev
             \          |                \
    111550 ± 17%     -17.2%      92328        sched_debug.cfs_rq:/.load.stddev
      4.52 ±218%     -98.3%       0.08 ± 12%  perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      0.15 ±  2%     +22.0%       0.19 ± 12%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
   9404404            +5.0%    9871609        perf-stat.i.cache-references
      4.82             +0.1       4.87        perf-stat.overall.branch-miss-rate%
      4.88             -0.3       4.63 ±  2%  perf-stat.overall.cache-miss-rate%
   9290667            +5.0%    9751847        perf-stat.ps.cache-references
     31.30           -50.2%      15.60        filebench.sum_bytes_mb/s
      6001           -50.0%       3000        filebench.sum_operations/s
      1001           -50.0%     500.00        filebench.sum_reads/s
      0.11 ± 14%    +104.3%       0.23 ±  8%  filebench.sum_time_ms/op
      1000           -50.0%     500.00        filebench.sum_writes/s
      5.89 ±  8%       -1.4       4.52 ± 45%  perf-profile.calltrace.cycles-pp.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail.ast_mode_config_helper_atomic_commit_tail.commit_tail
      5.88 ±  8%       -1.4       4.52 ± 45%  perf-profile.calltrace.cycles-pp.drm_fb_memcpy.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail.ast_mode_config_helper_atomic_commit_tail
      5.81 ±  8%       -1.3       4.47 ± 45%  perf-profile.calltrace.cycles-pp.memcpy_toio.drm_fb_memcpy.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail
      6.58 ±  7%       -0.5       6.10 ±  4%  perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      6.74 ±  8%       -0.5       6.26 ±  4%  perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      2.35 ±  9%       -0.5       1.88 ± 10%  perf-profile.calltrace.cycles-pp.getxattr
      5.35 ±  3%       -0.3       5.08 ±  2%  perf-profile.calltrace.cycles-pp.console_flush_all.console_unlock.vprintk_emit.devkmsg_emit.devkmsg_write
      5.35 ±  3%       -0.3       5.08 ±  2%  perf-profile.calltrace.cycles-pp.console_unlock.vprintk_emit.devkmsg_emit.devkmsg_write.vfs_write
      5.35 ±  3%       -0.3       5.09 ±  2%  perf-profile.calltrace.cycles-pp.devkmsg_emit.devkmsg_write.vfs_write.ksys_write.do_syscall_64
      5.35 ±  3%       -0.3       5.09 ±  2%  perf-profile.calltrace.cycles-pp.vprintk_emit.devkmsg_emit.devkmsg_write.vfs_write.ksys_write
      5.35 ±  3%       -0.3       5.10 ±  2%  perf-profile.calltrace.cycles-pp.devkmsg_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.59 ± 10%       +0.1       0.66 ±  6%  perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.openat64
      2.16 ±  2%       +0.2       2.32 ±  2%  perf-profile.calltrace.cycles-pp.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
      2.17 ±  3%       +0.2       2.33 ±  2%  perf-profile.calltrace.cycles-pp.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
      2.17 ±  3%       +0.2       2.33 ±  2%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
      2.17 ±  3%       +0.2       2.33 ±  2%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.execve
      2.17 ±  3%       +0.2       2.33 ±  2%  perf-profile.calltrace.cycles-pp.execve
      1.39 ± 10%       +0.2       1.62 ± 11%  perf-profile.calltrace.cycles-pp.sched_ttwu_pending.__flush_smp_call_function_queue.flush_smp_call_function_queue.do_idle.cpu_startup_entry
      1.44 ±  9%       +0.2       1.68 ±  7%  perf-profile.calltrace.cycles-pp.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe
      6.58 ±  7%       -0.5       6.10 ±  4%  perf-profile.children.cycles-pp.process_one_work
      6.74 ±  8%       -0.5       6.26 ±  4%  perf-profile.children.cycles-pp.worker_thread
      2.37 ±  9%       -0.4       1.92 ± 11%  perf-profile.children.cycles-pp.getxattr
      5.35 ±  3%       -0.3       5.08 ±  2%  perf-profile.children.cycles-pp.console_flush_all
      5.35 ±  3%       -0.3       5.08 ±  2%  perf-profile.children.cycles-pp.console_unlock
      5.35 ±  3%       -0.3       5.09 ±  2%  perf-profile.children.cycles-pp.devkmsg_emit
      5.35 ±  3%       -0.3       5.09 ±  2%  perf-profile.children.cycles-pp.vprintk_emit
      5.35 ±  3%       -0.3       5.10 ±  2%  perf-profile.children.cycles-pp.devkmsg_write
      0.54 ± 17%       -0.1       0.39 ± 16%  perf-profile.children.cycles-pp.cifs_d_revalidate
      0.57 ± 15%       -0.1       0.44 ± 14%  perf-profile.children.cycles-pp.lookup_one_qstr_excl
      0.57 ± 14%       -0.1       0.43 ± 15%  perf-profile.children.cycles-pp.lookup_dcache
      0.21 ± 45%       -0.1       0.08 ± 51%  perf-profile.children.cycles-pp._copy_from_iter
      0.03 ±141%       +0.1       0.13 ± 38%  perf-profile.children.cycles-pp.__run_timers
      0.09 ± 56%       +0.1       0.21 ± 34%  perf-profile.children.cycles-pp.__x64_sys_munmap
      0.04 ±147%       +0.1       0.16 ± 47%  perf-profile.children.cycles-pp.tmigr_inactive_up
      0.17 ± 25%       +0.1       0.29 ± 22%  perf-profile.children.cycles-pp.xas_find
      0.23 ± 22%       +0.1       0.36 ± 10%  perf-profile.children.cycles-pp.native_irq_return_iret
      0.19 ± 24%       +0.1       0.34 ±  9%  perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.04 ±147%       +0.2       0.18 ± 46%  perf-profile.children.cycles-pp.tmigr_cpu_deactivate
      2.17 ±  3%       +0.2       2.33 ±  2%  perf-profile.children.cycles-pp.execve
      2.19 ±  3%       +0.2       2.36        perf-profile.children.cycles-pp.__x64_sys_execve
      2.18 ±  2%       +0.2       2.34        perf-profile.children.cycles-pp.do_execveat_common
      0.50 ± 12%       +0.2       0.69 ± 27%  perf-profile.children.cycles-pp.enqueue_entity
      0.15 ± 57%       +0.2       0.36 ± 16%  perf-profile.children.cycles-pp.perf_read
      0.49 ± 25%       +0.3       0.74 ± 23%  perf-profile.children.cycles-pp.tick_nohz_restart_sched_tick
      0.35 ± 29%       +0.3       0.63 ± 15%  perf-profile.children.cycles-pp.readn
      0.77 ± 18%       +0.3       1.09 ± 19%  perf-profile.children.cycles-pp.tick_nohz_idle_exit
      0.19 ± 24%       +0.1       0.31 ± 10%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.23 ± 22%       +0.1       0.36 ± 10%  perf-profile.self.cycles-pp.native_irq_return_iret


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki