Hello,

kernel test robot noticed a 3.8% regression of filebench.sum_operations/s on:


commit: 900bbaae67e980945dec74d36f8afe0de7556d5a ("epoll: Add synchronous wakeup support for ep_poll_callback")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

[test failed on linux-next/master 5b913f5d7d7fe0f567dea8605f21da6eaa1735fb]

testcase: filebench
config: x86_64-rhel-8.3
compiler: gcc-12
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
parameters:

	disk: 1HDD
	fs: ext4
	fs2: cifs
	test: webproxy.f
	cpufreq_governor: performance


If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202411122121.de84272a-oliver.sang@xxxxxxxxx


Details are as below:
-------------------------------------------------------------------------------------------------->

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241112/202411122121.de84272a-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/disk/fs2/fs/kconfig/rootfs/tbox_group/test/testcase:
  gcc-12/performance/1HDD/cifs/ext4/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-icl-2sp6/webproxy.f/filebench

commit:
  0dfcb72d33 ("coredump: add cond_resched() to dump_user_range")
  900bbaae67 ("epoll: Add synchronous wakeup support for ep_poll_callback")

0dfcb72d33c767bb 900bbaae67e980945dec74d36f8
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
      0.03            +0.0        0.04        mpstat.cpu.all.irq%
      0.85            -0.1        0.76        mpstat.cpu.all.sys%
   2185818 ± 58%     -45.3%    1195059 ±104%  numa-meminfo.node1.FilePages
   1975339 ± 64%     -52.1%     946422 ±133%  numa-meminfo.node1.Unevictable
    364.50 ± 12%     +50.6%     549.00 ±  2%  perf-c2c.DRAM.remote
    208.17 ± 12%     +56.4%     325.67 ±  6%  perf-c2c.HITM.remote
      1002 ± 59%    +152.4%       2530 ± 32%  sched_debug.cpu.nr_switches.min
      8764 ±  3%     -25.7%       6515 ±  7%  sched_debug.cpu.nr_switches.stddev
     13791            -5.3%      13057        vmstat.system.cs
     11314            +4.2%      11784        vmstat.system.in
    546482 ± 58%     -45.3%     298775 ±104%  numa-vmstat.node1.nr_file_pages
    493834 ± 64%     -52.1%     236605 ±133%  numa-vmstat.node1.nr_unevictable
    493834 ± 64%     -52.1%     236605 ±133%  numa-vmstat.node1.nr_zone_unevictable
     13.58            -3.2%      13.15        filebench.sum_bytes_mb/s
    232514            -3.8%     223695        filebench.sum_operations
      3874            -3.8%       3727        filebench.sum_operations/s
      1019            -3.8%     980.50        filebench.sum_reads/s
     25.75            +3.9%      26.76        filebench.sum_time_ms/op
    203.83            -3.7%     196.33        filebench.sum_writes/s
    499886            -1.8%     490769        filebench.time.file_system_outputs
     17741 ±  2%      -3.9%      17040        filebench.time.minor_page_faults
     68.50           -14.6%      58.50        filebench.time.percent_of_cpu_this_job_got
    123.86           -14.9%     105.36        filebench.time.system_time
    350879            -2.8%     341014        filebench.time.voluntary_context_switches
     29557            -4.4%      28256        proc-vmstat.nr_active_anon
     16635 ±  3%      +4.3%      17352        proc-vmstat.nr_active_file
     37364            -3.8%      35926        proc-vmstat.nr_shmem
     29557            -4.4%      28256        proc-vmstat.nr_zone_active_anon
     16635 ±  3%      +4.3%      17352        proc-vmstat.nr_zone_active_file
     12281 ± 13%     +47.2%      18083 ± 15%  proc-vmstat.numa_hint_faults
    965.00 ±  6%     -30.8%     668.00 ± 20%  proc-vmstat.numa_huge_pte_updates
    518951 ±  6%     -28.2%     372754 ± 20%  proc-vmstat.numa_pte_updates
     73011            -1.1%      72183        proc-vmstat.pgactivate
    698445            +2.2%     713680        proc-vmstat.pgfault
     31722           +14.9%      36439 ±  3%  proc-vmstat.pgreuse
      1.12 ± 20%      -0.3        0.81 ±  8%  perf-profile.children.cycles-pp.__lookup_slow
      0.37 ± 26%      -0.2        0.20 ± 29%  perf-profile.children.cycles-pp.vma_alloc_folio_noprof
      0.39 ±  9%      -0.2        0.22 ± 56%  perf-profile.children.cycles-pp.__hrtimer_next_event_base
      0.18 ± 40%      -0.1        0.06 ± 73%  perf-profile.children.cycles-pp.__poll
      0.18 ± 40%      -0.1        0.06 ± 73%  perf-profile.children.cycles-pp.__x64_sys_poll
      0.18 ± 40%      -0.1        0.06 ± 73%  perf-profile.children.cycles-pp.do_sys_poll
      0.16 ± 45%      -0.1        0.05 ± 84%  perf-profile.children.cycles-pp.perf_evlist__poll_thread
      0.15 ± 33%      +0.1        0.25 ± 15%  perf-profile.children.cycles-pp.smp_call_function_many_cond
      0.03 ±100%      +0.1        0.14 ± 49%  perf-profile.children.cycles-pp.lockref_get_not_dead
      0.13 ± 47%      +0.1        0.28 ± 39%  perf-profile.children.cycles-pp.irq_work_tick
      0.49 ± 32%      +0.3        0.77 ± 20%  perf-profile.children.cycles-pp.__wait_for_common
      0.82 ± 20%      +0.4        1.22 ± 21%  perf-profile.children.cycles-pp.affine_move_task
      0.11 ± 37%      -0.1        0.04 ±112%  perf-profile.self.cycles-pp.task_contending
      0.03 ±100%      +0.1        0.14 ± 49%  perf-profile.self.cycles-pp.lockref_get_not_dead
 9.279e+08            -5.0%  8.816e+08        perf-stat.i.branch-instructions
      2.93            +0.0        2.98        perf-stat.i.branch-miss-rate%
  13227049            +4.3%   13791182        perf-stat.i.branch-misses
      2.99            +0.2        3.20 ±  2%  perf-stat.i.cache-miss-rate%
   1805840 ±  2%     +19.2%    2152127        perf-stat.i.cache-misses
  47931959            +6.2%   50910857        perf-stat.i.cache-references
     13706            -4.3%      13122        perf-stat.i.context-switches
 4.597e+09            -8.8%  4.192e+09        perf-stat.i.cpu-cycles
    338.72           +70.9%     578.79        perf-stat.i.cpu-migrations
      2345 ±  2%     -13.2%       2036 ±  3%  perf-stat.i.cycles-between-cache-misses
 4.233e+09            -4.7%  4.035e+09        perf-stat.i.instructions
      0.77            +1.8%       0.78        perf-stat.i.ipc
      2957 ±  2%      +4.2%       3081        perf-stat.i.minor-faults
      2957 ±  2%      +4.2%       3081        perf-stat.i.page-faults
      0.43 ±  2%     +25.0%       0.53        perf-stat.overall.MPKI
      1.42            +0.1        1.56        perf-stat.overall.branch-miss-rate%
      3.77 ±  2%      +0.5        4.23        perf-stat.overall.cache-miss-rate%
      1.09            -4.3%       1.04        perf-stat.overall.cpi
      2547 ±  2%     -23.5%       1948        perf-stat.overall.cycles-between-cache-misses
      0.92            +4.5%       0.96        perf-stat.overall.ipc
 9.229e+08            -5.0%   8.77e+08        perf-stat.ps.branch-instructions
  13149814            +4.3%   13712774        perf-stat.ps.branch-misses
   1795922 ±  2%     +19.2%    2140605        perf-stat.ps.cache-misses
  47675878            +6.2%   50645024        perf-stat.ps.cache-references
     13636            -4.3%      13055        perf-stat.ps.context-switches
 4.573e+09            -8.8%  4.171e+09        perf-stat.ps.cpu-cycles
    336.94           +70.9%     575.93        perf-stat.ps.cpu-migrations
  4.21e+09            -4.7%  4.014e+09        perf-stat.ps.instructions
      2934 ±  2%      +4.2%       3057        perf-stat.ps.minor-faults
      2934 ±  2%      +4.2%       3057        perf-stat.ps.page-faults
  7.63e+11            -4.2%  7.309e+11        perf-stat.total.instructions
      0.00 ±223%   +6816.7%       0.07 ± 34%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
      0.03 ± 20%     +40.2%       0.04 ± 16%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
      0.03 ±  3%     +53.0%       0.05 ±  3%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      0.08 ±  5%     -11.9%       0.07 ±  3%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.kthread.ret_from_fork.ret_from_fork_asm
      0.05 ±  3%     +14.2%       0.05 ±  4%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.open_last_lookups
      0.03           +26.5%       0.03 ±  2%  perf-sched.sch_delay.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
      0.02 ±223%    +552.7%       0.10 ± 47%  perf-sched.sch_delay.max.ms.__cond_resched.cancel_work_sync._cifsFileInfo_put.cifs_close_deferred_file_under_dentry.cifs_unlink
      0.00 ±223%  +15516.7%       0.16 ± 30%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
      0.16 ±  6%     +16.5%       0.19 ±  5%  perf-sched.wait_and_delay.avg.ms.__cond_resched.cifs_demultiplex_thread.kthread.ret_from_fork.ret_from_fork_asm
     33.98 ± 11%     +28.0%      43.51 ±  5%  perf-sched.wait_and_delay.avg.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
      0.56           +13.7%       0.63        perf-sched.wait_and_delay.avg.ms.io_schedule.folio_wait_bit_common.folio_wait_writeback.__filemap_fdatawait_range
    392.79 ± 12%     +29.2%     507.64 ±  7%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
      1.02           +32.4%       1.35 ±  2%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      0.24           +10.7%       0.27        perf-sched.wait_and_delay.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
     99.17 ±  9%     -22.0%      77.33 ±  8%  perf-sched.wait_and_delay.count.__cond_resched.__kmalloc_noprof.cifs_strndup_to_utf16.cifs_convert_path_to_utf16.smb2_compound_op
     82.50 ± 21%     -48.7%      42.33 ± 16%  perf-sched.wait_and_delay.count.__cond_resched.cancel_work_sync._cifsFileInfo_put.process_one_work.worker_thread
    741.67 ±  4%      +8.5%     804.67 ±  5%  perf-sched.wait_and_delay.count.__cond_resched.cifs_demultiplex_thread.kthread.ret_from_fork.ret_from_fork_asm
      1228           -13.1%       1067 ±  5%  perf-sched.wait_and_delay.count.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
      2421 ±  3%     -13.8%       2088 ±  3%  perf-sched.wait_and_delay.count.__lock_sock.lock_sock_nested.tcp_recvmsg.inet6_recvmsg
     41.50 ±  7%     -25.3%      31.00 ±  8%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
     10750           -24.7%       8094        perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
    279.23 ±  2%      +9.1%     304.73 ±  3%  perf-sched.wait_and_delay.max.ms.__cond_resched.__kmalloc_noprof.cifs_strndup_to_utf16.cifs_convert_path_to_utf16.smb2_compound_op
      1001          +111.3%       2115 ± 37%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
    286.84 ±  4%     +10.7%     317.62 ±  8%  perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.do_unlinkat
    290.82 ±  3%     +11.8%     325.18 ±  8%  perf-sched.wait_and_delay.max.ms.wait_for_response.compound_send_recv.cifs_send_recv.SMB2_open
    291.61 ±  2%     +11.4%     324.95 ±  9%  perf-sched.wait_and_delay.max.ms.wait_for_response.compound_send_recv.smb2_compound_op.smb2_query_path_info
      0.13 ±  6%     +19.6%       0.15 ±  7%  perf-sched.wait_time.avg.ms.__cond_resched.cifs_demultiplex_thread.kthread.ret_from_fork.ret_from_fork_asm
     33.97 ± 11%     +27.9%      43.46 ±  5%  perf-sched.wait_time.avg.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
      0.01 ±223%  +12287.9%       0.68 ±114%  perf-sched.wait_time.avg.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
      0.08 ±  4%      +9.3%       0.09 ±  4%  perf-sched.wait_time.avg.ms.__lock_sock.sk_wait_data.tcp_recvmsg_locked.tcp_recvmsg
      0.47           +15.3%       0.54        perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.folio_wait_writeback.__filemap_fdatawait_range
    392.76 ± 12%     +29.2%     507.60 ±  7%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
      0.99           +31.7%       1.30 ±  2%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      1.05 ±  3%     +14.2%       1.20 ±  6%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.cifs_call_async
    279.13 ±  2%      +9.1%     304.65 ±  3%  perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_noprof.cifs_strndup_to_utf16.cifs_convert_path_to_utf16.smb2_compound_op
      0.01 ±223%  +35681.8%       1.97 ±121%  perf-sched.wait_time.max.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
      1001          +111.3%       2115 ± 37%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
    286.74 ±  4%     +10.7%     317.50 ±  8%  perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.do_unlinkat
    290.73 ±  3%     +11.8%     325.08 ±  8%  perf-sched.wait_time.max.ms.wait_for_response.compound_send_recv.cifs_send_recv.SMB2_open
    291.52 ±  2%     +11.4%     324.84 ±  9%  perf-sched.wait_time.max.ms.wait_for_response.compound_send_recv.smb2_compound_op.smb2_query_path_info


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided for
informational purposes only. Any difference in system hardware or software design
or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki