Hello,

kernel test robot noticed a -14.0% regression of phoronix-test-suite.iozone.1MB.512MB.WritePerformance.mb_s on:

commit: 9aac777aaf9459786bc8463e6cbfc7e7e1abd1f9 ("filemap: Convert generic_perform_write() to support large folios")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: phoronix-test-suite
test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory
parameters:

	test: iozone-1.9.6
	option_a: 1MB
	option_b: 512MB
	option_c: Write Performance
	cpufreq_governor: performance

If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add the following tags:

| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202407242232.9109947e-oliver.sang@xxxxxxxxx

Details are as below:
-------------------------------------------------------------------------------------------------->

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240724/202407242232.9109947e-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/option_c/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/1MB/512MB/Write Performance/debian-12-x86_64-phoronix/lkp-csl-2sp7/iozone-1.9.6/phoronix-test-suite

commit:
  146a99aefe ("xprtrdma: removed asm-generic headers from verbs.c")
  9aac777aaf ("filemap: Convert generic_perform_write() to support large folios")

146a99aefe4a45f6 9aac777aaf9459786bc8463e6cb
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
      3043           -14.0%       2618        phoronix-test-suite.iozone.1MB.512MB.WritePerformance.mb_s
      6003 ±  6%     +21.0%       7262 ± 21%  proc-vmstat.nr_active_anon
      6003 ±  6%     +21.0%       7262 ± 21%  proc-vmstat.nr_zone_active_anon
      0.62 ± 43%     +90.5%       1.19 ± 43%  sched_debug.cfs_rq:/system.slice/containerd.service.load_avg.avg
      0.62 ± 43%     +94.9%       1.21 ± 40%  sched_debug.cfs_rq:/system.slice/containerd.service.runnable_avg.avg
      0.59 ± 36%     +99.4%       1.19 ± 41%  sched_debug.cfs_rq:/system.slice/containerd.service.se->avg.runnable_avg.avg
      0.59 ± 36%     +99.4%       1.19 ± 41%  sched_debug.cfs_rq:/system.slice/containerd.service.se->avg.util_avg.avg
      0.62 ± 43%     +85.1%       1.15 ± 39%  sched_debug.cfs_rq:/system.slice/containerd.service.tg_load_avg_contrib.avg
      0.62 ± 43%     +94.9%       1.21 ± 40%  sched_debug.cfs_rq:/system.slice/containerd.service.util_avg.avg
     60.61            -2.1       58.48        perf-stat.i.iTLB-load-miss-rate%
    910966            -3.4%     879846        perf-stat.i.iTLB-load-misses
      5100 ±  2%      +4.8%       5346 ±  2%  perf-stat.i.instructions-per-iTLB-miss
     57.76 ±  2%      +3.0       60.79 ±  3%  perf-stat.i.node-load-miss-rate%
     38.99 ±  2%      +3.9       42.85 ±  4%  perf-stat.i.node-store-miss-rate%
     61.51            -2.1       59.37        perf-stat.overall.iTLB-load-miss-rate%
      4574            +3.3%       4727        perf-stat.overall.instructions-per-iTLB-miss
    885569            -3.3%     856059        perf-stat.ps.iTLB-load-misses
      0.02 ± 58%     -72.5%       0.01 ±119%  perf-sched.sch_delay.avg.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
      0.00 ±103%   +1162.5%       0.02 ±112%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.kthread.ret_from_fork.ret_from_fork_asm
      0.03 ± 75%     -87.1%       0.00 ±106%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.__flush_work.fsnotify_destroy_group
      0.10 ± 27%     -64.5%       0.03 ±105%  perf-sched.sch_delay.max.ms.do_task_dead.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart
      0.06 ±  4%     +89.3%       0.11 ± 27%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.do_epoll_pwait.part
      0.00 ±103%   +1487.5%       0.02 ±111%  perf-sched.sch_delay.max.ms.schedule_preempt_disabled.kthread.ret_from_fork.ret_from_fork_asm
      0.04 ± 79%     -90.8%       0.00 ±104%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.__flush_work.fsnotify_destroy_group
      3.89 ± 36%     -31.4%       2.66 ±  8%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.do_epoll_pwait.part
      1097 ± 14%     +28.8%       1413 ±  6%  perf-sched.wait_and_delay.count.futex_wait_queue.__futex_wait.futex_wait.do_futex
      0.02 ± 18%     -56.5%       0.01 ± 52%  perf-sched.wait_time.avg.ms.__cond_resched.mmput.do_task_stat.proc_single_show.seq_read_iter
      3.87 ± 37%     -31.6%       2.65 ±  8%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.do_epoll_pwait.part
    425.39           +13.0%     480.82        perf-sched.wait_time.max.ms.io_schedule.folio_wait_bit_common.filemap_update_page.filemap_get_pages
     15.00 ± 80%      -6.8        8.16 ±147%  perf-profile.calltrace.cycles-pp.arch_do_signal_or_restart.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
     15.00 ± 80%      -6.8        8.16 ±147%  perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart.syscall_exit_to_user_mode
     15.00 ± 80%      -6.8        8.16 ±147%  perf-profile.calltrace.cycles-pp.do_group_exit.get_signal.arch_do_signal_or_restart.syscall_exit_to_user_mode.do_syscall_64
     15.00 ± 80%      -6.8        8.16 ±147%  perf-profile.calltrace.cycles-pp.get_signal.arch_do_signal_or_restart.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
     15.00 ± 80%      -6.8        8.16 ±147%  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
      4.09 ±102%      -3.4        0.72 ±223%  perf-profile.calltrace.cycles-pp._compound_head.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range
      5.98 ± 87%      -3.0        2.96 ±176%  perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter.vfs_write
     15.00 ± 80%      -6.8        8.16 ±147%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      5.27 ± 61%      -4.1        1.15 ±223%  perf-profile.children.cycles-pp.sched_balance_newidle
      5.27 ± 61%      -4.1        1.15 ±223%  perf-profile.children.cycles-pp.sched_balance_rq
      4.09 ±102%      -3.4        0.72 ±223%  perf-profile.children.cycles-pp._compound_head
      5.98 ± 87%      -3.0        2.96 ±176%  perf-profile.children.cycles-pp.shmem_get_folio_gfp
      4.09 ±102%      -3.4        0.72 ±223%  perf-profile.self.cycles-pp._compound_head

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
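A note for readers parsing the comparison table (a quick sanity check, not part of the original report): the change column is a relative percentage for throughput-style metrics, while the figures in the table suggest that metrics whose names end in "%" (miss rates, cycles-pp) report an absolute percentage-point delta instead. Both conventions can be verified from the numbers above:

```python
# Headline metric: WritePerformance.mb_s for the parent commit
# (146a99aefe) vs. the large-folio commit (9aac777aaf).
parent_mb_s = 3043
patched_mb_s = 2618

# Relative change, as reported in the %change column.
change_pct = (patched_mb_s - parent_mb_s) / parent_mb_s * 100
print(f"{change_pct:.1f}%")  # matches the reported -14.0%

# Rate metric: perf-stat.i.iTLB-load-miss-rate% went 60.61 -> 58.48,
# and the table lists "-2.1" with no percent sign, i.e. an absolute
# difference in percentage points rather than a relative change.
miss_rate_delta = 58.48 - 60.61
print(f"{miss_rate_delta:+.1f}")  # matches the reported -2.1
```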