Hello,

kernel test robot noticed a 4194.8% improvement of filebench.sum_operations/s on:


commit: edfc6481faf896301cab940da776229fe39e9fc9 ("smb3: fix perf regression with cached writes with netfs conversion")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: filebench
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
parameters:

        disk: 1HDD
        fs: ext4
        fs2: cifs
        test: randomwrite.f
        cpufreq_governor: performance


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240527/202405271633.b56b258d-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/disk/fs2/fs/kconfig/rootfs/tbox_group/test/testcase:
  gcc-13/performance/1HDD/cifs/ext4/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-icl-2sp6/randomwrite.f/filebench

commit:
  14b1cd2534 ("cifs: Fix locking in cifs_strict_readv()")
  edfc6481fa ("smb3: fix perf regression with cached writes with netfs conversion")

14b1cd25346b1d61 edfc6481faf896301cab940da77
---------------- ---------------------------
         %stddev    %change          %stddev
             \          |                \
   3814731 ± 93%     -62.9%    1414791 ± 44%  cpuidle..usage
     91.23 ±  4%      +6.5%      97.17        iostat.cpu.idle
      1817 ± 25%     -49.1%     925.83 ± 36%  perf-c2c.DRAM.remote
    207192          +418.2%    1073659 ± 20%  meminfo.AnonHugePages
   2604959 ±  5%     +65.7%    4315389 ±  4%  meminfo.Dirty
     69239 ±139%    +547.1%     448063 ± 51%  numa-meminfo.node0.AnonHugePages
    138049 ± 70%    +353.2%     625629 ± 65%  numa-meminfo.node1.AnonHugePages
     33.79 ±139%    +547.7%     218.82 ± 51%  numa-vmstat.node0.nr_anon_transparent_hugepages
     67.47 ± 70%    +353.0%     305.60 ± 65%  numa-vmstat.node1.nr_anon_transparent_hugepages
     10799 ± 25%     -35.4%       6972 ±  8%  sched_debug.cfs_rq:/.load.avg
     37988 ±120%    +526.0%     237792 ± 59%  sched_debug.cpu.avg_idle.min
      4690 ±153%     -92.0%     376.83 ± 24%  sched_debug.cpu.nr_switches.min
     69222 ±  3%     -16.7%      57628        vmstat.io.bo
      0.73 ± 12%     -24.9%       0.55 ±  2%  vmstat.procs.b
     19540 ± 24%     -55.2%       8762 ± 12%  vmstat.system.in
      0.58 ± 14%      -0.2        0.41        mpstat.cpu.all.iowait%
      0.05 ± 32%      -0.0        0.02 ± 14%  mpstat.cpu.all.irq%
      0.05 ± 14%      -0.0        0.02 ±  6%  mpstat.cpu.all.soft%
      2.00         +2391.7%      49.83 ± 27%  mpstat.max_utilization.seconds
     58.54 ±  7%     -24.5%      44.17 ± 13%  mpstat.max_utilization_pct
     99.67 ±163%   +4194.7%       4280 ±  7%  filebench.sum_bytes_mb/s
    765489 ±163%   +4194.8%   32875866 ±  7%  filebench.sum_operations
     12757 ±163%   +4194.8%     547887 ±  7%  filebench.sum_operations/s
      0.24 ± 41%     -99.2%       0.00        filebench.sum_time_ms/op
     12757 ±163%   +4194.8%     547887 ±  7%  filebench.sum_writes/s
    241.17 ± 80%    +321.8%       1017 ±  8%  filebench.time.involuntary_context_switches
     22.67 ± 23%     +63.2%      37.00        filebench.time.percent_of_cpu_this_job_got
     37.73 ± 23%     +49.1%      56.25        filebench.time.system_time
 1.997e+09 ± 45%     -62.3%  7.533e+08 ± 26%  perf-stat.i.branch-instructions
     11.93 ± 23%      +3.9       15.84 ±  7%  perf-stat.i.cache-miss-rate%
 1.589e+08 ±  5%     -36.2%  1.013e+08 ±  6%  perf-stat.i.cache-references
      1227 ± 13%     -23.7%     937.19 ±  7%  perf-stat.i.cycles-between-cache-misses
  9.86e+09 ± 45%     -63.3%  3.621e+09 ± 27%  perf-stat.i.instructions
      4.84 ± 44%     +96.1%       9.48 ± 18%  perf-stat.overall.MPKI
    830.79 ± 40%     -62.4%     312.02 ± 34%  perf-stat.overall.cycles-between-cache-misses
 1.994e+09 ± 45%     -62.2%  7.528e+08 ± 27%  perf-stat.ps.branch-instructions
 1.585e+08 ±  5%     -36.3%   1.01e+08 ±  6%  perf-stat.ps.cache-references
 9.842e+09 ± 45%     -63.2%   3.62e+09 ± 27%  perf-stat.ps.instructions
 1.637e+12 ± 45%     -62.9%  6.073e+11 ± 27%  perf-stat.total.instructions
    101.22          +418.0%     524.27 ± 20%  proc-vmstat.nr_anon_transparent_hugepages
   2918550 ±  3%    +421.9%   15232014 ±  9%  proc-vmstat.nr_dirtied
    650592 ±  5%     +66.0%    1079880 ±  4%  proc-vmstat.nr_dirty
     23980            -2.1%      23472        proc-vmstat.nr_kernel_stack
     17286 ±  6%      -5.1%      16397        proc-vmstat.nr_mapped
     79441            -2.5%      77426        proc-vmstat.nr_slab_unreclaimable
    662082 ±  6%     +66.5%    1102087 ±  5%  proc-vmstat.nr_zone_write_pending
   8719968 ± 21%     -48.5%    4491902 ± 10%  proc-vmstat.numa_hit
      8.00 ± 20%  +12912.5%       1041 ± 45%  proc-vmstat.numa_huge_pte_updates
   8584943 ± 21%     -49.2%    4359325 ± 10%  proc-vmstat.numa_local
  11674686 ±  3%     -16.0%    9806002        proc-vmstat.pgpgout
      2.00        +51250.0%       1027 ± 56%  proc-vmstat.thp_fault_alloc
      4.19 ±100%      -1.7        2.53 ±144%  perf-profile.calltrace.cycles-pp.scsi_end_request.scsi_io_completion.blk_complete_reqs.handle_softirqs.irq_exit_rcu
      4.19 ±100%      -1.7        2.53 ±144%  perf-profile.calltrace.cycles-pp.scsi_io_completion.blk_complete_reqs.handle_softirqs.irq_exit_rcu.common_interrupt
      4.24 ±100%      -1.7        2.58 ±145%  perf-profile.calltrace.cycles-pp.irq_exit_rcu.common_interrupt.asm_common_interrupt.cpuidle_enter_state.cpuidle_enter
      4.23 ±100%      -1.6        2.58 ±145%  perf-profile.calltrace.cycles-pp.handle_softirqs.irq_exit_rcu.common_interrupt.asm_common_interrupt.cpuidle_enter_state
      4.20 ±100%      -1.6        2.57 ±145%  perf-profile.calltrace.cycles-pp.blk_complete_reqs.handle_softirqs.irq_exit_rcu.common_interrupt.asm_common_interrupt
      0.50 ± 46%      +0.3        0.78 ±  5%  perf-profile.calltrace.cycles-pp.write
      0.28 ±100%      +0.4        0.67 ±  6%  perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      0.31 ±100%      +0.4        0.71 ±  8%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      0.31 ±100%      +0.4        0.71 ±  8%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
      0.19 ±141%      +0.5        0.64 ±  6%  perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      2.50 ± 14%      +0.5        3.04 ±  8%  perf-profile.calltrace.cycles-pp.read
      2.66 ± 14%      +0.6        3.28 ±  9%  perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      2.33 ± 11%      +0.6        2.98 ± 14%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      2.33 ± 11%      +0.7        3.00 ± 14%  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
      0.22 ± 20%      -0.2        0.06 ± 83%  perf-profile.children.cycles-pp.native_apic_mem_eoi
      0.23 ± 11%      -0.1        0.15 ± 24%  perf-profile.children.cycles-pp.getenv
      0.03 ±141%      +0.1        0.10 ± 29%  perf-profile.children.cycles-pp.set_task_cpu
      0.01 ±223%      +0.1        0.08 ± 37%  perf-profile.children.cycles-pp.__radix_tree_lookup
      0.00            +0.1        0.10 ± 43%  perf-profile.children.cycles-pp.kmalloc_trace
      0.01 ±223%      +0.1        0.12 ± 37%  perf-profile.children.cycles-pp.free_pcppages_bulk
      0.16 ± 33%      +0.1        0.29 ± 29%  perf-profile.children.cycles-pp.vm_area_alloc
      0.10 ± 79%      +0.1        0.24 ± 26%  perf-profile.children.cycles-pp.leave_mm
      0.24 ± 19%      +0.2        0.41 ± 36%  perf-profile.children.cycles-pp.strnlen_user
      0.41 ± 22%      +0.2        0.58 ± 19%  perf-profile.children.cycles-pp.migration_cpu_stop
      0.68 ± 12%      +0.2        0.86 ±  6%  perf-profile.children.cycles-pp.ksys_write
      0.65 ± 15%      +0.2        0.84 ±  6%  perf-profile.children.cycles-pp.vfs_write
      0.41 ± 22%      +0.2        0.62 ± 19%  perf-profile.children.cycles-pp.cpu_stopper_thread
      0.79 ± 10%      +0.2        1.00 ±  3%  perf-profile.children.cycles-pp.write
      0.47 ± 28%      +0.2        0.70 ± 23%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      0.24 ± 35%      +0.2        0.47 ± 20%  perf-profile.children.cycles-pp.set_pte_range
      0.43 ± 28%      +0.2        0.67 ± 25%  perf-profile.children.cycles-pp.d_alloc_parallel
      0.58 ± 23%      +0.3        0.87 ± 21%  perf-profile.children.cycles-pp.__lookup_slow
      0.98 ± 22%      +0.3        1.27 ± 13%  perf-profile.children.cycles-pp.copy_process
      1.39 ±  9%      +0.3        1.74 ± 13%  perf-profile.children.cycles-pp.filemap_map_pages
      1.49 ±  9%      +0.4        1.91 ± 11%  perf-profile.children.cycles-pp.do_read_fault
      1.75 ± 10%      +0.5        2.26 ±  8%  perf-profile.children.cycles-pp.do_fault
      2.66 ± 14%      +0.6        3.28 ±  9%  perf-profile.children.cycles-pp.smpboot_thread_fn
      3.94 ± 16%      +0.7        4.62 ±  8%  perf-profile.children.cycles-pp.read
      4.09 ±  4%      +0.8        4.93 ±  8%  perf-profile.children.cycles-pp.asm_exc_page_fault
      3.08 ± 10%      +0.9        3.96 ±  8%  perf-profile.children.cycles-pp.__handle_mm_fault
      3.22 ±  8%      +1.0        4.18 ±  8%  perf-profile.children.cycles-pp.handle_mm_fault
      3.44 ±  6%      +1.0        4.47 ±  9%  perf-profile.children.cycles-pp.do_user_addr_fault
      3.45 ±  5%      +1.0        4.48 ±  9%  perf-profile.children.cycles-pp.exc_page_fault
     20.31 ±  9%      +2.6       22.93 ±  6%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     20.26 ±  9%      +2.6       22.88 ±  6%  perf-profile.children.cycles-pp.do_syscall_64
      0.21 ± 20%      -0.2        0.06 ± 83%  perf-profile.self.cycles-pp.native_apic_mem_eoi
      0.12 ± 30%      +0.1        0.18 ± 19%  perf-profile.self.cycles-pp.newidle_balance
      0.01 ±223%      +0.1        0.08 ± 37%  perf-profile.self.cycles-pp.__radix_tree_lookup
      0.00            +0.1        0.09 ± 39%  perf-profile.self.cycles-pp.kmalloc_trace
      0.05 ±111%      +0.1        0.17 ± 36%  perf-profile.self.cycles-pp.leave_mm
      0.23 ± 23%      +0.2        0.39 ± 40%  perf-profile.self.cycles-pp.strnlen_user
      0.10 ± 53%      +0.2        0.30 ± 59%  perf-profile.self.cycles-pp.read_counters




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki