Hello,

kernel test robot noticed a 158.3% improvement of filebench.sum_operations/s on:


commit: d6a77668a708f0b5ca6713b39c178c9d9563c35b ("netfs: Downgrade i_rwsem for a buffered write")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: filebench
config: x86_64-rhel-8.3
compiler: gcc-12
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
parameters:

        disk: 1HDD
        fs: xfs
        fs2: cifs
        test: randomrw.f
        cpufreq_governor: performance


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241126/202411261616.c29946d8-lkp@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/disk/fs2/fs/kconfig/rootfs/tbox_group/test/testcase:
  gcc-12/performance/1HDD/cifs/xfs/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-icl-2sp6/randomrw.f/filebench

commit:
  6ed469df0b ("nilfs2: fix kernel bug due to missing clearing of buffer delay flag")
  d6a77668a7 ("netfs: Downgrade i_rwsem for a buffered write")

6ed469df0bfbef3e d6a77668a708f0b5ca6713b39c1
---------------- ---------------------------
         %stddev    %change          %stddev
            \          |                \
  10356023 ± 13%     -88.4%    1203898 ±  8%  cpuidle..usage
      1862 ± 17%     -45.6%       1013 ± 23%  perf-c2c.HITM.local
    564994 ±  9%     -86.4%      76928 ± 36%  numa-meminfo.node1.Active(anon)
    585171 ±  7%     -84.9%      88374 ± 38%  numa-meminfo.node1.Shmem
    124475 ± 13%     -92.9%       8821 ± 14%  vmstat.system.cs
      9926 ±  6%     -39.6%       5995 ±  4%  vmstat.system.in
    576365 ± 10%     -83.0%      98054 ± 27%  meminfo.Active(anon)
   1481440 ±  4%     -33.1%     991806 ±  2%  meminfo.Committed_AS
    613566 ± 10%     -79.8%     124007 ± 22%  meminfo.Shmem
      0.02 ±  3%      -0.0        0.02 ±  4%  mpstat.cpu.all.irq%
      0.60 ±  2%      +0.1        0.69        mpstat.cpu.all.sys%
      0.18            +0.0        0.22 ±  6%  mpstat.cpu.all.usr%
    141224 ±  9%     -86.4%      19203 ± 36%  numa-vmstat.node1.nr_active_anon
    146313 ±  7%     -84.9%      22087 ± 38%  numa-vmstat.node1.nr_shmem
    141224 ±  9%     -86.4%      19203 ± 36%  numa-vmstat.node1.nr_zone_active_anon
     91197 ± 22%     -93.7%       5768 ± 19%  sched_debug.cpu.nr_switches.avg
   6021808 ± 30%     -96.1%     232641 ± 32%  sched_debug.cpu.nr_switches.max
    616189 ± 24%     -95.9%      25525 ± 31%  sched_debug.cpu.nr_switches.stddev
    144168 ± 10%     -83.0%      24516 ± 27%  proc-vmstat.nr_active_anon
   3501815            -3.8%    3369305        proc-vmstat.nr_file_pages
     28035            -5.9%      26386        proc-vmstat.nr_mapped
    153431 ± 10%     -79.8%      31026 ± 22%  proc-vmstat.nr_shmem
     25506            -1.6%      25092        proc-vmstat.nr_slab_reclaimable
    144168 ± 10%     -83.0%      24516 ± 27%  proc-vmstat.nr_zone_active_anon
   1443064            -7.1%    1340212        proc-vmstat.pgactivate
      2557 ± 14%    +158.3%       6606 ± 10%  filebench.sum_bytes_mb/s
  19644866 ± 14%    +158.3%   50742596 ± 10%  filebench.sum_operations
    327385 ± 14%    +158.3%     845638 ± 10%  filebench.sum_operations/s
    163882 ± 14%    +189.5%     474419 ± 12%  filebench.sum_reads/s
      0.01 ± 15%     -65.7%       0.00        filebench.sum_time_ms/op
    163502 ± 14%    +127.0%     371220 ±  9%  filebench.sum_writes/s
     56.83           +29.0%      73.33        filebench.time.percent_of_cpu_this_job_got
     85.87 ±  2%     +20.1%     103.10 ±  2%  filebench.time.system_time
      8.54 ± 10%    +115.4%      18.39 ± 16%  filebench.time.user_time
   9795275 ± 14%     -99.3%      67709 ± 70%  filebench.time.voluntary_context_switches
      0.01 ± 29%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
      0.01 ± 19%    -100.0%       0.00        perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
      0.00 ± 67%    +469.2%       0.01 ± 12%  perf-sched.total_sch_delay.average.ms
      1.33 ± 13%    +975.0%      14.30 ± 33%  perf-sched.total_wait_and_delay.average.ms
    724911 ± 10%     -89.6%      75232 ± 36%  perf-sched.total_wait_and_delay.count.ms
      1.33 ± 13%    +976.1%      14.29 ± 33%  perf-sched.total_wait_time.average.ms
      3.47 ± 11%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
     54.35 ±  8%    +403.1%     273.44 ± 19%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
     19.50 ± 30%    -100.0%       0.00        perf-sched.wait_and_delay.count.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
    280.83 ± 12%     -79.1%      58.83 ± 24%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
    649458 ± 10%     -99.1%       6085 ± 56%  perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_interruptible.netfs_start_io_read
      4.62 ± 11%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      1001           +25.4%       1254 ± 17%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
      0.01 ± 22%    -100.0%       0.00        perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
     54.34 ±  8%    +403.2%     273.41 ± 19%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
      0.01 ± 19%    -100.0%       0.00        perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
      1001           +25.4%       1254 ± 17%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
      0.15 ± 44%     -69.5%       0.05 ± 28%  perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_interruptible.netfs_start_io_read
      3.23 ±100%      -1.0        2.18 ±142%  perf-profile.calltrace.cycles-pp.cmd_stat
      3.23 ±100%      -1.0        2.18 ±142%  perf-profile.calltrace.cycles-pp.dispatch_events.cmd_stat
      3.22 ±100%      -1.0        2.17 ±141%  perf-profile.calltrace.cycles-pp.process_interval.dispatch_events.cmd_stat
      3.12 ±100%      -1.0        2.12 ±142%  perf-profile.calltrace.cycles-pp.read_counters.process_interval.dispatch_events.cmd_stat
      0.42 ± 34%      -0.2        0.24 ± 28%  perf-profile.children.cycles-pp.perf_iterate_sb
      0.42 ± 22%      -0.1        0.28 ± 22%  perf-profile.children.cycles-pp.set_pte_range
      0.11 ± 38%      -0.1        0.04 ± 71%  perf-profile.children.cycles-pp.copy_page_from_iter_atomic
      0.11 ± 56%      -0.1        0.04 ± 71%  perf-profile.children.cycles-pp.read@plt
      0.02 ±141%      +0.1        0.12 ± 29%  perf-profile.children.cycles-pp.aa_file_perm
      0.07 ± 55%      +0.1        0.17 ± 29%  perf-profile.children.cycles-pp.fault_in_iov_iter_readable
      0.07 ± 55%      +0.1        0.17 ± 29%  perf-profile.children.cycles-pp.fault_in_readable
      0.09 ± 50%      +0.1        0.22 ± 28%  perf-profile.children.cycles-pp.getenv
      0.21 ± 30%      +0.2        0.37 ± 34%  perf-profile.children.cycles-pp.__perf_read_group_add
      0.19 ± 44%      +0.2        0.36 ± 34%  perf-profile.children.cycles-pp.pcpu_alloc_noprof
      0.82 ± 11%      +0.4        1.20 ± 13%  perf-profile.children.cycles-pp.sched_balance_update_blocked_averages
      0.11 ± 38%      -0.1        0.04 ± 71%  perf-profile.self.cycles-pp.copy_page_from_iter_atomic
      0.11 ± 56%      -0.1        0.04 ± 71%  perf-profile.self.cycles-pp.read@plt
      0.02 ±141%      +0.1        0.12 ± 29%  perf-profile.self.cycles-pp.aa_file_perm
      0.02 ±141%      +0.1        0.12 ± 31%  perf-profile.self.cycles-pp.getenv
      5.49 ±  4%     +87.1%      10.27 ±  4%  perf-stat.i.MPKI
 6.113e+08 ±  6%     -21.6%  4.793e+08 ±  5%  perf-stat.i.branch-instructions
  12875097            -9.6%   11640297        perf-stat.i.branch-misses
  26605878 ±  8%     +61.4%   42952527 ±  5%  perf-stat.i.cache-misses
  89659393 ±  6%     +53.9%   1.38e+08 ±  6%  perf-stat.i.cache-references
    126410 ± 13%     -93.0%       8884 ± 15%  perf-stat.i.context-switches
      1.85 ±  2%      +8.3%       2.00 ±  2%  perf-stat.i.cpi
 2.757e+09 ±  6%     -17.8%  2.265e+09 ±  4%  perf-stat.i.instructions
      0.58 ±  2%      -8.1%       0.53 ±  2%  perf-stat.i.ipc
      1.00 ± 13%     -97.3%       0.03 ± 57%  perf-stat.i.metric.K/sec
      9.63 ±  4%     +96.7%      18.95 ±  2%  perf-stat.overall.MPKI
      2.11 ±  5%      +0.3        2.42 ±  5%  perf-stat.overall.branch-miss-rate%
      1.56 ±  6%     +19.7%       1.86 ±  4%  perf-stat.overall.cpi
    161.89 ±  7%     -39.3%      98.29 ±  4%  perf-stat.overall.cycles-between-cache-misses
      0.65 ±  5%     -16.5%       0.54 ±  5%  perf-stat.overall.ipc
 6.088e+08 ±  6%     -21.3%  4.791e+08 ±  5%  perf-stat.ps.branch-instructions
  12794450            -9.6%   11566995        perf-stat.ps.branch-misses
  26464019 ±  8%     +62.1%   42902925 ±  4%  perf-stat.ps.cache-misses
  89144844 ±  7%     +54.5%  1.378e+08 ±  6%  perf-stat.ps.cache-references
    126023 ± 13%     -93.0%       8808 ± 15%  perf-stat.ps.context-switches
 2.746e+09 ±  6%     -17.5%  2.264e+09 ±  4%  perf-stat.ps.instructions
 4.542e+11 ±  6%     -17.4%  3.753e+11 ±  4%  perf-stat.total.instructions




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki