Hello,

kernel test robot noticed a 180.4% improvement of filebench.sum_operations/s on:

commit: a527c3ba41c4c61e2069bfce4091e5515f06a8dd ("nfs: Avoid flushing many pages with NFS_FILE_SYNC")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: filebench
test machine: 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory
parameters:

	disk: 1HDD
	fs: btrfs
	fs2: nfsv4
	test: filemicro_rwritefsync.f
	cpufreq_governor: performance
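For context on what this microbenchmark stresses: as its name suggests,
filemicro_rwritefsync.f issues buffered writes, each followed by an fsync(),
so every fsync() forces the NFS client to flush the freshly dirtied pages to
the server. Below is a minimal C sketch of that access pattern. It is
illustrative only -- the real workload is defined by the filebench .f file,
the path and sizes here are made up, and the comments about the commit's
effect are inferred from its subject line and general NFS stable-write
semantics (NFS_UNSTABLE vs. NFS_FILE_SYNC).

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
	static char buf[1 << 20];	/* 1 MiB = 256 dirty pages per write */
	int fd;

	/* Hypothetical path; in the test the file sits on an NFSv4
	 * mount whose server-side export is backed by btrfs. */
	fd = open("/mnt/nfs/testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	memset(buf, 'x', sizeof(buf));

	for (int i = 0; i < 64; i++) {
		/* Buffered write: only dirties client page cache pages. */
		if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
			perror("write");
			return 1;
		}
		/*
		 * fsync() makes the client write the dirty pages back.
		 * Per the commit subject, a flush covering many pages
		 * used to go out as NFS_FILE_SYNC WRITEs (the server
		 * must reach stable storage before replying to each
		 * one); avoiding that lets the server gather UNSTABLE
		 * writes and sync once on a later COMMIT.
		 */
		if (fsync(fd) < 0) {
			perror("fsync");
			return 1;
		}
	}
	close(fd);
	return 0;
}

With the patch, the same pattern finishes in far less wall time, which is what
the filebench.sum_operations/s and filebench.time.elapsed_time numbers below
reflect.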
Details are as below:
-------------------------------------------------------------------------------------------------->

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240628/202406281308.6137dbb1-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/disk/fs2/fs/kconfig/rootfs/tbox_group/test/testcase:
  gcc-13/performance/1HDD/nfsv4/btrfs/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-csl-2sp3/filemicro_rwritefsync.f/filebench

commit:
  134d0b3f24 ("nfs: propagate readlink errors in nfs_symlink_filler")
  a527c3ba41 ("nfs: Avoid flushing many pages with NFS_FILE_SYNC")

134d0b3f2440cddd a527c3ba41c4c61e2069bfce409
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
      2.06            -32.0%       1.40 ±  3%  iostat.cpu.iowait
  7.17e+10 ±  3%     -62.2%  2.708e+10 ±  2%  cpuidle..time
   3361646            -40.5%    2001605        cpuidle..usage
    797.57 ±  3%     -58.3%     332.24 ±  2%  uptime.boot
     74461 ±  3%     -58.5%      30930 ±  2%  uptime.idle
    986.05 ± 52%    -100.0%       0.00        numa-meminfo.node0.Mlocked
     41610 ± 32%     -59.2%      16976 ± 79%  numa-meminfo.node0.Shmem
     64815 ±  3%     -17.0%      53823 ±  2%  numa-meminfo.node1.Active(anon)
    989020 ± 10%     -49.7%     497288 ± 49%  numa-numastat.node0.local_node
   1031591 ± 10%     -46.8%     549069 ± 43%  numa-numastat.node0.numa_hit
   1104745 ± 11%     -29.8%     775663 ± 28%  numa-numastat.node1.local_node
   1161905 ±  9%     -29.1%     823337 ± 26%  numa-numastat.node1.numa_hit
      2170 ±  3%     +91.4%       4154 ±  2%  vmstat.io.bo
      1.99            -32.0%       1.35 ±  3%  vmstat.procs.b
      2060            +23.5%       2543 ±  2%  vmstat.system.cs
      4540 ±  2%     +80.5%       8197 ±  2%  vmstat.system.in
      2.07             -0.7        1.41 ±  3%  mpstat.cpu.all.iowait%
      0.06 ±  3%      +0.1        0.15 ±  3%  mpstat.cpu.all.irq%
      0.01 ±  2%      +0.0        0.02 ±  2%  mpstat.cpu.all.soft%
      0.05 ±  6%      +0.0        0.07 ±  5%  mpstat.cpu.all.sys%
      0.05 ±  2%      +0.1        0.12 ±  2%  mpstat.cpu.all.usr%
      0.37 ± 10%      -0.1        0.30 ± 10%  perf-profile.children.cycles-pp.perf_event_task_tick
      0.15 ± 16%      -0.0        0.11 ± 17%  perf-profile.children.cycles-pp.rcu_core
      0.16 ± 13%      +0.1        0.21 ±  5%  perf-profile.children.cycles-pp._raw_spin_lock_irq
      0.24 ± 12%      -0.1        0.19 ± 11%  perf-profile.self.cycles-pp.perf_event_task_tick
      0.14 ± 15%      +0.0        0.19 ± 10%  perf-profile.self.cycles-pp.cpuidle_governor_latency_req
      2.10           +184.9%       5.98 ±  6%  filebench.sum_bytes_mb/s
    273.04           +180.4%     765.61 ±  6%  filebench.sum_operations/s
      0.00 ± 10%  +27680.8%       1.20 ±  7%  filebench.sum_time_ms/op
    273.00           +180.4%     765.50 ±  6%  filebench.sum_writes/s
    746.84 ±  3%     -62.3%     281.62 ±  2%  filebench.time.elapsed_time
    746.84 ±  3%     -62.3%     281.62 ±  2%  filebench.time.elapsed_time.max
    246.64 ± 52%    -100.0%       0.00        numa-vmstat.node0.nr_mlock
     10402 ± 32%     -59.2%       4243 ± 79%  numa-vmstat.node0.nr_shmem
   1031364 ± 10%     -46.8%     548226 ± 43%  numa-vmstat.node0.numa_hit
    988793 ± 10%     -49.8%     496445 ± 49%  numa-vmstat.node0.numa_local
     16202 ±  3%     -17.0%      13454 ±  2%  numa-vmstat.node1.nr_active_anon
     16202 ±  3%     -17.0%      13454 ±  2%  numa-vmstat.node1.nr_zone_active_anon
   1161422 ±  9%     -29.2%     822014 ± 26%  numa-vmstat.node1.numa_hit
   1104280 ± 11%     -29.9%     774340 ± 28%  numa-vmstat.node1.numa_local
    169724            -34.1%     111875        meminfo.Active
     71034            -19.6%      57108        meminfo.Active(anon)
     98690            -44.5%      54766 ±  2%  meminfo.Active(file)
    386512 ± 18%     -46.6%     206514 ± 22%  meminfo.AnonHugePages
    100539 ±  4%    +163.6%     264992 ±  2%  meminfo.Dirty
     67198            -12.6%      58722        meminfo.Mapped
      1426 ±  2%    -100.0%       0.00        meminfo.Mlocked
    113320            -20.9%      89605        meminfo.Shmem
    295425 ±  4%    +125.3%     665456        meminfo.Writeback
     17758            -19.6%      14279        proc-vmstat.nr_active_anon
     24673            -44.5%      13701        proc-vmstat.nr_active_file
    165207             -2.3%     161474        proc-vmstat.nr_anon_pages
    188.72 ± 18%     -46.6%     100.85 ± 22%  proc-vmstat.nr_anon_transparent_hugepages
    641612             -8.4%     587844        proc-vmstat.nr_dirtied
     25122 ±  4%    +163.5%      66189 ±  2%  proc-vmstat.nr_dirty
   1359330             -2.5%    1325284        proc-vmstat.nr_file_pages
    174858             -3.5%     168725        proc-vmstat.nr_inactive_anon
    523188             -3.3%     506043        proc-vmstat.nr_inactive_file
     18536             +3.8%      19247        proc-vmstat.nr_kernel_stack
     17058            -12.4%      14939        proc-vmstat.nr_mapped
    356.48 ±  2%    -100.0%       0.00        proc-vmstat.nr_mlock
     28336            -20.9%      22408        proc-vmstat.nr_shmem
     73898 ±  4%    +125.0%     166281        proc-vmstat.nr_writeback
    640947             -8.4%     587183        proc-vmstat.nr_written
     17758            -19.6%      14279        proc-vmstat.nr_zone_active_anon
     24673            -44.5%      13701        proc-vmstat.nr_zone_active_file
    174858             -3.5%     168725        proc-vmstat.nr_zone_inactive_anon
    523188             -3.3%     506043        proc-vmstat.nr_zone_inactive_file
     41988 ±  3%    +100.4%      84132 ±  2%  proc-vmstat.nr_zone_write_pending
   2195708 ±  5%     -37.4%    1375336 ±  6%  proc-vmstat.numa_hit
   2095965 ±  5%     -39.2%    1274986 ±  7%  proc-vmstat.numa_local
     46641            -13.7%      40252        proc-vmstat.pgactivate
   2637615 ±  4%     -32.5%    1780826 ±  5%  proc-vmstat.pgalloc_normal
   1924711 ±  3%     -56.3%     841690 ±  3%  proc-vmstat.pgfault
   2504198 ±  7%     -32.5%    1691266 ± 13%  proc-vmstat.pgfree
   1624850            -27.2%    1182486        proc-vmstat.pgpgout
     89895 ±  2%     -55.4%      40062 ±  5%  proc-vmstat.pgreuse
      2.43             +7.0%       2.60 ±  3%  perf-stat.i.MPKI
  67435645 ±  2%    +112.8%  1.435e+08 ±  2%  perf-stat.i.branch-instructions
      4.56             -0.1        4.44        perf-stat.i.branch-miss-rate%
   3862446 ±  2%    +125.0%    8689890 ±  3%  perf-stat.i.branch-misses
      4.97             +2.4        7.33 ±  2%  perf-stat.i.cache-miss-rate%
    540701 ±  3%     +86.6%    1009040 ±  2%  perf-stat.i.cache-misses
   7966602            +24.1%    9887537        perf-stat.i.cache-references
      2039            +22.7%       2502 ±  2%  perf-stat.i.context-switches
  4.97e+08 ±  2%     +91.0%  9.495e+08 ±  3%  perf-stat.i.cpu-cycles
    101.96             +4.2%     106.25        perf-stat.i.cpu-migrations
      1037            +10.6%       1147 ±  3%  perf-stat.i.cycles-between-cache-misses
 3.314e+08 ±  2%    +112.2%  7.033e+08 ±  2%  perf-stat.i.instructions
      0.50            +11.5%       0.56        perf-stat.i.ipc
      2.11            -99.2%       0.02 ±  9%  perf-stat.i.metric.K/sec
      2466            +12.6%       2776 ±  2%  perf-stat.i.minor-faults
      2466            +12.6%       2776 ±  2%  perf-stat.i.page-faults
      1.63 ±  3%     -12.0%       1.43 ±  3%  perf-stat.overall.MPKI
      5.73             +0.3        6.05        perf-stat.overall.branch-miss-rate%
      6.79 ±  3%      +3.4       10.21 ±  2%  perf-stat.overall.cache-miss-rate%
      1.50            -10.0%       1.35        perf-stat.overall.cpi
      0.67            +11.1%       0.74        perf-stat.overall.ipc
  67362570 ±  2%    +112.3%   1.43e+08 ±  2%  perf-stat.ps.branch-instructions
   3858126 ±  2%    +124.5%    8659606 ±  3%  perf-stat.ps.branch-misses
    539904 ±  3%     +86.2%    1005142 ±  2%  perf-stat.ps.cache-misses
   7952547            +23.8%    9844369        perf-stat.ps.cache-references
      2036            +22.5%       2494 ±  2%  perf-stat.ps.context-switches
 4.966e+08 ±  2%     +90.7%  9.468e+08 ±  3%  perf-stat.ps.cpu-cycles
    101.81             +4.0%     105.85        perf-stat.ps.cpu-migrations
 3.311e+08 ±  2%    +111.7%   7.01e+08 ±  2%  perf-stat.ps.instructions
      2461            +12.2%       2762 ±  2%  perf-stat.ps.minor-faults
      2461            +12.2%       2762 ±  2%  perf-stat.ps.page-faults
 2.475e+11            -20.0%   1.98e+11        perf-stat.total.instructions
      0.04 ±  4%     +31.6%       0.05 ±  8%  sched_debug.cfs_rq:/.h_nr_running.avg
     20.10 ± 14%     +60.4%      32.25 ± 20%  sched_debug.cfs_rq:/.load_avg.avg
      0.04 ±  3%     +31.7%       0.05 ±  8%  sched_debug.cfs_rq:/.nr_running.avg
      7.67 ± 37%    +146.1%      18.87 ± 29%  sched_debug.cfs_rq:/.removed.load_avg.avg
      3.51 ± 42%    +138.3%       8.37 ± 29%  sched_debug.cfs_rq:/.removed.runnable_avg.avg
      3.51 ± 42%    +138.3%       8.37 ± 29%  sched_debug.cfs_rq:/.removed.util_avg.avg
     38.15 ±  6%    +101.4%      76.85 ±  8%  sched_debug.cfs_rq:/.runnable_avg.avg
     98.16 ±  5%     +39.5%     136.95 ± 11%  sched_debug.cfs_rq:/.runnable_avg.stddev
     37.92 ±  6%    +101.5%      76.42 ±  7%  sched_debug.cfs_rq:/.util_avg.avg
    656.80 ±  4%     +18.8%     780.15 ± 15%  sched_debug.cfs_rq:/.util_avg.max
     97.57 ±  5%     +39.9%     136.52 ± 11%  sched_debug.cfs_rq:/.util_avg.stddev
      3.28 ± 25%     +98.3%       6.50 ± 40%  sched_debug.cfs_rq:/.util_est.avg
    123.73 ± 11%     +45.4%     179.95 ± 12%  sched_debug.cfs_rq:/.util_est.max
     17.32 ± 11%     +68.8%      29.24 ± 22%  sched_debug.cfs_rq:/.util_est.stddev
    389566 ±  7%     -57.8%     164509 ±  7%  sched_debug.cpu.clock.avg
    389580 ±  7%     -57.8%     164520 ±  7%  sched_debug.cpu.clock.max
    389555 ±  7%     -57.8%     164499 ±  7%  sched_debug.cpu.clock.min
      8.38 ± 16%     -27.1%       6.11 ± 12%  sched_debug.cpu.clock.stddev
    388964 ±  7%     -57.8%     164063 ±  7%  sched_debug.cpu.clock_task.avg
    389309 ±  7%     -57.8%     164329 ±  7%  sched_debug.cpu.clock_task.max
    381467 ±  7%     -58.9%     156750 ±  7%  sched_debug.cpu.clock_task.min
     12392 ±  5%     -46.0%       6695 ±  4%  sched_debug.cpu.curr->pid.max
      1368 ±  6%     -36.2%     872.62 ±  4%  sched_debug.cpu.curr->pid.stddev
      0.03 ± 10%     +52.8%       0.04 ± 10%  sched_debug.cpu.nr_running.avg
      0.15 ±  7%     +16.5%       0.17 ±  5%  sched_debug.cpu.nr_running.stddev
      9004 ±  5%     -47.9%       4694 ±  6%  sched_debug.cpu.nr_switches.avg
     77261 ± 22%     -40.5%      46007 ±  9%  sched_debug.cpu.nr_switches.max
      1542 ±  6%     -54.4%     702.61 ±  8%  sched_debug.cpu.nr_switches.min
     10459 ± 10%     -40.6%       6217 ±  6%  sched_debug.cpu.nr_switches.stddev
      0.07 ±  5%     -73.2%       0.02 ± 17%  sched_debug.cpu.nr_uninterruptible.avg
    389570 ±  7%     -57.8%     164510 ±  7%  sched_debug.cpu_clk
    388998 ±  7%     -57.9%     163938 ±  7%  sched_debug.ktime
    390127 ±  7%     -57.7%     165072 ±  7%  sched_debug.sched_clk


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki