[linus:master] [smb3] edfc6481fa: filebench.sum_operations/s 4194.8% improvement

Hello,

kernel test robot noticed a 4194.8% improvement of filebench.sum_operations/s on:


commit: edfc6481faf896301cab940da776229fe39e9fc9 ("smb3: fix perf regression with cached writes with netfs conversion")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: filebench
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
parameters:

	disk: 1HDD
	fs: ext4
	fs2: cifs
	test: randomwrite.f
	cpufreq_governor: performance






Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240527/202405271633.b56b258d-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/disk/fs2/fs/kconfig/rootfs/tbox_group/test/testcase:
  gcc-13/performance/1HDD/cifs/ext4/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-icl-2sp6/randomwrite.f/filebench

commit: 
  14b1cd2534 ("cifs: Fix locking in cifs_strict_readv()")
  edfc6481fa ("smb3: fix perf regression with cached writes with netfs conversion")

14b1cd25346b1d61 edfc6481faf896301cab940da77 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
   3814731 ± 93%     -62.9%    1414791 ± 44%  cpuidle..usage
     91.23 ±  4%      +6.5%      97.17        iostat.cpu.idle
      1817 ± 25%     -49.1%     925.83 ± 36%  perf-c2c.DRAM.remote
    207192          +418.2%    1073659 ± 20%  meminfo.AnonHugePages
   2604959 ±  5%     +65.7%    4315389 ±  4%  meminfo.Dirty
     69239 ±139%    +547.1%     448063 ± 51%  numa-meminfo.node0.AnonHugePages
    138049 ± 70%    +353.2%     625629 ± 65%  numa-meminfo.node1.AnonHugePages
     33.79 ±139%    +547.7%     218.82 ± 51%  numa-vmstat.node0.nr_anon_transparent_hugepages
     67.47 ± 70%    +353.0%     305.60 ± 65%  numa-vmstat.node1.nr_anon_transparent_hugepages
     10799 ± 25%     -35.4%       6972 ±  8%  sched_debug.cfs_rq:/.load.avg
     37988 ±120%    +526.0%     237792 ± 59%  sched_debug.cpu.avg_idle.min
      4690 ±153%     -92.0%     376.83 ± 24%  sched_debug.cpu.nr_switches.min
     69222 ±  3%     -16.7%      57628        vmstat.io.bo
      0.73 ± 12%     -24.9%       0.55 ±  2%  vmstat.procs.b
     19540 ± 24%     -55.2%       8762 ± 12%  vmstat.system.in
      0.58 ± 14%      -0.2        0.41        mpstat.cpu.all.iowait%
      0.05 ± 32%      -0.0        0.02 ± 14%  mpstat.cpu.all.irq%
      0.05 ± 14%      -0.0        0.02 ±  6%  mpstat.cpu.all.soft%
      2.00         +2391.7%      49.83 ± 27%  mpstat.max_utilization.seconds
     58.54 ±  7%     -24.5%      44.17 ± 13%  mpstat.max_utilization_pct
     99.67 ±163%   +4194.7%       4280 ±  7%  filebench.sum_bytes_mb/s
    765489 ±163%   +4194.8%   32875866 ±  7%  filebench.sum_operations
     12757 ±163%   +4194.8%     547887 ±  7%  filebench.sum_operations/s
      0.24 ± 41%     -99.2%       0.00        filebench.sum_time_ms/op
     12757 ±163%   +4194.8%     547887 ±  7%  filebench.sum_writes/s
    241.17 ± 80%    +321.8%       1017 ±  8%  filebench.time.involuntary_context_switches
     22.67 ± 23%     +63.2%      37.00        filebench.time.percent_of_cpu_this_job_got
     37.73 ± 23%     +49.1%      56.25        filebench.time.system_time
 1.997e+09 ± 45%     -62.3%  7.533e+08 ± 26%  perf-stat.i.branch-instructions
     11.93 ± 23%      +3.9       15.84 ±  7%  perf-stat.i.cache-miss-rate%
 1.589e+08 ±  5%     -36.2%  1.013e+08 ±  6%  perf-stat.i.cache-references
      1227 ± 13%     -23.7%     937.19 ±  7%  perf-stat.i.cycles-between-cache-misses
  9.86e+09 ± 45%     -63.3%  3.621e+09 ± 27%  perf-stat.i.instructions
      4.84 ± 44%     +96.1%       9.48 ± 18%  perf-stat.overall.MPKI
    830.79 ± 40%     -62.4%     312.02 ± 34%  perf-stat.overall.cycles-between-cache-misses
 1.994e+09 ± 45%     -62.2%  7.528e+08 ± 27%  perf-stat.ps.branch-instructions
 1.585e+08 ±  5%     -36.3%   1.01e+08 ±  6%  perf-stat.ps.cache-references
 9.842e+09 ± 45%     -63.2%   3.62e+09 ± 27%  perf-stat.ps.instructions
 1.637e+12 ± 45%     -62.9%  6.073e+11 ± 27%  perf-stat.total.instructions
    101.22          +418.0%     524.27 ± 20%  proc-vmstat.nr_anon_transparent_hugepages
   2918550 ±  3%    +421.9%   15232014 ±  9%  proc-vmstat.nr_dirtied
    650592 ±  5%     +66.0%    1079880 ±  4%  proc-vmstat.nr_dirty
     23980            -2.1%      23472        proc-vmstat.nr_kernel_stack
     17286 ±  6%      -5.1%      16397        proc-vmstat.nr_mapped
     79441            -2.5%      77426        proc-vmstat.nr_slab_unreclaimable
    662082 ±  6%     +66.5%    1102087 ±  5%  proc-vmstat.nr_zone_write_pending
   8719968 ± 21%     -48.5%    4491902 ± 10%  proc-vmstat.numa_hit
      8.00 ± 20%  +12912.5%       1041 ± 45%  proc-vmstat.numa_huge_pte_updates
   8584943 ± 21%     -49.2%    4359325 ± 10%  proc-vmstat.numa_local
  11674686 ±  3%     -16.0%    9806002        proc-vmstat.pgpgout
      2.00        +51250.0%       1027 ± 56%  proc-vmstat.thp_fault_alloc
      4.19 ±100%      -1.7        2.53 ±144%  perf-profile.calltrace.cycles-pp.scsi_end_request.scsi_io_completion.blk_complete_reqs.handle_softirqs.irq_exit_rcu
      4.19 ±100%      -1.7        2.53 ±144%  perf-profile.calltrace.cycles-pp.scsi_io_completion.blk_complete_reqs.handle_softirqs.irq_exit_rcu.common_interrupt
      4.24 ±100%      -1.7        2.58 ±145%  perf-profile.calltrace.cycles-pp.irq_exit_rcu.common_interrupt.asm_common_interrupt.cpuidle_enter_state.cpuidle_enter
      4.23 ±100%      -1.6        2.58 ±145%  perf-profile.calltrace.cycles-pp.handle_softirqs.irq_exit_rcu.common_interrupt.asm_common_interrupt.cpuidle_enter_state
      4.20 ±100%      -1.6        2.57 ±145%  perf-profile.calltrace.cycles-pp.blk_complete_reqs.handle_softirqs.irq_exit_rcu.common_interrupt.asm_common_interrupt
      0.50 ± 46%      +0.3        0.78 ±  5%  perf-profile.calltrace.cycles-pp.write
      0.28 ±100%      +0.4        0.67 ±  6%  perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      0.31 ±100%      +0.4        0.71 ±  8%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      0.31 ±100%      +0.4        0.71 ±  8%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
      0.19 ±141%      +0.5        0.64 ±  6%  perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      2.50 ± 14%      +0.5        3.04 ±  8%  perf-profile.calltrace.cycles-pp.read
      2.66 ± 14%      +0.6        3.28 ±  9%  perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      2.33 ± 11%      +0.6        2.98 ± 14%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      2.33 ± 11%      +0.7        3.00 ± 14%  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
      0.22 ± 20%      -0.2        0.06 ± 83%  perf-profile.children.cycles-pp.native_apic_mem_eoi
      0.23 ± 11%      -0.1        0.15 ± 24%  perf-profile.children.cycles-pp.getenv
      0.03 ±141%      +0.1        0.10 ± 29%  perf-profile.children.cycles-pp.set_task_cpu
      0.01 ±223%      +0.1        0.08 ± 37%  perf-profile.children.cycles-pp.__radix_tree_lookup
      0.00            +0.1        0.10 ± 43%  perf-profile.children.cycles-pp.kmalloc_trace
      0.01 ±223%      +0.1        0.12 ± 37%  perf-profile.children.cycles-pp.free_pcppages_bulk
      0.16 ± 33%      +0.1        0.29 ± 29%  perf-profile.children.cycles-pp.vm_area_alloc
      0.10 ± 79%      +0.1        0.24 ± 26%  perf-profile.children.cycles-pp.leave_mm
      0.24 ± 19%      +0.2        0.41 ± 36%  perf-profile.children.cycles-pp.strnlen_user
      0.41 ± 22%      +0.2        0.58 ± 19%  perf-profile.children.cycles-pp.migration_cpu_stop
      0.68 ± 12%      +0.2        0.86 ±  6%  perf-profile.children.cycles-pp.ksys_write
      0.65 ± 15%      +0.2        0.84 ±  6%  perf-profile.children.cycles-pp.vfs_write
      0.41 ± 22%      +0.2        0.62 ± 19%  perf-profile.children.cycles-pp.cpu_stopper_thread
      0.79 ± 10%      +0.2        1.00 ±  3%  perf-profile.children.cycles-pp.write
      0.47 ± 28%      +0.2        0.70 ± 23%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      0.24 ± 35%      +0.2        0.47 ± 20%  perf-profile.children.cycles-pp.set_pte_range
      0.43 ± 28%      +0.2        0.67 ± 25%  perf-profile.children.cycles-pp.d_alloc_parallel
      0.58 ± 23%      +0.3        0.87 ± 21%  perf-profile.children.cycles-pp.__lookup_slow
      0.98 ± 22%      +0.3        1.27 ± 13%  perf-profile.children.cycles-pp.copy_process
      1.39 ±  9%      +0.3        1.74 ± 13%  perf-profile.children.cycles-pp.filemap_map_pages
      1.49 ±  9%      +0.4        1.91 ± 11%  perf-profile.children.cycles-pp.do_read_fault
      1.75 ± 10%      +0.5        2.26 ±  8%  perf-profile.children.cycles-pp.do_fault
      2.66 ± 14%      +0.6        3.28 ±  9%  perf-profile.children.cycles-pp.smpboot_thread_fn
      3.94 ± 16%      +0.7        4.62 ±  8%  perf-profile.children.cycles-pp.read
      4.09 ±  4%      +0.8        4.93 ±  8%  perf-profile.children.cycles-pp.asm_exc_page_fault
      3.08 ± 10%      +0.9        3.96 ±  8%  perf-profile.children.cycles-pp.__handle_mm_fault
      3.22 ±  8%      +1.0        4.18 ±  8%  perf-profile.children.cycles-pp.handle_mm_fault
      3.44 ±  6%      +1.0        4.47 ±  9%  perf-profile.children.cycles-pp.do_user_addr_fault
      3.45 ±  5%      +1.0        4.48 ±  9%  perf-profile.children.cycles-pp.exc_page_fault
     20.31 ±  9%      +2.6       22.93 ±  6%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     20.26 ±  9%      +2.6       22.88 ±  6%  perf-profile.children.cycles-pp.do_syscall_64
      0.21 ± 20%      -0.2        0.06 ± 83%  perf-profile.self.cycles-pp.native_apic_mem_eoi
      0.12 ± 30%      +0.1        0.18 ± 19%  perf-profile.self.cycles-pp.newidle_balance
      0.01 ±223%      +0.1        0.08 ± 37%  perf-profile.self.cycles-pp.__radix_tree_lookup
      0.00            +0.1        0.09 ± 39%  perf-profile.self.cycles-pp.kmalloc_trace
      0.05 ±111%      +0.1        0.17 ± 36%  perf-profile.self.cycles-pp.leave_mm
      0.23 ± 23%      +0.2        0.39 ± 40%  perf-profile.self.cycles-pp.strnlen_user
      0.10 ± 53%      +0.2        0.30 ± 59%  perf-profile.self.cycles-pp.read_counters
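
For readers who want to sanity-check the %change column, here is a minimal
Python sketch that parses one row of the comparison table above and recomputes
the change from the two printed means. The helper names (ROW, percent_change,
parse_row) are hypothetical and are not part of lkp-tests. It assumes, based
only on the rows shown here, that a change ending in "%" is relative to the
base commit, while a change without "%" (the mpstat.cpu.all.*% and
perf-profile rows) is an absolute difference. Recomputed values can differ
slightly from the report, which presumably works from unrounded means.

```python
import re

# One comparison row: base mean, optional "± N%" stddev, change column,
# patched mean, optional "± N%" stddev, then the metric name.
ROW = re.compile(
    r"^\s*(?P<base>[\d.]+(?:e[+-]\d+)?)\s*(?:±\s*(?P<base_sd>\d+)%)?\s+"
    r"(?P<change>[+-][\d.]+%?)\s+"
    r"(?P<new>[\d.]+(?:e[+-]\d+)?)\s*(?:±\s*(?P<new_sd>\d+)%)?\s+"
    r"(?P<metric>\S+)\s*$"
)


def percent_change(base, new):
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


def parse_row(line):
    """Parse one table row; return None for headers and other lines."""
    m = ROW.match(line)
    if m is None:
        return None
    base, new = float(m["base"]), float(m["new"])
    # Assumption: a trailing "%" marks a relative change; otherwise the
    # column holds an absolute difference between the two means.
    relative = m["change"].endswith("%")
    recomputed = percent_change(base, new) if relative else new - base
    return {
        "metric": m["metric"],
        "base": base,
        "new": new,
        "reported": m["change"],
        "relative": relative,
        "recomputed": recomputed,
    }


if __name__ == "__main__":
    row = parse_row(
        "     12757 ±163%   +4194.8%     547887 ±  7%  filebench.sum_operations/s"
    )
    print(row["metric"], row["reported"], round(row["recomputed"], 1))
```

For the headline row, (547887 - 12757) / 12757 works out to roughly +4194.8%,
which matches the reported figure.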




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki




