[linus:master] [workqueue] 636b927eba: stress-ng.io.ops_per_sec 19.5% improvement

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




Hello,

kernel test robot noticed a 19.5% improvement of stress-ng.io.ops_per_sec on:


commit: 636b927eba5bc633753f8eb80f35e1d5be806e51 ("workqueue: Make unbound workqueues to use per-cpu pool_workqueues")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: stress-ng
test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-9980XE CPU @ 3.00GHz (Skylake) with 32G memory
parameters:

	nr_threads: 10%
	disk: 1SSD
	testtime: 60s
	fs: xfs
	class: filesystem
	test: io
	cpufreq_governor: performance






Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230922/202309221737.2ee51a68-oliver.sang@xxxxxxxxx

=========================================================================================
class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  filesystem/gcc-12/performance/1SSD/xfs/x86_64-rhel-8.3/10%/debian-11.1-x86_64-20220510.cgz/lkp-skl-d08/io/stress-ng/60s

commit: 
  4cbfd3de73 ("workqueue: Call wq_update_unbound_numa() on all CPUs in NUMA node on CPU hotplug")
  636b927eba ("workqueue: Make unbound workqueues to use per-cpu pool_workqueues")

4cbfd3de737b9d00 636b927eba5bc633753f8eb80f3 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      1.53 ±  2%      +0.3        1.82 ±  3%  mpstat.cpu.all.usr%
      0.04 ± 25%     -58.2%       0.02 ± 43%  perf-sched.sch_delay.max.ms.worker_thread.kthread.ret_from_fork
      7.29            -2.7%       7.09        iostat.cpu.system
      1.52 ±  2%     +18.2%       1.80 ±  3%  iostat.cpu.user
     58.72 ± 46%     -63.9%      21.18 ± 50%  sched_debug.cfs_rq:/.removed.load_avg.avg
    205.63 ± 27%     -47.3%     108.41 ± 52%  sched_debug.cfs_rq:/.removed.load_avg.stddev
      0.13 ±  3%     +13.8%       0.15 ±  4%  turbostat.IPC
     82.74            +1.5%      83.95        turbostat.PkgWatt
   2954572           +19.5%    3529576 ±  4%  stress-ng.io.ops
     49242           +19.5%      58826 ±  4%  stress-ng.io.ops_per_sec
    151.67            -3.8%     145.86        stress-ng.time.system_time
     27.02           +21.6%      32.86 ±  4%  stress-ng.time.user_time
 1.017e+09           +21.7%  1.238e+09 ±  3%  perf-stat.i.branch-instructions
      2.07            -0.4        1.71 ±  3%  perf-stat.i.branch-miss-rate%
  1.42e+08           +20.9%  1.717e+08 ±  2%  perf-stat.i.cache-references
      2.54           -17.6%       2.09 ±  4%  perf-stat.i.cpi
      0.13            -0.0        0.12        perf-stat.i.dTLB-load-miss-rate%
   1359466           +19.1%    1618588 ±  4%  perf-stat.i.dTLB-load-misses
 1.134e+09           +19.6%  1.356e+09 ±  3%  perf-stat.i.dTLB-loads
      0.00 ±  7%      -0.0        0.00 ±  3%  perf-stat.i.dTLB-store-miss-rate%
 5.421e+08           +19.6%  6.483e+08 ±  3%  perf-stat.i.dTLB-stores
     63.26 ±  4%      +6.1       69.35 ±  2%  perf-stat.i.iTLB-load-miss-rate%
  5.08e+09           +20.7%  6.131e+09 ±  3%  perf-stat.i.instructions
      0.42           +19.3%       0.50 ±  3%  perf-stat.i.ipc
     78.71           +20.4%      94.79 ±  3%  perf-stat.i.metric.M/sec
      2.23            -0.4        1.85 ±  3%  perf-stat.overall.branch-miss-rate%
      0.33 ±  4%      -0.1        0.28 ±  6%  perf-stat.overall.cache-miss-rate%
      2.44           -16.8%       2.03 ±  3%  perf-stat.overall.cpi
      0.00 ±  4%      -0.0        0.00 ±  3%  perf-stat.overall.dTLB-store-miss-rate%
     62.97 ±  5%      +7.3       70.29 ±  3%  perf-stat.overall.iTLB-load-miss-rate%
      0.41           +20.4%       0.49 ±  3%  perf-stat.overall.ipc
 1.001e+09           +21.7%  1.218e+09 ±  3%  perf-stat.ps.branch-instructions
 1.398e+08           +20.9%   1.69e+08 ±  2%  perf-stat.ps.cache-references
   1337922           +19.1%    1592892 ±  4%  perf-stat.ps.dTLB-load-misses
 1.116e+09           +19.6%  1.334e+09 ±  3%  perf-stat.ps.dTLB-loads
 5.335e+08           +19.6%   6.38e+08 ±  3%  perf-stat.ps.dTLB-stores
 4.999e+09           +20.7%  6.033e+09 ±  3%  perf-stat.ps.instructions
 3.167e+11           +20.4%  3.811e+11 ±  3%  perf-stat.total.instructions
     21.48 ±  3%      -7.5       13.96 ± 12%  perf-profile.calltrace.cycles-pp._raw_spin_lock.iterate_supers.ksys_sync.__x64_sys_sync.do_syscall_64
     18.33 ±  3%      -7.0       11.30 ± 14%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.iterate_supers.ksys_sync.__x64_sys_sync
     35.16 ±  3%      -6.9       28.21 ±  6%  perf-profile.calltrace.cycles-pp.iterate_supers.ksys_sync.__x64_sys_sync.do_syscall_64.entry_SYSCALL_64_after_hwframe
     36.20 ±  3%      -6.9       29.35 ±  6%  perf-profile.calltrace.cycles-pp.ksys_sync.__x64_sys_sync.do_syscall_64.entry_SYSCALL_64_after_hwframe.sync
     36.20 ±  3%      -6.9       29.35 ±  6%  perf-profile.calltrace.cycles-pp.__x64_sys_sync.do_syscall_64.entry_SYSCALL_64_after_hwframe.sync
     36.44 ±  3%      -6.8       29.59 ±  6%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.sync
     36.54 ±  3%      -6.8       29.71 ±  6%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.sync
     36.86 ±  3%      -6.8       30.08 ±  6%  perf-profile.calltrace.cycles-pp.sync
     29.64 ±  8%      -5.1       24.54 ± 15%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     29.64 ±  8%      -5.1       24.54 ± 15%  perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
     29.63 ±  8%      -5.1       24.54 ± 15%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     29.45 ±  8%      -5.1       24.37 ± 15%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     29.07 ±  8%      -5.1       24.00 ± 15%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
     29.78 ±  8%      -4.1       25.66 ±  7%  perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
     28.89 ±  8%      -4.1       24.82 ±  7%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
     26.19 ±  8%      -4.0       22.20 ±  8%  perf-profile.calltrace.cycles-pp.intel_idle_ibrs.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
      9.73 ±  3%      -1.8        7.94 ±  4%  perf-profile.calltrace.cycles-pp.down_read.iterate_supers.ksys_sync.__x64_sys_sync.do_syscall_64
      1.34 ±  5%      +0.2        1.50 ±  4%  perf-profile.calltrace.cycles-pp._find_next_bit.get_nr_inodes.get_nr_dirty_inodes.writeback_inodes_sb.sync_filesystem
      2.11 ±  5%      +0.2        2.32 ±  6%  perf-profile.calltrace.cycles-pp.__entry_text_start.syncfs
      1.14 ±  7%      +0.2        1.36 ±  8%  perf-profile.calltrace.cycles-pp.up_read.iterate_supers.ksys_sync.__x64_sys_sync.do_syscall_64
      2.48 ±  6%      +0.4        2.84 ±  4%  perf-profile.calltrace.cycles-pp.down_read.__x64_sys_syncfs.do_syscall_64.entry_SYSCALL_64_after_hwframe.syncfs
      4.64 ±  5%      +0.6        5.23 ±  5%  perf-profile.calltrace.cycles-pp.get_nr_inodes.get_nr_dirty_inodes.writeback_inodes_sb.sync_filesystem.__x64_sys_syncfs
      4.99 ±  6%      +0.6        5.58 ±  3%  perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.syncfs
      4.33 ±  4%      +0.6        4.97 ±  5%  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syncfs
      9.99 ±  5%      +1.2       11.21 ±  4%  perf-profile.calltrace.cycles-pp.get_nr_dirty_inodes.writeback_inodes_sb.sync_filesystem.__x64_sys_syncfs.do_syscall_64
     10.31 ±  5%      +1.3       11.56 ±  4%  perf-profile.calltrace.cycles-pp.writeback_inodes_sb.sync_filesystem.__x64_sys_syncfs.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +1.3        1.35 ±  8%  perf-profile.calltrace.cycles-pp.mutex_spin_on_owner.__mutex_lock.__flush_workqueue.xlog_cil_push_now.xlog_cil_force_seq
      0.00            +1.8        1.78 ±  8%  perf-profile.calltrace.cycles-pp.__mutex_lock.__flush_workqueue.xlog_cil_push_now.xlog_cil_force_seq.xfs_log_force
      0.00            +2.6        2.64 ±  5%  perf-profile.calltrace.cycles-pp.xlog_cil_push_now.xlog_cil_force_seq.xfs_log_force.xfs_fs_sync_fs.iterate_supers
      0.00            +2.7        2.66 ±  5%  perf-profile.calltrace.cycles-pp.xlog_cil_force_seq.xfs_log_force.xfs_fs_sync_fs.iterate_supers.ksys_sync
      0.00            +2.7        2.69 ±  5%  perf-profile.calltrace.cycles-pp.xfs_log_force.xfs_fs_sync_fs.iterate_supers.ksys_sync.__x64_sys_sync
      0.00            +2.7        2.70 ±  5%  perf-profile.calltrace.cycles-pp.xfs_fs_sync_fs.iterate_supers.ksys_sync.__x64_sys_sync.do_syscall_64
      0.00            +6.5        6.49 ±  7%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.flush_workqueue_prep_pwqs.__flush_workqueue.xlog_cil_push_now.xlog_cil_force_seq
      0.66 ±  9%      +7.0        7.63 ±  6%  perf-profile.calltrace.cycles-pp.xlog_cil_force_seq.xfs_log_force.xfs_fs_sync_fs.sync_filesystem.__x64_sys_syncfs
      0.62 ± 10%      +7.0        7.59 ±  6%  perf-profile.calltrace.cycles-pp.xlog_cil_push_now.xlog_cil_force_seq.xfs_log_force.xfs_fs_sync_fs.sync_filesystem
      0.70 ±  9%      +7.0        7.68 ±  6%  perf-profile.calltrace.cycles-pp.xfs_log_force.xfs_fs_sync_fs.sync_filesystem.__x64_sys_syncfs.do_syscall_64
      0.77 ±  8%      +7.0        7.76 ±  6%  perf-profile.calltrace.cycles-pp.xfs_fs_sync_fs.sync_filesystem.__x64_sys_syncfs.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +7.5        7.46 ±  6%  perf-profile.calltrace.cycles-pp.flush_workqueue_prep_pwqs.__flush_workqueue.xlog_cil_push_now.xlog_cil_force_seq.xfs_log_force
     12.60 ±  5%      +8.5       21.12 ±  5%  perf-profile.calltrace.cycles-pp.sync_filesystem.__x64_sys_syncfs.do_syscall_64.entry_SYSCALL_64_after_hwframe.syncfs
     16.57 ±  5%      +9.2       25.72 ±  4%  perf-profile.calltrace.cycles-pp.__x64_sys_syncfs.do_syscall_64.entry_SYSCALL_64_after_hwframe.syncfs
      0.50 ± 45%      +9.7       10.18 ±  6%  perf-profile.calltrace.cycles-pp.__flush_workqueue.xlog_cil_push_now.xlog_cil_force_seq.xfs_log_force.xfs_fs_sync_fs
     21.43 ±  5%      +9.8       31.22 ±  4%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.syncfs
     24.17 ±  5%     +10.1       34.30 ±  4%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.syncfs
     31.94 ±  5%     +11.0       42.94 ±  4%  perf-profile.calltrace.cycles-pp.syncfs
     22.38 ±  3%      -7.5       14.88 ± 11%  perf-profile.children.cycles-pp._raw_spin_lock
     18.34 ±  3%      -7.0       11.30 ± 14%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     35.22 ±  3%      -6.9       28.30 ±  6%  perf-profile.children.cycles-pp.iterate_supers
     36.20 ±  3%      -6.9       29.35 ±  6%  perf-profile.children.cycles-pp.__x64_sys_sync
     36.20 ±  3%      -6.9       29.35 ±  6%  perf-profile.children.cycles-pp.ksys_sync
     36.88 ±  3%      -6.8       30.10 ±  6%  perf-profile.children.cycles-pp.sync
     29.64 ±  8%      -5.1       24.54 ± 15%  perf-profile.children.cycles-pp.start_secondary
     29.78 ±  8%      -4.1       25.66 ±  7%  perf-profile.children.cycles-pp.do_idle
     29.78 ±  8%      -4.1       25.66 ±  7%  perf-profile.children.cycles-pp.secondary_startup_64_no_verify
     29.78 ±  8%      -4.1       25.66 ±  7%  perf-profile.children.cycles-pp.cpu_startup_entry
     29.20 ±  8%      -4.1       25.10 ±  7%  perf-profile.children.cycles-pp.cpuidle_enter
     29.59 ±  8%      -4.1       25.50 ±  7%  perf-profile.children.cycles-pp.cpuidle_idle_call
     29.19 ±  8%      -4.1       25.10 ±  7%  perf-profile.children.cycles-pp.cpuidle_enter_state
     26.26 ±  8%      -4.0       22.28 ±  8%  perf-profile.children.cycles-pp.intel_idle_ibrs
     12.26 ±  3%      -1.4       10.84 ±  3%  perf-profile.children.cycles-pp.down_read
      1.89 ± 12%      -0.3        1.62 ±  6%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.23 ±  7%      -0.1        0.18 ± 11%  perf-profile.children.cycles-pp.ktime_get
      0.09 ± 10%      +0.1        0.14 ± 11%  perf-profile.children.cycles-pp.up_write
      0.16 ± 14%      +0.1        0.22 ± 10%  perf-profile.children.cycles-pp.sync_fs_one_sb
      0.36 ±  7%      +0.1        0.44 ±  4%  perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.47 ±  7%      +0.1        0.55 ±  5%  perf-profile.children.cycles-pp.__fget_light
      0.38 ± 11%      +0.1        0.47 ±  6%  perf-profile.children.cycles-pp.mutex_lock
      0.46 ±  9%      +0.1        0.56 ±  6%  perf-profile.children.cycles-pp.__cond_resched
      1.24 ±  5%      +0.2        1.44 ±  5%  perf-profile.children.cycles-pp.sync_inodes_sb
      0.00            +0.2        0.22 ± 13%  perf-profile.children.cycles-pp.osq_lock
      0.44 ± 10%      +0.2        0.66 ±  7%  perf-profile.children.cycles-pp.mutex_unlock
      2.51 ±  6%      +0.3        2.77 ±  6%  perf-profile.children.cycles-pp.__entry_text_start
      1.46 ±  8%      +0.3        1.81 ±  6%  perf-profile.children.cycles-pp.up_read
      4.66 ±  4%      +0.6        5.28 ±  5%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      4.92 ±  5%      +0.6        5.55 ±  4%  perf-profile.children.cycles-pp.get_nr_inodes
     10.29 ±  5%      +1.2       11.54 ±  4%  perf-profile.children.cycles-pp.get_nr_dirty_inodes
     10.32 ±  5%      +1.3       11.58 ±  4%  perf-profile.children.cycles-pp.writeback_inodes_sb
      0.00            +1.6        1.62 ±  7%  perf-profile.children.cycles-pp.mutex_spin_on_owner
      0.00            +2.1        2.12 ±  7%  perf-profile.children.cycles-pp.__mutex_lock
      0.25 ± 11%      +6.4        6.63 ±  7%  perf-profile.children.cycles-pp._raw_spin_lock_irq
      0.49 ±  9%      +7.0        7.50 ±  6%  perf-profile.children.cycles-pp.flush_workqueue_prep_pwqs
     12.65 ±  5%      +8.5       21.16 ±  5%  perf-profile.children.cycles-pp.sync_filesystem
     16.61 ±  5%      +9.2       25.76 ±  4%  perf-profile.children.cycles-pp.__x64_sys_syncfs
      0.86 ± 10%      +9.3       10.19 ±  6%  perf-profile.children.cycles-pp.__flush_workqueue
      0.91 ±  8%      +9.3       10.24 ±  6%  perf-profile.children.cycles-pp.xlog_cil_push_now
      0.97 ±  8%      +9.3       10.30 ±  6%  perf-profile.children.cycles-pp.xlog_cil_force_seq
      1.11 ±  8%      +9.4       10.46 ±  6%  perf-profile.children.cycles-pp.xfs_fs_sync_fs
      1.02 ±  9%      +9.4       10.38 ±  6%  perf-profile.children.cycles-pp.xfs_log_force
     32.30 ±  5%     +11.1       43.37 ±  4%  perf-profile.children.cycles-pp.syncfs
     18.22 ±  3%      -7.0       11.25 ± 14%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
     26.26 ±  8%      -4.0       22.27 ±  8%  perf-profile.self.cycles-pp.intel_idle_ibrs
     11.86 ±  3%      -1.5       10.36 ±  3%  perf-profile.self.cycles-pp.down_read
      3.99 ±  2%      -0.5        3.53 ±  4%  perf-profile.self.cycles-pp._raw_spin_lock
      1.62 ±  6%      -0.4        1.27 ±  9%  perf-profile.self.cycles-pp.iterate_supers
      0.12 ± 12%      -0.0        0.09 ± 13%  perf-profile.self.cycles-pp.ktime_get
      0.02 ±141%      +0.0        0.06 ± 13%  perf-profile.self.cycles-pp.writeback_inodes_sb
      0.08 ± 13%      +0.1        0.14 ± 12%  perf-profile.self.cycles-pp.up_write
      0.16 ± 14%      +0.1        0.22 ± 10%  perf-profile.self.cycles-pp.sync_fs_one_sb
      0.39 ±  7%      +0.1        0.46 ±  7%  perf-profile.self.cycles-pp.syncfs
      0.31 ±  8%      +0.1        0.38 ±  4%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.46 ±  7%      +0.1        0.54 ±  6%  perf-profile.self.cycles-pp.__fget_light
      0.34 ± 10%      +0.1        0.43 ±  6%  perf-profile.self.cycles-pp.mutex_lock
      0.30 ± 13%      +0.1        0.39 ±  7%  perf-profile.self.cycles-pp.__cond_resched
      0.32 ±  7%      +0.1        0.43 ±  6%  perf-profile.self.cycles-pp.sync_filesystem
      0.00            +0.2        0.22 ± 13%  perf-profile.self.cycles-pp.osq_lock
      0.43 ± 10%      +0.2        0.65 ±  7%  perf-profile.self.cycles-pp.mutex_unlock
      0.00            +0.2        0.24 ±  8%  perf-profile.self.cycles-pp.__mutex_lock
      1.40 ±  7%      +0.3        1.72 ±  6%  perf-profile.self.cycles-pp.up_read
      2.99 ±  5%      +0.4        3.36 ±  3%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      4.21 ±  5%      +0.5        4.71 ±  5%  perf-profile.self.cycles-pp.get_nr_dirty_inodes
      3.82 ±  4%      +0.5        4.33 ±  5%  perf-profile.self.cycles-pp.get_nr_inodes
      4.48 ±  3%      +0.6        5.11 ±  6%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      0.26 ±  6%      +0.6        0.91 ±  7%  perf-profile.self.cycles-pp.flush_workqueue_prep_pwqs
      0.00            +1.6        1.61 ±  7%  perf-profile.self.cycles-pp.mutex_spin_on_owner
      0.24 ± 11%      +6.3        6.58 ±  7%  perf-profile.self.cycles-pp._raw_spin_lock_irq




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki




[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux