[tytso-ext4:dev] [jbd2] 7c73ddb758: stress-ng.fiemap.ops_per_sec 565.3% improvement

Hello,

kernel test robot noticed a 565.3% improvement in stress-ng.fiemap.ops_per_sec on:


commit: 7c73ddb7589fb8ddb1136b6306dfb72089c81511 ("jbd2: speed up jbd2_transaction_committed()")
https://git.kernel.org/cgit/linux/kernel/git/tytso/ext4.git dev

testcase: stress-ng
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

	nr_threads: 100%
	disk: 1HDD
	testtime: 60s
	fs: ext4
	test: fiemap
	cpufreq_governor: performance






Details are below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240714/202407142212.5595ea54-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-13/performance/1HDD/ext4/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/fiemap/stress-ng/60s

commit: 
  8262fe9a90 ("ext4: make ext4_da_map_blocks() buffer_head unaware")
  7c73ddb758 ("jbd2: speed up jbd2_transaction_committed()")

8262fe9a902c8a7b 7c73ddb7589fb8ddb1136b6306d 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
 1.651e+08 ± 27%     -31.9%  1.125e+08 ±  6%  cpuidle..time
      1.15 ± 10%     +39.7%       1.61 ±  2%  iostat.cpu.user
    364499 ± 18%     +93.2%     704285 ± 21%  numa-numastat.node1.local_node
    391444 ± 13%     +87.0%     731983 ± 19%  numa-numastat.node1.numa_hit
      0.01 ± 93%      +0.0        0.04 ± 42%  mpstat.cpu.all.iowait%
      0.02 ± 11%      -0.0        0.01 ± 10%  mpstat.cpu.all.soft%
      1.17 ± 10%      +0.5        1.63 ±  2%  mpstat.cpu.all.usr%
      7.49 ± 26%    +150.8%      18.78 ±  5%  vmstat.procs.b
    206521 ± 17%    +533.8%    1309000 ±  2%  vmstat.system.cs
    161690 ±  3%      +9.7%     177423        vmstat.system.in
    295.83 ± 41%     +89.5%     560.50 ± 13%  perf-c2c.DRAM.local
      2738 ± 54%    +302.1%      11011 ± 22%  perf-c2c.DRAM.remote
     19455 ± 26%    +159.8%      50553 ±  3%  perf-c2c.HITM.local
      2088 ± 64%    +341.3%       9214 ± 25%  perf-c2c.HITM.remote
     21543 ± 28%    +177.4%      59768 ±  4%  perf-c2c.HITM.total
   7686439 ± 19%    +567.4%   51297323 ±  2%  stress-ng.fiemap.ops
    127116 ± 19%    +565.3%     845744 ±  2%  stress-ng.fiemap.ops_per_sec
  13124477 ± 16%    +538.2%   83760706 ±  2%  stress-ng.time.involuntary_context_switches
     16.98 ±  4%    +123.7%      37.98 ±  2%  stress-ng.time.user_time
     68650 ±  2%     +21.2%      83171 ±  6%  stress-ng.time.voluntary_context_switches
   3772338           +32.7%    5006703        meminfo.Cached
   3979639           +32.0%    5253874        meminfo.Committed_AS
   1184714 ± 10%     +70.1%    2014850 ± 15%  meminfo.Inactive
   1149048 ± 10%     +72.2%    1978594 ± 15%  meminfo.Inactive(anon)
    376153 ± 24%    +148.7%     935654 ± 15%  meminfo.Mapped
   5932787           +22.2%    7248581        meminfo.Memused
    564523 ±  7%    +218.2%    1796085        meminfo.Shmem
   5998794           +21.5%    7289549        meminfo.max_used_kB
    816342 ±130%    +203.1%    2474499 ± 40%  numa-meminfo.node0.FilePages
   1933517 ± 56%     +86.2%    3599441 ± 31%  numa-meminfo.node0.MemUsed
    205709 ± 33%    +120.9%     454343 ± 43%  numa-meminfo.node1.Active
    196984 ± 34%    +126.3%     445786 ± 45%  numa-meminfo.node1.Active(anon)
    647051 ± 24%    +120.3%    1425192 ± 29%  numa-meminfo.node1.Inactive
    632883 ± 25%    +123.0%    1411032 ± 29%  numa-meminfo.node1.Inactive(anon)
    249614 ± 23%    +145.7%     613193 ± 31%  numa-meminfo.node1.Mapped
    468647 ± 20%    +206.9%    1438272 ± 28%  numa-meminfo.node1.Shmem
    204074 ±130%    +203.1%     618583 ± 40%  numa-vmstat.node0.nr_file_pages
     48403 ± 36%    +128.8%     110725 ± 43%  numa-vmstat.node1.nr_active_anon
    158779 ± 25%    +122.3%     352949 ± 29%  numa-vmstat.node1.nr_inactive_anon
     62898 ± 23%    +143.8%     153346 ± 31%  numa-vmstat.node1.nr_mapped
    116833 ± 19%    +207.3%     359079 ± 28%  numa-vmstat.node1.nr_shmem
     48401 ± 36%    +128.8%     110724 ± 43%  numa-vmstat.node1.nr_zone_active_anon
    158780 ± 25%    +122.3%     352949 ± 29%  numa-vmstat.node1.nr_zone_inactive_anon
    389858 ± 13%     +87.4%     730598 ± 19%  numa-vmstat.node1.numa_hit
    362913 ± 18%     +93.7%     702900 ± 21%  numa-vmstat.node1.numa_local
   2712171 ±  7%     -11.3%    2404936 ±  2%  sched_debug.cfs_rq:/.avg_vruntime.avg
   1407145 ± 17%     -30.4%     979280 ± 19%  sched_debug.cfs_rq:/.load.max
   2712177 ±  7%     -11.3%    2404936 ±  2%  sched_debug.cfs_rq:/.min_vruntime.avg
    547.78 ± 36%    +106.3%       1130 ±  7%  sched_debug.cfs_rq:/.util_est.avg
      1863 ± 12%     +54.1%       2871 ± 17%  sched_debug.cfs_rq:/.util_est.max
     59.08 ±100%    +241.7%     201.92 ± 36%  sched_debug.cfs_rq:/.util_est.min
    392.55 ± 11%     +38.5%     543.51 ± 11%  sched_debug.cfs_rq:/.util_est.stddev
    104511 ± 16%    +518.1%     645974 ±  2%  sched_debug.cpu.nr_switches.avg
    204555 ± 32%    +290.2%     798142 ±  6%  sched_debug.cpu.nr_switches.max
     12171 ± 65%    +844.5%     114956 ± 47%  sched_debug.cpu.nr_switches.min
    945799           +32.6%    1254064        proc-vmstat.nr_file_pages
    287060 ± 10%     +72.7%     495669 ± 15%  proc-vmstat.nr_inactive_anon
     93971 ± 24%    +150.2%     235154 ± 15%  proc-vmstat.nr_mapped
    141268 ±  8%    +217.7%     448745        proc-vmstat.nr_shmem
     25272            +2.4%      25873        proc-vmstat.nr_slab_reclaimable
    287060 ± 10%     +72.7%     495669 ± 15%  proc-vmstat.nr_zone_inactive_anon
     24933 ± 50%    +200.6%      74949 ±  6%  proc-vmstat.numa_hint_faults
      9891 ± 50%    +317.8%      41324 ±  8%  proc-vmstat.numa_hint_faults_local
    614783 ±  2%     +72.8%    1062609        proc-vmstat.numa_hit
    548520 ±  2%     +81.6%     996296        proc-vmstat.numa_local
    549876 ±  4%     +14.4%     628860        proc-vmstat.numa_pte_updates
    734634 ±  2%     +60.7%    1180339        proc-vmstat.pgalloc_normal
    388924 ±  3%     +20.6%     468855 ±  2%  proc-vmstat.pgfault
    478242 ±  5%     -19.5%     385183 ± 14%  proc-vmstat.pgfree
 3.169e+09 ±  8%    +506.1%  1.921e+10        perf-stat.i.branch-instructions
      0.65 ±  3%      -0.1        0.53 ±  5%  perf-stat.i.branch-miss-rate%
  20491366 ±  5%    +378.8%   98121036 ±  5%  perf-stat.i.branch-misses
   7452019 ± 37%    +441.8%   40374999 ± 10%  perf-stat.i.cache-misses
  71298660 ±  3%    +361.7%  3.292e+08        perf-stat.i.cache-references
    227657 ± 19%    +498.6%    1362709 ±  2%  perf-stat.i.context-switches
     14.22 ±  8%     -83.9%       2.29        perf-stat.i.cpi
     37069 ± 36%     -84.2%       5866 ± 13%  perf-stat.i.cycles-between-cache-misses
   1.6e+10 ±  7%    +516.8%  9.867e+10        perf-stat.i.instructions
      0.08 ± 10%    +479.2%       0.44        perf-stat.i.ipc
      3.56 ± 19%    +502.1%      21.45 ±  2%  perf-stat.i.metric.K/sec
      5090 ±  4%     +25.5%       6387 ±  3%  perf-stat.i.minor-faults
      5090 ±  4%     +25.5%       6387 ±  3%  perf-stat.i.page-faults
      0.06 ± 45%    +637.9%       0.44        perf-stat.overall.ipc
 2.598e+09 ± 45%    +627.1%  1.889e+10        perf-stat.ps.branch-instructions
  17015300 ± 45%    +468.0%   96650879 ±  5%  perf-stat.ps.branch-misses
   5919193 ± 63%    +570.7%   39699549 ± 10%  perf-stat.ps.cache-misses
  58527096 ± 44%    +453.7%  3.241e+08        perf-stat.ps.cache-references
    181655 ± 48%    +640.9%    1345912 ±  2%  perf-stat.ps.context-switches
 1.311e+10 ± 45%    +640.3%  9.704e+10        perf-stat.ps.instructions
      4075 ± 44%     +53.6%       6260 ±  3%  perf-stat.ps.minor-faults
      4075 ± 44%     +53.6%       6260 ±  3%  perf-stat.ps.page-faults
 8.153e+11 ± 45%    +637.0%  6.009e+12        perf-stat.total.instructions
     85.99           -86.0        0.00        perf-profile.calltrace.cycles-pp.jbd2_transaction_committed.ext4_set_iomap.ext4_iomap_begin_report.iomap_iter.iomap_fiemap
     86.52           -84.1        2.43        perf-profile.calltrace.cycles-pp.ext4_set_iomap.ext4_iomap_begin_report.iomap_iter.iomap_fiemap.do_vfs_ioctl
     46.15 ± 12%     -46.2        0.00        perf-profile.calltrace.cycles-pp._raw_read_lock.jbd2_transaction_committed.ext4_set_iomap.ext4_iomap_begin_report.iomap_iter
     96.11           -10.9       85.20        perf-profile.calltrace.cycles-pp.ext4_iomap_begin_report.iomap_iter.iomap_fiemap.do_vfs_ioctl.__x64_sys_ioctl
      4.58 ± 89%      -4.6        0.00        perf-profile.calltrace.cycles-pp.queued_read_lock_slowpath.jbd2_transaction_committed.ext4_set_iomap.ext4_iomap_begin_report.iomap_iter
     97.15            -4.5       92.65        perf-profile.calltrace.cycles-pp.iomap_iter.iomap_fiemap.do_vfs_ioctl.__x64_sys_ioctl.do_syscall_64
      4.34 ± 89%      -4.3        0.00        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath.queued_read_lock_slowpath.jbd2_transaction_committed.ext4_set_iomap.ext4_iomap_begin_report
     97.78            -0.7       97.12        perf-profile.calltrace.cycles-pp.iomap_fiemap.do_vfs_ioctl.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.17 ±141%      +0.6        0.72        perf-profile.calltrace.cycles-pp.__schedule.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.18 ±141%      +0.6        0.75        perf-profile.calltrace.cycles-pp.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
      0.64 ± 16%      +0.6        1.26        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
      0.64 ± 16%      +0.6        1.26        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__sched_yield
      0.29 ±100%      +0.6        0.94        perf-profile.calltrace.cycles-pp.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
      0.70 ± 14%      +0.9        1.60        perf-profile.calltrace.cycles-pp.__sched_yield
      0.00            +1.0        0.95        perf-profile.calltrace.cycles-pp.ext4_sb_block_valid.__check_block_validity.ext4_map_blocks.ext4_iomap_begin_report.iomap_iter
      0.00            +1.0        1.05        perf-profile.calltrace.cycles-pp._copy_to_user.fiemap_fill_next_extent.iomap_fiemap.do_vfs_ioctl.__x64_sys_ioctl
      0.00            +1.4        1.35 ± 11%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.percpu_counter_add_batch.ext4_es_lookup_extent.ext4_map_blocks
      0.00            +1.4        1.41        perf-profile.calltrace.cycles-pp.__check_block_validity.ext4_map_blocks.ext4_iomap_begin_report.iomap_iter.iomap_fiemap
      0.00            +1.6        1.58 ± 10%  perf-profile.calltrace.cycles-pp._raw_spin_lock.percpu_counter_add_batch.ext4_es_lookup_extent.ext4_map_blocks.ext4_iomap_begin_report
      0.00            +3.1        3.08        perf-profile.calltrace.cycles-pp.fiemap_fill_next_extent.iomap_fiemap.do_vfs_ioctl.__x64_sys_ioctl.do_syscall_64
      0.82 ±  6%      +5.1        5.90        perf-profile.calltrace.cycles-pp.iomap_iter_advance.iomap_iter.iomap_fiemap.do_vfs_ioctl.__x64_sys_ioctl
      4.18 ± 17%      +5.2        9.35 ±  3%  perf-profile.calltrace.cycles-pp._raw_read_lock.ext4_es_lookup_extent.ext4_map_blocks.ext4_iomap_begin_report.iomap_iter
      0.62 ±  7%     +52.1       52.70        perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.ext4_es_lookup_extent.ext4_map_blocks.ext4_iomap_begin_report.iomap_iter
      8.69 ± 13%     +68.0       76.73        perf-profile.calltrace.cycles-pp.ext4_es_lookup_extent.ext4_map_blocks.ext4_iomap_begin_report.iomap_iter.iomap_fiemap
      9.30 ± 11%     +71.3       80.61        perf-profile.calltrace.cycles-pp.ext4_map_blocks.ext4_iomap_begin_report.iomap_iter.iomap_fiemap.do_vfs_ioctl
     86.24           -85.8        0.42        perf-profile.children.cycles-pp.jbd2_transaction_committed
     86.55           -83.8        2.70        perf-profile.children.cycles-pp.ext4_set_iomap
     50.52 ± 12%     -41.1        9.45 ±  3%  perf-profile.children.cycles-pp._raw_read_lock
     96.16           -10.6       85.61        perf-profile.children.cycles-pp.ext4_iomap_begin_report
      4.60 ± 89%      -4.6        0.00        perf-profile.children.cycles-pp.queued_read_lock_slowpath
     97.20            -4.2       92.96        perf-profile.children.cycles-pp.iomap_iter
      0.09 ± 14%      +0.0        0.12 ±  4%  perf-profile.children.cycles-pp.update_process_times
      0.10 ± 13%      +0.0        0.14 ±  3%  perf-profile.children.cycles-pp.tick_nohz_handler
      0.06 ± 15%      +0.0        0.09        perf-profile.children.cycles-pp.switch_fpu_return
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.__switch_to_asm
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.pick_eevdf
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.syscall_return_via_sysret
      0.00            +0.1        0.06 ±  7%  perf-profile.children.cycles-pp.__switch_to
      0.00            +0.1        0.06 ±  7%  perf-profile.children.cycles-pp.rseq_ip_fixup
      0.00            +0.1        0.06 ±  7%  perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.00            +0.1        0.07        perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
      0.00            +0.1        0.08 ±  4%  perf-profile.children.cycles-pp.set_next_entity
      0.00            +0.1        0.09 ±  4%  perf-profile.children.cycles-pp.put_prev_entity
      0.00            +0.1        0.09        perf-profile.children.cycles-pp.update_load_avg
      0.06 ±  7%      +0.1        0.18 ±  2%  perf-profile.children.cycles-pp.do_sched_yield
      0.00            +0.1        0.11 ±  8%  perf-profile.children.cycles-pp.stress_fiemap
      0.00            +0.1        0.12 ±  3%  perf-profile.children.cycles-pp.update_curr
      0.00            +0.1        0.12 ±  3%  perf-profile.children.cycles-pp.__rseq_handle_notify_resume
      0.00            +0.1        0.12 ±  3%  perf-profile.children.cycles-pp.clear_bhb_loop
      0.20 ± 13%      +0.1        0.33        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      0.00            +0.1        0.13        perf-profile.children.cycles-pp.yield_task_fair
      0.07 ± 86%      +0.1        0.22 ± 19%  perf-profile.children.cycles-pp.ordered_events__queue
      0.07 ± 86%      +0.1        0.22 ± 19%  perf-profile.children.cycles-pp.queue_event
      0.00            +0.2        0.16 ±  3%  perf-profile.children.cycles-pp.ext4_inode_block_valid
      0.07 ± 87%      +0.2        0.23 ± 17%  perf-profile.children.cycles-pp.process_simple
      0.17 ± 61%      +0.2        0.38 ± 11%  perf-profile.children.cycles-pp.reader__read_event
      0.17 ± 60%      +0.2        0.39 ± 11%  perf-profile.children.cycles-pp.perf_session__process_events
      0.17 ± 60%      +0.2        0.39 ± 11%  perf-profile.children.cycles-pp.record__finish_output
      0.07 ±  5%      +0.2        0.28        perf-profile.children.cycles-pp.pick_next_task_fair
      0.47 ± 10%      +0.3        0.74        perf-profile.children.cycles-pp.__schedule
      0.48 ± 10%      +0.3        0.75        perf-profile.children.cycles-pp.schedule
      0.06 ±  7%      +0.4        0.47        perf-profile.children.cycles-pp.iomap_to_fiemap
      0.50 ± 17%      +0.4        0.94        perf-profile.children.cycles-pp.__x64_sys_sched_yield
      0.15 ±  7%      +0.9        1.00        perf-profile.children.cycles-pp.ext4_sb_block_valid
      0.71 ± 14%      +0.9        1.64        perf-profile.children.cycles-pp.__sched_yield
      0.17 ±  8%      +1.0        1.21        perf-profile.children.cycles-pp._copy_to_user
      0.23 ±  9%      +1.3        1.52        perf-profile.children.cycles-pp.__check_block_validity
      0.18 ±  9%      +1.4        1.62 ± 10%  perf-profile.children.cycles-pp._raw_spin_lock
      0.44 ±  6%      +2.7        3.15        perf-profile.children.cycles-pp.fiemap_fill_next_extent
      0.84 ±  7%      +5.2        6.02        perf-profile.children.cycles-pp.iomap_iter_advance
      0.63 ±  6%     +52.2       52.83        perf-profile.children.cycles-pp.percpu_counter_add_batch
      8.75 ± 13%     +68.4       77.12        perf-profile.children.cycles-pp.ext4_es_lookup_extent
      9.35 ± 11%     +71.6       81.00        perf-profile.children.cycles-pp.ext4_map_blocks
     50.35 ± 12%     -41.0        9.32 ±  3%  perf-profile.self.cycles-pp._raw_read_lock
     35.26 ±  7%     -35.0        0.25        perf-profile.self.cycles-pp.jbd2_transaction_committed
      0.06 ± 15%      +0.0        0.10        perf-profile.self.cycles-pp.__schedule
      0.00            +0.1        0.06 ±  6%  perf-profile.self.cycles-pp.__switch_to
      0.00            +0.1        0.07        perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
      0.00            +0.1        0.10 ±  8%  perf-profile.self.cycles-pp.stress_fiemap
      0.00            +0.1        0.10 ±  4%  perf-profile.self.cycles-pp.ext4_inode_block_valid
      0.17 ± 12%      +0.1        0.27 ±  3%  perf-profile.self.cycles-pp._raw_spin_lock
      0.00            +0.1        0.12 ±  3%  perf-profile.self.cycles-pp.clear_bhb_loop
      0.06 ±107%      +0.2        0.21 ± 19%  perf-profile.self.cycles-pp.queue_event
      0.05 ± 46%      +0.3        0.37        perf-profile.self.cycles-pp.__check_block_validity
      0.04 ± 45%      +0.3        0.36        perf-profile.self.cycles-pp.iomap_to_fiemap
      0.14 ±  8%      +0.8        0.95        perf-profile.self.cycles-pp.ext4_sb_block_valid
      0.14 ±  7%      +0.8        0.99        perf-profile.self.cycles-pp.iomap_fiemap
      0.17 ±  8%      +1.0        1.18        perf-profile.self.cycles-pp._copy_to_user
      0.20 ±  7%      +1.2        1.44        perf-profile.self.cycles-pp.iomap_iter
      0.27 ±  7%      +1.7        1.95        perf-profile.self.cycles-pp.fiemap_fill_next_extent
      0.28 ±  7%      +1.8        2.10        perf-profile.self.cycles-pp.ext4_iomap_begin_report
      0.30 ±  7%      +1.9        2.24        perf-profile.self.cycles-pp.ext4_set_iomap
      0.40 ±  9%      +2.1        2.54        perf-profile.self.cycles-pp.ext4_map_blocks
      0.82 ±  6%      +5.1        5.91        perf-profile.self.cycles-pp.iomap_iter_advance
      3.91 ± 12%     +11.0       14.90        perf-profile.self.cycles-pp.ext4_es_lookup_extent
      0.55 ±  8%     +50.4       50.92        perf-profile.self.cycles-pp.percpu_counter_add_batch
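The %change column above mixes two conventions: entries with a trailing "%" appear to be relative changes between the two commits, while entries without it (the mpstat and perf-profile rows) appear to be absolute differences in percentage points. The following is a minimal Python sketch for re-deriving either kind from a row. It is not part of the LKP tooling; the row layout in the regular expression and the helper names parse_row and recompute_change are assumptions made for illustration. Because the table prints rounded means, a recomputed value may differ from the printed one in the last digit.

```python
import re

# Assumed row layout (inferred from the table, not from LKP sources):
#   base [± stddev%]   change[%]   new [± stddev%]   metric
ROW = re.compile(
    r"^\s*(?P<base>[\d.e+]+)(?:\s*±\s*\d+%)?"
    r"\s+(?P<change>[+-][\d.]+)(?P<pct>%?)"
    r"\s+(?P<new>[\d.e+]+)(?:\s*±\s*\d+%)?"
    r"\s+(?P<metric>\S+)\s*$"
)


def parse_row(line):
    """Split one comparison row into its numeric fields (hypothetical helper)."""
    m = ROW.match(line)
    if m is None:
        raise ValueError(f"unrecognised row: {line!r}")
    return {
        "base": float(m.group("base")),
        "new": float(m.group("new")),
        "reported": float(m.group("change")),
        # A trailing '%' marks a relative change; otherwise the value is
        # treated as an absolute difference in percentage points.
        "relative": m.group("pct") == "%",
        "metric": m.group("metric"),
    }


def recompute_change(row):
    """Re-derive the change column from the base and new values."""
    if row["relative"]:
        # Relative rows in this report all have a non-zero base.
        return (row["new"] - row["base"]) / row["base"] * 100.0
    return row["new"] - row["base"]


if __name__ == "__main__":
    # Two rows copied verbatim from the table above.
    rows = [
        "    127116 ± 19%    +565.3%     845744 ±  2%  stress-ng.fiemap.ops_per_sec",
        "     86.24           -85.8        0.42        perf-profile.children.cycles-pp.jbd2_transaction_committed",
    ]
    for line in rows:
        r = parse_row(line)
        unit = "%" if r["relative"] else " points"
        print(f"{r['metric']}: {recompute_change(r):+.1f}{unit}")
```

For the headline metric this gives (845744 - 127116) / 127116, about +565.3%, matching the subject line. For jbd2_transaction_committed it gives 0.42 - 86.24, about -85.8 points, which is the share of profiled cycles that moved out of that function.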




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki




