Hello,

kernel test robot noticed a 565.3% improvement of stress-ng.fiemap.ops_per_sec on:


commit: 7c73ddb7589fb8ddb1136b6306dfb72089c81511 ("jbd2: speed up jbd2_transaction_committed()")
https://git.kernel.org/cgit/linux/kernel/git/tytso/ext4.git dev

testcase: stress-ng
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

	nr_threads: 100%
	disk: 1HDD
	testtime: 60s
	fs: ext4
	test: fiemap
	cpufreq_governor: performance


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240714/202407142212.5595ea54-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-13/performance/1HDD/ext4/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/fiemap/stress-ng/60s

commit:
  8262fe9a90 ("ext4: make ext4_da_map_blocks() buffer_head unaware")
  7c73ddb758 ("jbd2: speed up jbd2_transaction_committed()")

8262fe9a902c8a7b 7c73ddb7589fb8ddb1136b6306d
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
 1.651e+08 ± 27%     -31.9%  1.125e+08 ±  6%  cpuidle..time
      1.15 ± 10%     +39.7%       1.61 ±  2%  iostat.cpu.user
    364499 ± 18%     +93.2%     704285 ± 21%  numa-numastat.node1.local_node
    391444 ± 13%     +87.0%     731983 ± 19%  numa-numastat.node1.numa_hit
      0.01 ± 93%       +0.0       0.04 ± 42%  mpstat.cpu.all.iowait%
      0.02 ± 11%       -0.0       0.01 ± 10%  mpstat.cpu.all.soft%
      1.17 ± 10%       +0.5       1.63 ±  2%  mpstat.cpu.all.usr%
      7.49 ± 26%    +150.8%      18.78 ±  5%  vmstat.procs.b
    206521 ± 17%    +533.8%    1309000 ±  2%  vmstat.system.cs
    161690 ±  3%      +9.7%     177423        vmstat.system.in
    295.83 ± 41%     +89.5%     560.50 ± 13%  perf-c2c.DRAM.local
      2738 ± 54%    +302.1%      11011 ± 22%  perf-c2c.DRAM.remote
     19455 ± 26%    +159.8%      50553 ±  3%  perf-c2c.HITM.local
      2088 ± 64%    +341.3%       9214 ± 25%  perf-c2c.HITM.remote
     21543 ± 28%    +177.4%      59768 ±  4%  perf-c2c.HITM.total
   7686439 ± 19%    +567.4%   51297323 ±  2%  stress-ng.fiemap.ops
    127116 ± 19%    +565.3%     845744 ±  2%  stress-ng.fiemap.ops_per_sec
  13124477 ± 16%    +538.2%   83760706 ±  2%  stress-ng.time.involuntary_context_switches
     16.98 ±  4%    +123.7%      37.98 ±  2%  stress-ng.time.user_time
     68650 ±  2%     +21.2%      83171 ±  6%  stress-ng.time.voluntary_context_switches
   3772338           +32.7%    5006703        meminfo.Cached
   3979639           +32.0%    5253874        meminfo.Committed_AS
   1184714 ± 10%     +70.1%    2014850 ± 15%  meminfo.Inactive
   1149048 ± 10%     +72.2%    1978594 ± 15%  meminfo.Inactive(anon)
    376153 ± 24%    +148.7%     935654 ± 15%  meminfo.Mapped
   5932787           +22.2%    7248581        meminfo.Memused
    564523 ±  7%    +218.2%    1796085        meminfo.Shmem
   5998794           +21.5%    7289549        meminfo.max_used_kB
    816342 ±130%    +203.1%    2474499 ± 40%  numa-meminfo.node0.FilePages
   1933517 ± 56%     +86.2%    3599441 ± 31%  numa-meminfo.node0.MemUsed
    205709 ± 33%    +120.9%     454343 ± 43%  numa-meminfo.node1.Active
    196984 ± 34%    +126.3%     445786 ± 45%  numa-meminfo.node1.Active(anon)
    647051 ± 24%    +120.3%    1425192 ± 29%  numa-meminfo.node1.Inactive
    632883 ± 25%    +123.0%    1411032 ± 29%  numa-meminfo.node1.Inactive(anon)
    249614 ± 23%    +145.7%     613193 ± 31%  numa-meminfo.node1.Mapped
    468647 ± 20%    +206.9%    1438272 ± 28%  numa-meminfo.node1.Shmem
    204074 ±130%    +203.1%     618583 ± 40%  numa-vmstat.node0.nr_file_pages
     48403 ± 36%    +128.8%     110725 ± 43%  numa-vmstat.node1.nr_active_anon
    158779 ± 25%    +122.3%     352949 ± 29%  numa-vmstat.node1.nr_inactive_anon
     62898 ± 23%    +143.8%     153346 ± 31%  numa-vmstat.node1.nr_mapped
    116833 ± 19%    +207.3%     359079 ± 28%  numa-vmstat.node1.nr_shmem
     48401 ± 36%    +128.8%     110724 ± 43%  numa-vmstat.node1.nr_zone_active_anon
    158780 ± 25%    +122.3%     352949 ± 29%  numa-vmstat.node1.nr_zone_inactive_anon
    389858 ± 13%     +87.4%     730598 ± 19%  numa-vmstat.node1.numa_hit
    362913 ± 18%     +93.7%     702900 ± 21%  numa-vmstat.node1.numa_local
   2712171 ±  7%     -11.3%    2404936 ±  2%  sched_debug.cfs_rq:/.avg_vruntime.avg
   1407145 ± 17%     -30.4%     979280 ± 19%  sched_debug.cfs_rq:/.load.max
   2712177 ±  7%     -11.3%    2404936 ±  2%  sched_debug.cfs_rq:/.min_vruntime.avg
    547.78 ± 36%    +106.3%       1130 ±  7%  sched_debug.cfs_rq:/.util_est.avg
      1863 ± 12%     +54.1%       2871 ± 17%  sched_debug.cfs_rq:/.util_est.max
     59.08 ±100%    +241.7%     201.92 ± 36%  sched_debug.cfs_rq:/.util_est.min
    392.55 ± 11%     +38.5%     543.51 ± 11%  sched_debug.cfs_rq:/.util_est.stddev
    104511 ± 16%    +518.1%     645974 ±  2%  sched_debug.cpu.nr_switches.avg
    204555 ± 32%    +290.2%     798142 ±  6%  sched_debug.cpu.nr_switches.max
     12171 ± 65%    +844.5%     114956 ± 47%  sched_debug.cpu.nr_switches.min
    945799           +32.6%    1254064        proc-vmstat.nr_file_pages
    287060 ± 10%     +72.7%     495669 ± 15%  proc-vmstat.nr_inactive_anon
     93971 ± 24%    +150.2%     235154 ± 15%  proc-vmstat.nr_mapped
    141268 ±  8%    +217.7%     448745        proc-vmstat.nr_shmem
     25272            +2.4%      25873        proc-vmstat.nr_slab_reclaimable
    287060 ± 10%     +72.7%     495669 ± 15%  proc-vmstat.nr_zone_inactive_anon
     24933 ± 50%    +200.6%      74949 ±  6%  proc-vmstat.numa_hint_faults
      9891 ± 50%    +317.8%      41324 ±  8%  proc-vmstat.numa_hint_faults_local
    614783 ±  2%     +72.8%    1062609        proc-vmstat.numa_hit
    548520 ±  2%     +81.6%     996296        proc-vmstat.numa_local
    549876 ±  4%     +14.4%     628860        proc-vmstat.numa_pte_updates
    734634 ±  2%     +60.7%    1180339        proc-vmstat.pgalloc_normal
    388924 ±  3%     +20.6%     468855 ±  2%  proc-vmstat.pgfault
    478242 ±  5%     -19.5%     385183 ± 14%  proc-vmstat.pgfree
 3.169e+09 ±  8%    +506.1%  1.921e+10        perf-stat.i.branch-instructions
      0.65 ±  3%       -0.1       0.53 ±  5%  perf-stat.i.branch-miss-rate%
  20491366 ±  5%    +378.8%   98121036 ±  5%  perf-stat.i.branch-misses
   7452019 ± 37%    +441.8%   40374999 ± 10%  perf-stat.i.cache-misses
  71298660 ±  3%    +361.7%  3.292e+08        perf-stat.i.cache-references
    227657 ± 19%    +498.6%    1362709 ±  2%  perf-stat.i.context-switches
     14.22 ±  8%     -83.9%       2.29        perf-stat.i.cpi
     37069 ± 36%     -84.2%       5866 ± 13%  perf-stat.i.cycles-between-cache-misses
   1.6e+10 ±  7%    +516.8%  9.867e+10        perf-stat.i.instructions
      0.08 ± 10%    +479.2%       0.44        perf-stat.i.ipc
      3.56 ± 19%    +502.1%      21.45 ±  2%  perf-stat.i.metric.K/sec
      5090 ±  4%     +25.5%       6387 ±  3%  perf-stat.i.minor-faults
      5090 ±  4%     +25.5%       6387 ±  3%  perf-stat.i.page-faults
      0.06 ± 45%    +637.9%       0.44        perf-stat.overall.ipc
 2.598e+09 ± 45%    +627.1%  1.889e+10        perf-stat.ps.branch-instructions
  17015300 ± 45%    +468.0%   96650879 ±  5%  perf-stat.ps.branch-misses
   5919193 ± 63%    +570.7%   39699549 ± 10%  perf-stat.ps.cache-misses
  58527096 ± 44%    +453.7%  3.241e+08        perf-stat.ps.cache-references
    181655 ± 48%    +640.9%    1345912 ±  2%  perf-stat.ps.context-switches
 1.311e+10 ± 45%    +640.3%  9.704e+10        perf-stat.ps.instructions
      4075 ± 44%     +53.6%       6260 ±  3%  perf-stat.ps.minor-faults
      4075 ± 44%     +53.6%       6260 ±  3%  perf-stat.ps.page-faults
 8.153e+11 ± 45%    +637.0%  6.009e+12        perf-stat.total.instructions
     85.99            -86.0       0.00        perf-profile.calltrace.cycles-pp.jbd2_transaction_committed.ext4_set_iomap.ext4_iomap_begin_report.iomap_iter.iomap_fiemap
     86.52            -84.1       2.43        perf-profile.calltrace.cycles-pp.ext4_set_iomap.ext4_iomap_begin_report.iomap_iter.iomap_fiemap.do_vfs_ioctl
     46.15 ± 12%      -46.2       0.00        perf-profile.calltrace.cycles-pp._raw_read_lock.jbd2_transaction_committed.ext4_set_iomap.ext4_iomap_begin_report.iomap_iter
     96.11            -10.9      85.20        perf-profile.calltrace.cycles-pp.ext4_iomap_begin_report.iomap_iter.iomap_fiemap.do_vfs_ioctl.__x64_sys_ioctl
      4.58 ± 89%       -4.6       0.00        perf-profile.calltrace.cycles-pp.queued_read_lock_slowpath.jbd2_transaction_committed.ext4_set_iomap.ext4_iomap_begin_report.iomap_iter
     97.15             -4.5      92.65        perf-profile.calltrace.cycles-pp.iomap_iter.iomap_fiemap.do_vfs_ioctl.__x64_sys_ioctl.do_syscall_64
      4.34 ± 89%       -4.3       0.00        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath.queued_read_lock_slowpath.jbd2_transaction_committed.ext4_set_iomap.ext4_iomap_begin_report
     97.78             -0.7      97.12        perf-profile.calltrace.cycles-pp.iomap_fiemap.do_vfs_ioctl.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.17 ±141%       +0.6       0.72        perf-profile.calltrace.cycles-pp.__schedule.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.18 ±141%       +0.6       0.75        perf-profile.calltrace.cycles-pp.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
      0.64 ± 16%       +0.6       1.26        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
      0.64 ± 16%       +0.6       1.26        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__sched_yield
      0.29 ±100%       +0.6       0.94        perf-profile.calltrace.cycles-pp.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
      0.70 ± 14%       +0.9       1.60        perf-profile.calltrace.cycles-pp.__sched_yield
      0.00             +1.0       0.95        perf-profile.calltrace.cycles-pp.ext4_sb_block_valid.__check_block_validity.ext4_map_blocks.ext4_iomap_begin_report.iomap_iter
      0.00             +1.0       1.05        perf-profile.calltrace.cycles-pp._copy_to_user.fiemap_fill_next_extent.iomap_fiemap.do_vfs_ioctl.__x64_sys_ioctl
      0.00             +1.4       1.35 ± 11%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.percpu_counter_add_batch.ext4_es_lookup_extent.ext4_map_blocks
      0.00             +1.4       1.41        perf-profile.calltrace.cycles-pp.__check_block_validity.ext4_map_blocks.ext4_iomap_begin_report.iomap_iter.iomap_fiemap
      0.00             +1.6       1.58 ± 10%  perf-profile.calltrace.cycles-pp._raw_spin_lock.percpu_counter_add_batch.ext4_es_lookup_extent.ext4_map_blocks.ext4_iomap_begin_report
      0.00             +3.1       3.08        perf-profile.calltrace.cycles-pp.fiemap_fill_next_extent.iomap_fiemap.do_vfs_ioctl.__x64_sys_ioctl.do_syscall_64
      0.82 ±  6%       +5.1       5.90        perf-profile.calltrace.cycles-pp.iomap_iter_advance.iomap_iter.iomap_fiemap.do_vfs_ioctl.__x64_sys_ioctl
      4.18 ± 17%       +5.2       9.35 ±  3%  perf-profile.calltrace.cycles-pp._raw_read_lock.ext4_es_lookup_extent.ext4_map_blocks.ext4_iomap_begin_report.iomap_iter
      0.62 ±  7%      +52.1      52.70        perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.ext4_es_lookup_extent.ext4_map_blocks.ext4_iomap_begin_report.iomap_iter
      8.69 ± 13%      +68.0      76.73        perf-profile.calltrace.cycles-pp.ext4_es_lookup_extent.ext4_map_blocks.ext4_iomap_begin_report.iomap_iter.iomap_fiemap
      9.30 ± 11%      +71.3      80.61        perf-profile.calltrace.cycles-pp.ext4_map_blocks.ext4_iomap_begin_report.iomap_iter.iomap_fiemap.do_vfs_ioctl
     86.24            -85.8       0.42        perf-profile.children.cycles-pp.jbd2_transaction_committed
     86.55            -83.8       2.70        perf-profile.children.cycles-pp.ext4_set_iomap
     50.52 ± 12%      -41.1       9.45 ±  3%  perf-profile.children.cycles-pp._raw_read_lock
     96.16            -10.6      85.61        perf-profile.children.cycles-pp.ext4_iomap_begin_report
      4.60 ± 89%       -4.6       0.00        perf-profile.children.cycles-pp.queued_read_lock_slowpath
     97.20             -4.2      92.96        perf-profile.children.cycles-pp.iomap_iter
      0.09 ± 14%       +0.0       0.12 ±  4%  perf-profile.children.cycles-pp.update_process_times
      0.10 ± 13%       +0.0       0.14 ±  3%  perf-profile.children.cycles-pp.tick_nohz_handler
      0.06 ± 15%       +0.0       0.09        perf-profile.children.cycles-pp.switch_fpu_return
      0.00             +0.1       0.05        perf-profile.children.cycles-pp.__switch_to_asm
      0.00             +0.1       0.05        perf-profile.children.cycles-pp.pick_eevdf
      0.00             +0.1       0.05        perf-profile.children.cycles-pp.syscall_return_via_sysret
      0.00             +0.1       0.06 ±  7%  perf-profile.children.cycles-pp.__switch_to
      0.00             +0.1       0.06 ±  7%  perf-profile.children.cycles-pp.rseq_ip_fixup
      0.00             +0.1       0.06 ±  7%  perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.00             +0.1       0.07        perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
      0.00             +0.1       0.08 ±  4%  perf-profile.children.cycles-pp.set_next_entity
      0.00             +0.1       0.09 ±  4%  perf-profile.children.cycles-pp.put_prev_entity
      0.00             +0.1       0.09        perf-profile.children.cycles-pp.update_load_avg
      0.06 ±  7%       +0.1       0.18 ±  2%  perf-profile.children.cycles-pp.do_sched_yield
      0.00             +0.1       0.11 ±  8%  perf-profile.children.cycles-pp.stress_fiemap
      0.00             +0.1       0.12 ±  3%  perf-profile.children.cycles-pp.update_curr
      0.00             +0.1       0.12 ±  3%  perf-profile.children.cycles-pp.__rseq_handle_notify_resume
      0.00             +0.1       0.12 ±  3%  perf-profile.children.cycles-pp.clear_bhb_loop
      0.20 ± 13%       +0.1       0.33        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      0.00             +0.1       0.13        perf-profile.children.cycles-pp.yield_task_fair
      0.07 ± 86%       +0.1       0.22 ± 19%  perf-profile.children.cycles-pp.ordered_events__queue
      0.07 ± 86%       +0.1       0.22 ± 19%  perf-profile.children.cycles-pp.queue_event
      0.00             +0.2       0.16 ±  3%  perf-profile.children.cycles-pp.ext4_inode_block_valid
      0.07 ± 87%       +0.2       0.23 ± 17%  perf-profile.children.cycles-pp.process_simple
      0.17 ± 61%       +0.2       0.38 ± 11%  perf-profile.children.cycles-pp.reader__read_event
      0.17 ± 60%       +0.2       0.39 ± 11%  perf-profile.children.cycles-pp.perf_session__process_events
      0.17 ± 60%       +0.2       0.39 ± 11%  perf-profile.children.cycles-pp.record__finish_output
      0.07 ±  5%       +0.2       0.28        perf-profile.children.cycles-pp.pick_next_task_fair
      0.47 ± 10%       +0.3       0.74        perf-profile.children.cycles-pp.__schedule
      0.48 ± 10%       +0.3       0.75        perf-profile.children.cycles-pp.schedule
      0.06 ±  7%       +0.4       0.47        perf-profile.children.cycles-pp.iomap_to_fiemap
      0.50 ± 17%       +0.4       0.94        perf-profile.children.cycles-pp.__x64_sys_sched_yield
      0.15 ±  7%       +0.9       1.00        perf-profile.children.cycles-pp.ext4_sb_block_valid
      0.71 ± 14%       +0.9       1.64        perf-profile.children.cycles-pp.__sched_yield
      0.17 ±  8%       +1.0       1.21        perf-profile.children.cycles-pp._copy_to_user
      0.23 ±  9%       +1.3       1.52        perf-profile.children.cycles-pp.__check_block_validity
      0.18 ±  9%       +1.4       1.62 ± 10%  perf-profile.children.cycles-pp._raw_spin_lock
      0.44 ±  6%       +2.7       3.15        perf-profile.children.cycles-pp.fiemap_fill_next_extent
      0.84 ±  7%       +5.2       6.02        perf-profile.children.cycles-pp.iomap_iter_advance
      0.63 ±  6%      +52.2      52.83        perf-profile.children.cycles-pp.percpu_counter_add_batch
      8.75 ± 13%      +68.4      77.12        perf-profile.children.cycles-pp.ext4_es_lookup_extent
      9.35 ± 11%      +71.6      81.00        perf-profile.children.cycles-pp.ext4_map_blocks
     50.35 ± 12%      -41.0       9.32 ±  3%  perf-profile.self.cycles-pp._raw_read_lock
     35.26 ±  7%      -35.0       0.25        perf-profile.self.cycles-pp.jbd2_transaction_committed
      0.06 ± 15%       +0.0       0.10        perf-profile.self.cycles-pp.__schedule
      0.00             +0.1       0.06 ±  6%  perf-profile.self.cycles-pp.__switch_to
      0.00             +0.1       0.07        perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
      0.00             +0.1       0.10 ±  8%  perf-profile.self.cycles-pp.stress_fiemap
      0.00             +0.1       0.10 ±  4%  perf-profile.self.cycles-pp.ext4_inode_block_valid
      0.17 ± 12%       +0.1       0.27 ±  3%  perf-profile.self.cycles-pp._raw_spin_lock
      0.00             +0.1       0.12 ±  3%  perf-profile.self.cycles-pp.clear_bhb_loop
      0.06 ±107%       +0.2       0.21 ± 19%  perf-profile.self.cycles-pp.queue_event
      0.05 ± 46%       +0.3       0.37        perf-profile.self.cycles-pp.__check_block_validity
      0.04 ± 45%       +0.3       0.36        perf-profile.self.cycles-pp.iomap_to_fiemap
      0.14 ±  8%       +0.8       0.95        perf-profile.self.cycles-pp.ext4_sb_block_valid
      0.14 ±  7%       +0.8       0.99        perf-profile.self.cycles-pp.iomap_fiemap
      0.17 ±  8%       +1.0       1.18        perf-profile.self.cycles-pp._copy_to_user
      0.20 ±  7%       +1.2       1.44        perf-profile.self.cycles-pp.iomap_iter
      0.27 ±  7%       +1.7       1.95        perf-profile.self.cycles-pp.fiemap_fill_next_extent
      0.28 ±  7%       +1.8       2.10        perf-profile.self.cycles-pp.ext4_iomap_begin_report
      0.30 ±  7%       +1.9       2.24        perf-profile.self.cycles-pp.ext4_set_iomap
      0.40 ±  9%       +2.1       2.54        perf-profile.self.cycles-pp.ext4_map_blocks
      0.82 ±  6%       +5.1       5.91        perf-profile.self.cycles-pp.iomap_iter_advance
      3.91 ± 12%      +11.0      14.90        perf-profile.self.cycles-pp.ext4_es_lookup_extent
      0.55 ±  8%      +50.4      50.92        perf-profile.self.cycles-pp.percpu_counter_add_batch


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki