Hello,

kernel test robot noticed a 11.8% improvement of aim9.disk_src.ops_per_sec on:


commit: a616bc666748063733c62e15ea417a90772a40e0 ("libfs: Convert simple directory offsets to use a Maple Tree")
git://git.kernel.org/cgit/linux/kernel/git/cel/linux simple-offset-maple

testcase: aim9
test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory
parameters:

	testtime: 300s
	test: disk_src
	cpufreq_governor: performance


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240219/202402191308.8e7ee8c7-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-ivb-2ep1/disk_src/aim9/300s

commit:
  f3f24869a1 ("test_maple_tree: testing the cyclic allocation")
  a616bc6667 ("libfs: Convert simple directory offsets to use a Maple Tree")

f3f24869a1d7cde1 a616bc666748063733c62e15ea4
---------------- ---------------------------
         %stddev     %change          %stddev
             \          |                 \
      0.34 ±  4%       -0.1        0.20 ±  4%  mpstat.cpu.all.soft%
      0.00 ± 28%      +58.3%       0.00 ± 17%  perf-sched.sch_delay.max.ms.ipmi_thread.kthread.ret_from_fork.ret_from_fork_asm
      1464 ±  2%      +14.0%       1668 ±  4%  vmstat.system.cs
    164231            +11.8%     183678        aim9.disk_src.ops_per_sec
      1309 ± 15%    +2643.5%      35915 ± 23%  aim9.time.involuntary_context_switches
     91.00             +5.5%      96.00        aim9.time.percent_of_cpu_this_job_got
    212.54             +3.5%     220.06        aim9.time.system_time
     62.58            +10.2%      68.94        aim9.time.user_time
     21685             -7.1%      20144        proc-vmstat.nr_slab_reclaimable
   6611541            -88.6%     750673 ±  7%  proc-vmstat.numa_hit
   6561447            -89.3%     700947 ±  7%  proc-vmstat.numa_local
      5747             +3.7%       5960        proc-vmstat.pgactivate
  26113963            -93.7%    1648373 ± 17%  proc-vmstat.pgalloc_normal
  26042963            -93.7%    1628178 ± 18%  proc-vmstat.pgfree
      2.07             -1.2%       2.04        perf-stat.i.MPKI
 6.738e+08             +3.0%   6.94e+08        perf-stat.i.branch-instructions
      2.94             -0.2        2.70        perf-stat.i.branch-miss-rate%
  20408670             -5.1%   19363031        perf-stat.i.branch-misses
     15.11             +2.7       17.77        perf-stat.i.cache-miss-rate%
  46824224            -14.7%   39962840        perf-stat.i.cache-references
      1419 ±  2%      +14.4%       1623 ±  5%  perf-stat.i.context-switches
      1.88             -1.3%       1.85        perf-stat.i.cpi
 9.453e+08             +2.2%  9.659e+08        perf-stat.i.dTLB-loads
      0.22 ±  5%       +0.0        0.25 ±  3%  perf-stat.i.dTLB-store-miss-rate%
   8.8e+08             -6.8%  8.205e+08        perf-stat.i.dTLB-stores
   1536484             +7.9%    1657233        perf-stat.i.iTLB-load-misses
      2279             -6.0%       2142        perf-stat.i.instructions-per-iTLB-miss
      0.54             +1.3%       0.54        perf-stat.i.ipc
    786.95             +7.1%     843.12        perf-stat.i.metric.K/sec
     47.07             +1.1       48.17        perf-stat.i.node-load-miss-rate%
     87561 ±  4%      +17.2%     102647 ±  6%  perf-stat.i.node-load-misses
      2.01             -1.2%       1.99        perf-stat.overall.MPKI
      3.03             -0.2        2.79        perf-stat.overall.branch-miss-rate%
     15.07             +2.6       17.67        perf-stat.overall.cache-miss-rate%
      1.84             -1.2%       1.82        perf-stat.overall.cpi
      0.22 ±  5%       +0.0        0.24 ±  3%  perf-stat.overall.dTLB-store-miss-rate%
      2283             -6.1%       2144        perf-stat.overall.instructions-per-iTLB-miss
      0.54             +1.2%       0.55        perf-stat.overall.ipc
     44.15             +1.8       45.93        perf-stat.overall.node-load-miss-rate%
 6.715e+08             +3.0%  6.917e+08        perf-stat.ps.branch-instructions
  20340341             -5.1%   19299968        perf-stat.ps.branch-misses
  46667379            -14.7%   39829580        perf-stat.ps.cache-references
      1414 ±  2%      +14.4%       1618 ±  5%  perf-stat.ps.context-switches
 9.421e+08             +2.2%  9.627e+08        perf-stat.ps.dTLB-loads
 8.771e+08             -6.8%  8.178e+08        perf-stat.ps.dTLB-stores
   1531338             +7.9%    1651678        perf-stat.ps.iTLB-load-misses
     87275 ±  4%      +17.3%     102341 ±  6%  perf-stat.ps.node-load-misses
      5.62 ± 13%       -1.9        3.69 ± 12%  perf-profile.calltrace.cycles-pp.shmem_mknod.lookup_open.open_last_lookups.path_openat.do_filp_open
      7.87 ± 13%       -1.9        5.95 ± 11%  perf-profile.calltrace.cycles-pp.lookup_open.open_last_lookups.path_openat.do_filp_open.do_sys_openat2
      8.47 ± 13%       -1.9        6.59 ± 10%  perf-profile.calltrace.cycles-pp.open_last_lookups.path_openat.do_filp_open.do_sys_openat2.__x64_sys_creat
      2.97 ± 12%       -1.8        1.16 ± 13%  perf-profile.calltrace.cycles-pp.simple_offset_add.shmem_mknod.lookup_open.open_last_lookups.path_openat
      0.00             +1.0        0.98 ± 13%  perf-profile.calltrace.cycles-pp.mas_alloc_cyclic.mtree_alloc_cyclic.simple_offset_add.shmem_mknod.lookup_open
      0.00             +1.0        1.00 ± 40%  perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.__do_softirq.run_ksoftirqd.smpboot_thread_fn
      0.00             +1.0        1.03 ± 40%  perf-profile.calltrace.cycles-pp.rcu_core.__do_softirq.run_ksoftirqd.smpboot_thread_fn.kthread
      0.00             +1.1        1.06 ± 40%  perf-profile.calltrace.cycles-pp.__do_softirq.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork
      0.00             +1.1        1.06 ± 40%  perf-profile.calltrace.cycles-pp.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.00             +1.1        1.10 ± 39%  perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.00             +1.1        1.10 ± 14%  perf-profile.calltrace.cycles-pp.mtree_alloc_cyclic.simple_offset_add.shmem_mknod.lookup_open.open_last_lookups
      0.00             +1.2        1.20 ± 13%  perf-profile.calltrace.cycles-pp.mas_erase.mtree_erase.simple_offset_remove.shmem_unlink.vfs_unlink
      0.00             +1.3        1.27 ± 38%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
      0.00             +1.3        1.27 ± 38%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
      0.00             +1.3        1.27 ± 38%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
      0.00             +1.4        1.35 ± 12%  perf-profile.calltrace.cycles-pp.mtree_erase.simple_offset_remove.shmem_unlink.vfs_unlink.do_unlinkat
     15.22 ±  8%       -2.8       12.40 ±  8%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
     14.50 ±  8%       -2.8       11.72 ±  8%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      4.73 ± 13%       -2.8        1.97 ± 15%  perf-profile.children.cycles-pp.irq_exit_rcu
      3.50 ± 12%       -2.1        1.41 ± 12%  perf-profile.children.cycles-pp.kmem_cache_alloc_lru
      5.63 ± 13%       -1.9        3.70 ± 12%  perf-profile.children.cycles-pp.shmem_mknod
      7.88 ± 13%       -1.9        5.97 ± 11%  perf-profile.children.cycles-pp.lookup_open
      8.49 ± 13%       -1.9        6.62 ± 10%  perf-profile.children.cycles-pp.open_last_lookups
      2.97 ± 12%       -1.8        1.16 ± 13%  perf-profile.children.cycles-pp.simple_offset_add
      2.90 ± 22%       -1.8        1.15 ± 41%  perf-profile.children.cycles-pp.rcu_do_batch
      4.47 ± 14%       -1.7        2.76 ± 24%  perf-profile.children.cycles-pp.__do_softirq
      1.85 ± 15%       -1.7        0.14 ± 28%  perf-profile.children.cycles-pp.___slab_alloc
      3.00 ± 22%       -1.7        1.34 ± 38%  perf-profile.children.cycles-pp.rcu_core
      1.66 ± 15%       -1.6        0.05 ± 68%  perf-profile.children.cycles-pp.allocate_slab
      0.92 ± 18%       -0.6        0.31 ± 19%  perf-profile.children.cycles-pp.__call_rcu_common
      0.88 ± 27%       -0.6        0.31 ± 43%  perf-profile.children.cycles-pp.__slab_free
      0.28 ± 15%       -0.2        0.12 ± 25%  perf-profile.children.cycles-pp.xas_load
      0.20 ± 18%       -0.1        0.08 ± 30%  perf-profile.children.cycles-pp.rcu_segcblist_enqueue
      0.12 ± 30%       -0.1        0.05 ± 65%  perf-profile.children.cycles-pp.rcu_nocb_try_bypass
      0.00             +0.1        0.10 ± 27%  perf-profile.children.cycles-pp.mas_wr_end_piv
      0.00             +0.2        0.17 ± 22%  perf-profile.children.cycles-pp.mas_leaf_max_gap
      0.00             +0.2        0.18 ± 24%  perf-profile.children.cycles-pp.mtree_range_walk
      0.00             +0.2        0.24 ± 22%  perf-profile.children.cycles-pp.mas_anode_descend
      0.00             +0.3        0.29 ± 16%  perf-profile.children.cycles-pp.mas_wr_walk
      0.00             +0.3        0.31 ± 23%  perf-profile.children.cycles-pp.mas_update_gap
      0.00             +0.3        0.32 ± 17%  perf-profile.children.cycles-pp.mas_wr_append
      0.00             +0.4        0.37 ± 15%  perf-profile.children.cycles-pp.mas_empty_area
      0.00             +0.5        0.47 ± 18%  perf-profile.children.cycles-pp.mas_wr_node_store
      0.00             +1.0        0.99 ± 13%  perf-profile.children.cycles-pp.mas_alloc_cyclic
      0.05 ± 82%       +1.0        1.10 ± 39%  perf-profile.children.cycles-pp.smpboot_thread_fn
      0.01 ±264%       +1.0        1.06 ± 40%  perf-profile.children.cycles-pp.run_ksoftirqd
      0.22 ± 36%       +1.1        1.28 ± 38%  perf-profile.children.cycles-pp.ret_from_fork
      0.22 ± 36%       +1.1        1.28 ± 38%  perf-profile.children.cycles-pp.ret_from_fork_asm
      0.21 ± 38%       +1.1        1.27 ± 38%  perf-profile.children.cycles-pp.kthread
      0.00             +1.1        1.11 ± 14%  perf-profile.children.cycles-pp.mtree_alloc_cyclic
      0.00             +1.2        1.21 ± 14%  perf-profile.children.cycles-pp.mas_erase
      0.00             +1.4        1.35 ± 12%  perf-profile.children.cycles-pp.mtree_erase
      0.87 ± 27%       -0.6        0.31 ± 42%  perf-profile.self.cycles-pp.__slab_free
      0.53 ± 19%       -0.4        0.18 ± 23%  perf-profile.self.cycles-pp.__call_rcu_common
      0.57 ± 10%       -0.3        0.26 ± 21%  perf-profile.self.cycles-pp.kmem_cache_alloc_lru
      0.89 ± 14%       -0.3        0.59 ± 15%  perf-profile.self.cycles-pp.kmem_cache_free
      0.19 ± 21%       -0.1        0.06 ± 65%  perf-profile.self.cycles-pp.rcu_segcblist_enqueue
      0.10 ± 20%       -0.1        0.04 ± 81%  perf-profile.self.cycles-pp.xas_load
      0.08 ± 19%       -0.0        0.04 ± 61%  perf-profile.self.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.00             +0.1        0.09 ± 30%  perf-profile.self.cycles-pp.mtree_erase
      0.00             +0.1        0.10 ± 26%  perf-profile.self.cycles-pp.mtree_alloc_cyclic
      0.00             +0.1        0.10 ± 27%  perf-profile.self.cycles-pp.mas_wr_end_piv
      0.00             +0.1        0.12 ± 38%  perf-profile.self.cycles-pp.mas_empty_area
      0.00             +0.1        0.14 ± 38%  perf-profile.self.cycles-pp.mas_update_gap
      0.00             +0.1        0.14 ± 20%  perf-profile.self.cycles-pp.mas_wr_append
      0.00             +0.2        0.16 ± 23%  perf-profile.self.cycles-pp.mas_leaf_max_gap
      0.00             +0.2        0.18 ± 24%  perf-profile.self.cycles-pp.mtree_range_walk
      0.00             +0.2        0.18 ± 29%  perf-profile.self.cycles-pp.mas_alloc_cyclic
      0.00             +0.2        0.22 ± 32%  perf-profile.self.cycles-pp.mas_erase
      0.00             +0.2        0.24 ± 22%  perf-profile.self.cycles-pp.mas_anode_descend
      0.00             +0.3        0.27 ± 16%  perf-profile.self.cycles-pp.mas_wr_walk
      0.00             +0.3        0.34 ± 20%  perf-profile.self.cycles-pp.mas_wr_node_store


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki