[cel:simple-offset-maple] [libfs] a616bc6667: aim9.disk_src.ops_per_sec 11.8% improvement

Hello,

kernel test robot noticed an 11.8% improvement of aim9.disk_src.ops_per_sec on:


commit: a616bc666748063733c62e15ea417a90772a40e0 ("libfs: Convert simple directory offsets to use a Maple Tree")
git://git.kernel.org/cgit/linux/kernel/git/cel/linux simple-offset-maple

testcase: aim9
test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory
parameters:

	testtime: 300s
	test: disk_src
	cpufreq_governor: performance






Details are below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240219/202402191308.8e7ee8c7-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-ivb-2ep1/disk_src/aim9/300s

commit: 
  f3f24869a1 ("test_maple_tree: testing the cyclic allocation")
  a616bc6667 ("libfs: Convert simple directory offsets to use a Maple Tree")

f3f24869a1d7cde1 a616bc666748063733c62e15ea4 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      0.34 ±  4%      -0.1        0.20 ±  4%  mpstat.cpu.all.soft%
      0.00 ± 28%     +58.3%       0.00 ± 17%  perf-sched.sch_delay.max.ms.ipmi_thread.kthread.ret_from_fork.ret_from_fork_asm
      1464 ±  2%     +14.0%       1668 ±  4%  vmstat.system.cs
    164231           +11.8%     183678        aim9.disk_src.ops_per_sec
      1309 ± 15%   +2643.5%      35915 ± 23%  aim9.time.involuntary_context_switches
     91.00            +5.5%      96.00        aim9.time.percent_of_cpu_this_job_got
    212.54            +3.5%     220.06        aim9.time.system_time
     62.58           +10.2%      68.94        aim9.time.user_time
     21685            -7.1%      20144        proc-vmstat.nr_slab_reclaimable
   6611541           -88.6%     750673 ±  7%  proc-vmstat.numa_hit
   6561447           -89.3%     700947 ±  7%  proc-vmstat.numa_local
      5747            +3.7%       5960        proc-vmstat.pgactivate
  26113963           -93.7%    1648373 ± 17%  proc-vmstat.pgalloc_normal
  26042963           -93.7%    1628178 ± 18%  proc-vmstat.pgfree
      2.07            -1.2%       2.04        perf-stat.i.MPKI
 6.738e+08            +3.0%   6.94e+08        perf-stat.i.branch-instructions
      2.94            -0.2        2.70        perf-stat.i.branch-miss-rate%
  20408670            -5.1%   19363031        perf-stat.i.branch-misses
     15.11            +2.7       17.77        perf-stat.i.cache-miss-rate%
  46824224           -14.7%   39962840        perf-stat.i.cache-references
      1419 ±  2%     +14.4%       1623 ±  5%  perf-stat.i.context-switches
      1.88            -1.3%       1.85        perf-stat.i.cpi
 9.453e+08            +2.2%  9.659e+08        perf-stat.i.dTLB-loads
      0.22 ±  5%      +0.0        0.25 ±  3%  perf-stat.i.dTLB-store-miss-rate%
   8.8e+08            -6.8%  8.205e+08        perf-stat.i.dTLB-stores
   1536484            +7.9%    1657233        perf-stat.i.iTLB-load-misses
      2279            -6.0%       2142        perf-stat.i.instructions-per-iTLB-miss
      0.54            +1.3%       0.54        perf-stat.i.ipc
    786.95            +7.1%     843.12        perf-stat.i.metric.K/sec
     47.07            +1.1       48.17        perf-stat.i.node-load-miss-rate%
     87561 ±  4%     +17.2%     102647 ±  6%  perf-stat.i.node-load-misses
      2.01            -1.2%       1.99        perf-stat.overall.MPKI
      3.03            -0.2        2.79        perf-stat.overall.branch-miss-rate%
     15.07            +2.6       17.67        perf-stat.overall.cache-miss-rate%
      1.84            -1.2%       1.82        perf-stat.overall.cpi
      0.22 ±  5%      +0.0        0.24 ±  3%  perf-stat.overall.dTLB-store-miss-rate%
      2283            -6.1%       2144        perf-stat.overall.instructions-per-iTLB-miss
      0.54            +1.2%       0.55        perf-stat.overall.ipc
     44.15            +1.8       45.93        perf-stat.overall.node-load-miss-rate%
 6.715e+08            +3.0%  6.917e+08        perf-stat.ps.branch-instructions
  20340341            -5.1%   19299968        perf-stat.ps.branch-misses
  46667379           -14.7%   39829580        perf-stat.ps.cache-references
      1414 ±  2%     +14.4%       1618 ±  5%  perf-stat.ps.context-switches
 9.421e+08            +2.2%  9.627e+08        perf-stat.ps.dTLB-loads
 8.771e+08            -6.8%  8.178e+08        perf-stat.ps.dTLB-stores
   1531338            +7.9%    1651678        perf-stat.ps.iTLB-load-misses
     87275 ±  4%     +17.3%     102341 ±  6%  perf-stat.ps.node-load-misses
      5.62 ± 13%      -1.9        3.69 ± 12%  perf-profile.calltrace.cycles-pp.shmem_mknod.lookup_open.open_last_lookups.path_openat.do_filp_open
      7.87 ± 13%      -1.9        5.95 ± 11%  perf-profile.calltrace.cycles-pp.lookup_open.open_last_lookups.path_openat.do_filp_open.do_sys_openat2
      8.47 ± 13%      -1.9        6.59 ± 10%  perf-profile.calltrace.cycles-pp.open_last_lookups.path_openat.do_filp_open.do_sys_openat2.__x64_sys_creat
      2.97 ± 12%      -1.8        1.16 ± 13%  perf-profile.calltrace.cycles-pp.simple_offset_add.shmem_mknod.lookup_open.open_last_lookups.path_openat
      0.00            +1.0        0.98 ± 13%  perf-profile.calltrace.cycles-pp.mas_alloc_cyclic.mtree_alloc_cyclic.simple_offset_add.shmem_mknod.lookup_open
      0.00            +1.0        1.00 ± 40%  perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.__do_softirq.run_ksoftirqd.smpboot_thread_fn
      0.00            +1.0        1.03 ± 40%  perf-profile.calltrace.cycles-pp.rcu_core.__do_softirq.run_ksoftirqd.smpboot_thread_fn.kthread
      0.00            +1.1        1.06 ± 40%  perf-profile.calltrace.cycles-pp.__do_softirq.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork
      0.00            +1.1        1.06 ± 40%  perf-profile.calltrace.cycles-pp.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.00            +1.1        1.10 ± 39%  perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.00            +1.1        1.10 ± 14%  perf-profile.calltrace.cycles-pp.mtree_alloc_cyclic.simple_offset_add.shmem_mknod.lookup_open.open_last_lookups
      0.00            +1.2        1.20 ± 13%  perf-profile.calltrace.cycles-pp.mas_erase.mtree_erase.simple_offset_remove.shmem_unlink.vfs_unlink
      0.00            +1.3        1.27 ± 38%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
      0.00            +1.3        1.27 ± 38%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
      0.00            +1.3        1.27 ± 38%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
      0.00            +1.4        1.35 ± 12%  perf-profile.calltrace.cycles-pp.mtree_erase.simple_offset_remove.shmem_unlink.vfs_unlink.do_unlinkat
     15.22 ±  8%      -2.8       12.40 ±  8%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
     14.50 ±  8%      -2.8       11.72 ±  8%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      4.73 ± 13%      -2.8        1.97 ± 15%  perf-profile.children.cycles-pp.irq_exit_rcu
      3.50 ± 12%      -2.1        1.41 ± 12%  perf-profile.children.cycles-pp.kmem_cache_alloc_lru
      5.63 ± 13%      -1.9        3.70 ± 12%  perf-profile.children.cycles-pp.shmem_mknod
      7.88 ± 13%      -1.9        5.97 ± 11%  perf-profile.children.cycles-pp.lookup_open
      8.49 ± 13%      -1.9        6.62 ± 10%  perf-profile.children.cycles-pp.open_last_lookups
      2.97 ± 12%      -1.8        1.16 ± 13%  perf-profile.children.cycles-pp.simple_offset_add
      2.90 ± 22%      -1.8        1.15 ± 41%  perf-profile.children.cycles-pp.rcu_do_batch
      4.47 ± 14%      -1.7        2.76 ± 24%  perf-profile.children.cycles-pp.__do_softirq
      1.85 ± 15%      -1.7        0.14 ± 28%  perf-profile.children.cycles-pp.___slab_alloc
      3.00 ± 22%      -1.7        1.34 ± 38%  perf-profile.children.cycles-pp.rcu_core
      1.66 ± 15%      -1.6        0.05 ± 68%  perf-profile.children.cycles-pp.allocate_slab
      0.92 ± 18%      -0.6        0.31 ± 19%  perf-profile.children.cycles-pp.__call_rcu_common
      0.88 ± 27%      -0.6        0.31 ± 43%  perf-profile.children.cycles-pp.__slab_free
      0.28 ± 15%      -0.2        0.12 ± 25%  perf-profile.children.cycles-pp.xas_load
      0.20 ± 18%      -0.1        0.08 ± 30%  perf-profile.children.cycles-pp.rcu_segcblist_enqueue
      0.12 ± 30%      -0.1        0.05 ± 65%  perf-profile.children.cycles-pp.rcu_nocb_try_bypass
      0.00            +0.1        0.10 ± 27%  perf-profile.children.cycles-pp.mas_wr_end_piv
      0.00            +0.2        0.17 ± 22%  perf-profile.children.cycles-pp.mas_leaf_max_gap
      0.00            +0.2        0.18 ± 24%  perf-profile.children.cycles-pp.mtree_range_walk
      0.00            +0.2        0.24 ± 22%  perf-profile.children.cycles-pp.mas_anode_descend
      0.00            +0.3        0.29 ± 16%  perf-profile.children.cycles-pp.mas_wr_walk
      0.00            +0.3        0.31 ± 23%  perf-profile.children.cycles-pp.mas_update_gap
      0.00            +0.3        0.32 ± 17%  perf-profile.children.cycles-pp.mas_wr_append
      0.00            +0.4        0.37 ± 15%  perf-profile.children.cycles-pp.mas_empty_area
      0.00            +0.5        0.47 ± 18%  perf-profile.children.cycles-pp.mas_wr_node_store
      0.00            +1.0        0.99 ± 13%  perf-profile.children.cycles-pp.mas_alloc_cyclic
      0.05 ± 82%      +1.0        1.10 ± 39%  perf-profile.children.cycles-pp.smpboot_thread_fn
      0.01 ±264%      +1.0        1.06 ± 40%  perf-profile.children.cycles-pp.run_ksoftirqd
      0.22 ± 36%      +1.1        1.28 ± 38%  perf-profile.children.cycles-pp.ret_from_fork
      0.22 ± 36%      +1.1        1.28 ± 38%  perf-profile.children.cycles-pp.ret_from_fork_asm
      0.21 ± 38%      +1.1        1.27 ± 38%  perf-profile.children.cycles-pp.kthread
      0.00            +1.1        1.11 ± 14%  perf-profile.children.cycles-pp.mtree_alloc_cyclic
      0.00            +1.2        1.21 ± 14%  perf-profile.children.cycles-pp.mas_erase
      0.00            +1.4        1.35 ± 12%  perf-profile.children.cycles-pp.mtree_erase
      0.87 ± 27%      -0.6        0.31 ± 42%  perf-profile.self.cycles-pp.__slab_free
      0.53 ± 19%      -0.4        0.18 ± 23%  perf-profile.self.cycles-pp.__call_rcu_common
      0.57 ± 10%      -0.3        0.26 ± 21%  perf-profile.self.cycles-pp.kmem_cache_alloc_lru
      0.89 ± 14%      -0.3        0.59 ± 15%  perf-profile.self.cycles-pp.kmem_cache_free
      0.19 ± 21%      -0.1        0.06 ± 65%  perf-profile.self.cycles-pp.rcu_segcblist_enqueue
      0.10 ± 20%      -0.1        0.04 ± 81%  perf-profile.self.cycles-pp.xas_load
      0.08 ± 19%      -0.0        0.04 ± 61%  perf-profile.self.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.00            +0.1        0.09 ± 30%  perf-profile.self.cycles-pp.mtree_erase
      0.00            +0.1        0.10 ± 26%  perf-profile.self.cycles-pp.mtree_alloc_cyclic
      0.00            +0.1        0.10 ± 27%  perf-profile.self.cycles-pp.mas_wr_end_piv
      0.00            +0.1        0.12 ± 38%  perf-profile.self.cycles-pp.mas_empty_area
      0.00            +0.1        0.14 ± 38%  perf-profile.self.cycles-pp.mas_update_gap
      0.00            +0.1        0.14 ± 20%  perf-profile.self.cycles-pp.mas_wr_append
      0.00            +0.2        0.16 ± 23%  perf-profile.self.cycles-pp.mas_leaf_max_gap
      0.00            +0.2        0.18 ± 24%  perf-profile.self.cycles-pp.mtree_range_walk
      0.00            +0.2        0.18 ± 29%  perf-profile.self.cycles-pp.mas_alloc_cyclic
      0.00            +0.2        0.22 ± 32%  perf-profile.self.cycles-pp.mas_erase
      0.00            +0.2        0.24 ± 22%  perf-profile.self.cycles-pp.mas_anode_descend
      0.00            +0.3        0.27 ± 16%  perf-profile.self.cycles-pp.mas_wr_walk
      0.00            +0.3        0.34 ± 20%  perf-profile.self.cycles-pp.mas_wr_node_store
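As a reading aid for the table above, here is a minimal sketch of how the %change column can be recomputed from the two displayed means. It assumes that entries with a trailing "%" are relative changes against the base commit, and that entries without one (the rate and perf-profile rows) are absolute differences in percentage points. The function names are illustrative and are not part of lkp-tests. Because the report displays rounded means, a recomputed value can differ from the printed one in the last digit.

```python
def percent_change(base: float, new: float) -> float:
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


def point_change(base: float, new: float) -> float:
    """Absolute difference, used here for rows already expressed in percent."""
    return new - base


# aim9.disk_src.ops_per_sec: f3f24869a1 (base) vs a616bc6667 (patched)
print(f"{percent_change(164231, 183678):+.1f}%")

# proc-vmstat.pgalloc_normal: page allocations drop sharply
print(f"{percent_change(26113963, 1648373):+.1f}%")

# perf-profile.children.cycles-pp.simple_offset_add: percentage-point delta
print(f"{point_change(2.97, 1.16):+.1f}")
```

Under these assumptions the three calls reproduce the +11.8%, -93.7% and -1.8 entries shown in the table.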




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki




