Re: [cel:simple-offset-maple] [libfs] a616bc6667: aim9.disk_src.ops_per_sec 11.8% improvement

Including Liam ...

On Mon, Feb 19, 2024 at 01:44:05PM +0800, kernel test robot wrote:
> 
> 
> Hello,
> 
> kernel test robot noticed an 11.8% improvement of aim9.disk_src.ops_per_sec on:
> 
> 
> commit: a616bc666748063733c62e15ea417a90772a40e0 ("libfs: Convert simple directory offsets to use a Maple Tree")
> git://git.kernel.org/cgit/linux/kernel/git/cel/linux simple-offset-maple
> 
> testcase: aim9
> test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory
> parameters:
> 
> 	testtime: 300s
> 	test: disk_src
> 	cpufreq_governor: performance
> 
> 
> 
> 
> 
> 
> Details are as below:
> -------------------------------------------------------------------------------------------------->
> 
> 
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20240219/202402191308.8e7ee8c7-oliver.sang@xxxxxxxxx
> 
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
>   gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-ivb-2ep1/disk_src/aim9/300s
> 
> commit: 
>   f3f24869a1 ("test_maple_tree: testing the cyclic allocation")
>   a616bc6667 ("libfs: Convert simple directory offsets to use a Maple Tree")
> 
> f3f24869a1d7cde1 a616bc666748063733c62e15ea4 
> ---------------- --------------------------- 
>          %stddev     %change         %stddev
>              \          |                \  
>       0.34 ±  4%      -0.1        0.20 ±  4%  mpstat.cpu.all.soft%
>       0.00 ± 28%     +58.3%       0.00 ± 17%  perf-sched.sch_delay.max.ms.ipmi_thread.kthread.ret_from_fork.ret_from_fork_asm
>       1464 ±  2%     +14.0%       1668 ±  4%  vmstat.system.cs
>     164231           +11.8%     183678        aim9.disk_src.ops_per_sec
>       1309 ± 15%   +2643.5%      35915 ± 23%  aim9.time.involuntary_context_switches
>      91.00            +5.5%      96.00        aim9.time.percent_of_cpu_this_job_got
>     212.54            +3.5%     220.06        aim9.time.system_time
>      62.58           +10.2%      68.94        aim9.time.user_time
>      21685            -7.1%      20144        proc-vmstat.nr_slab_reclaimable
>    6611541           -88.6%     750673 ±  7%  proc-vmstat.numa_hit
>    6561447           -89.3%     700947 ±  7%  proc-vmstat.numa_local
>       5747            +3.7%       5960        proc-vmstat.pgactivate
>   26113963           -93.7%    1648373 ± 17%  proc-vmstat.pgalloc_normal
>   26042963           -93.7%    1628178 ± 18%  proc-vmstat.pgfree
>       2.07            -1.2%       2.04        perf-stat.i.MPKI
>  6.738e+08            +3.0%   6.94e+08        perf-stat.i.branch-instructions
>       2.94            -0.2        2.70        perf-stat.i.branch-miss-rate%
>   20408670            -5.1%   19363031        perf-stat.i.branch-misses
>      15.11            +2.7       17.77        perf-stat.i.cache-miss-rate%
>   46824224           -14.7%   39962840        perf-stat.i.cache-references
>       1419 ±  2%     +14.4%       1623 ±  5%  perf-stat.i.context-switches
>       1.88            -1.3%       1.85        perf-stat.i.cpi
>  9.453e+08            +2.2%  9.659e+08        perf-stat.i.dTLB-loads
>       0.22 ±  5%      +0.0        0.25 ±  3%  perf-stat.i.dTLB-store-miss-rate%
>    8.8e+08            -6.8%  8.205e+08        perf-stat.i.dTLB-stores
>    1536484            +7.9%    1657233        perf-stat.i.iTLB-load-misses
>       2279            -6.0%       2142        perf-stat.i.instructions-per-iTLB-miss
>       0.54            +1.3%       0.54        perf-stat.i.ipc
>     786.95            +7.1%     843.12        perf-stat.i.metric.K/sec
>      47.07            +1.1       48.17        perf-stat.i.node-load-miss-rate%
>      87561 ±  4%     +17.2%     102647 ±  6%  perf-stat.i.node-load-misses
>       2.01            -1.2%       1.99        perf-stat.overall.MPKI
>       3.03            -0.2        2.79        perf-stat.overall.branch-miss-rate%
>      15.07            +2.6       17.67        perf-stat.overall.cache-miss-rate%
>       1.84            -1.2%       1.82        perf-stat.overall.cpi
>       0.22 ±  5%      +0.0        0.24 ±  3%  perf-stat.overall.dTLB-store-miss-rate%
>       2283            -6.1%       2144        perf-stat.overall.instructions-per-iTLB-miss
>       0.54            +1.2%       0.55        perf-stat.overall.ipc
>      44.15            +1.8       45.93        perf-stat.overall.node-load-miss-rate%
>  6.715e+08            +3.0%  6.917e+08        perf-stat.ps.branch-instructions
>   20340341            -5.1%   19299968        perf-stat.ps.branch-misses
>   46667379           -14.7%   39829580        perf-stat.ps.cache-references
>       1414 ±  2%     +14.4%       1618 ±  5%  perf-stat.ps.context-switches
>  9.421e+08            +2.2%  9.627e+08        perf-stat.ps.dTLB-loads
>  8.771e+08            -6.8%  8.178e+08        perf-stat.ps.dTLB-stores
>    1531338            +7.9%    1651678        perf-stat.ps.iTLB-load-misses
>      87275 ±  4%     +17.3%     102341 ±  6%  perf-stat.ps.node-load-misses
>       5.62 ± 13%      -1.9        3.69 ± 12%  perf-profile.calltrace.cycles-pp.shmem_mknod.lookup_open.open_last_lookups.path_openat.do_filp_open
>       7.87 ± 13%      -1.9        5.95 ± 11%  perf-profile.calltrace.cycles-pp.lookup_open.open_last_lookups.path_openat.do_filp_open.do_sys_openat2
>       8.47 ± 13%      -1.9        6.59 ± 10%  perf-profile.calltrace.cycles-pp.open_last_lookups.path_openat.do_filp_open.do_sys_openat2.__x64_sys_creat
>       2.97 ± 12%      -1.8        1.16 ± 13%  perf-profile.calltrace.cycles-pp.simple_offset_add.shmem_mknod.lookup_open.open_last_lookups.path_openat
>       0.00            +1.0        0.98 ± 13%  perf-profile.calltrace.cycles-pp.mas_alloc_cyclic.mtree_alloc_cyclic.simple_offset_add.shmem_mknod.lookup_open
>       0.00            +1.0        1.00 ± 40%  perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.__do_softirq.run_ksoftirqd.smpboot_thread_fn
>       0.00            +1.0        1.03 ± 40%  perf-profile.calltrace.cycles-pp.rcu_core.__do_softirq.run_ksoftirqd.smpboot_thread_fn.kthread
>       0.00            +1.1        1.06 ± 40%  perf-profile.calltrace.cycles-pp.__do_softirq.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork
>       0.00            +1.1        1.06 ± 40%  perf-profile.calltrace.cycles-pp.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       0.00            +1.1        1.10 ± 39%  perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       0.00            +1.1        1.10 ± 14%  perf-profile.calltrace.cycles-pp.mtree_alloc_cyclic.simple_offset_add.shmem_mknod.lookup_open.open_last_lookups
>       0.00            +1.2        1.20 ± 13%  perf-profile.calltrace.cycles-pp.mas_erase.mtree_erase.simple_offset_remove.shmem_unlink.vfs_unlink
>       0.00            +1.3        1.27 ± 38%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
>       0.00            +1.3        1.27 ± 38%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
>       0.00            +1.3        1.27 ± 38%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
>       0.00            +1.4        1.35 ± 12%  perf-profile.calltrace.cycles-pp.mtree_erase.simple_offset_remove.shmem_unlink.vfs_unlink.do_unlinkat
>      15.22 ±  8%      -2.8       12.40 ±  8%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
>      14.50 ±  8%      -2.8       11.72 ±  8%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
>       4.73 ± 13%      -2.8        1.97 ± 15%  perf-profile.children.cycles-pp.irq_exit_rcu
>       3.50 ± 12%      -2.1        1.41 ± 12%  perf-profile.children.cycles-pp.kmem_cache_alloc_lru
>       5.63 ± 13%      -1.9        3.70 ± 12%  perf-profile.children.cycles-pp.shmem_mknod
>       7.88 ± 13%      -1.9        5.97 ± 11%  perf-profile.children.cycles-pp.lookup_open
>       8.49 ± 13%      -1.9        6.62 ± 10%  perf-profile.children.cycles-pp.open_last_lookups
>       2.97 ± 12%      -1.8        1.16 ± 13%  perf-profile.children.cycles-pp.simple_offset_add
>       2.90 ± 22%      -1.8        1.15 ± 41%  perf-profile.children.cycles-pp.rcu_do_batch
>       4.47 ± 14%      -1.7        2.76 ± 24%  perf-profile.children.cycles-pp.__do_softirq
>       1.85 ± 15%      -1.7        0.14 ± 28%  perf-profile.children.cycles-pp.___slab_alloc
>       3.00 ± 22%      -1.7        1.34 ± 38%  perf-profile.children.cycles-pp.rcu_core
>       1.66 ± 15%      -1.6        0.05 ± 68%  perf-profile.children.cycles-pp.allocate_slab
>       0.92 ± 18%      -0.6        0.31 ± 19%  perf-profile.children.cycles-pp.__call_rcu_common
>       0.88 ± 27%      -0.6        0.31 ± 43%  perf-profile.children.cycles-pp.__slab_free
>       0.28 ± 15%      -0.2        0.12 ± 25%  perf-profile.children.cycles-pp.xas_load
>       0.20 ± 18%      -0.1        0.08 ± 30%  perf-profile.children.cycles-pp.rcu_segcblist_enqueue
>       0.12 ± 30%      -0.1        0.05 ± 65%  perf-profile.children.cycles-pp.rcu_nocb_try_bypass
>       0.00            +0.1        0.10 ± 27%  perf-profile.children.cycles-pp.mas_wr_end_piv
>       0.00            +0.2        0.17 ± 22%  perf-profile.children.cycles-pp.mas_leaf_max_gap
>       0.00            +0.2        0.18 ± 24%  perf-profile.children.cycles-pp.mtree_range_walk
>       0.00            +0.2        0.24 ± 22%  perf-profile.children.cycles-pp.mas_anode_descend
>       0.00            +0.3        0.29 ± 16%  perf-profile.children.cycles-pp.mas_wr_walk
>       0.00            +0.3        0.31 ± 23%  perf-profile.children.cycles-pp.mas_update_gap
>       0.00            +0.3        0.32 ± 17%  perf-profile.children.cycles-pp.mas_wr_append
>       0.00            +0.4        0.37 ± 15%  perf-profile.children.cycles-pp.mas_empty_area
>       0.00            +0.5        0.47 ± 18%  perf-profile.children.cycles-pp.mas_wr_node_store
>       0.00            +1.0        0.99 ± 13%  perf-profile.children.cycles-pp.mas_alloc_cyclic
>       0.05 ± 82%      +1.0        1.10 ± 39%  perf-profile.children.cycles-pp.smpboot_thread_fn
>       0.01 ±264%      +1.0        1.06 ± 40%  perf-profile.children.cycles-pp.run_ksoftirqd
>       0.22 ± 36%      +1.1        1.28 ± 38%  perf-profile.children.cycles-pp.ret_from_fork
>       0.22 ± 36%      +1.1        1.28 ± 38%  perf-profile.children.cycles-pp.ret_from_fork_asm
>       0.21 ± 38%      +1.1        1.27 ± 38%  perf-profile.children.cycles-pp.kthread
>       0.00            +1.1        1.11 ± 14%  perf-profile.children.cycles-pp.mtree_alloc_cyclic
>       0.00            +1.2        1.21 ± 14%  perf-profile.children.cycles-pp.mas_erase
>       0.00            +1.4        1.35 ± 12%  perf-profile.children.cycles-pp.mtree_erase
>       0.87 ± 27%      -0.6        0.31 ± 42%  perf-profile.self.cycles-pp.__slab_free
>       0.53 ± 19%      -0.4        0.18 ± 23%  perf-profile.self.cycles-pp.__call_rcu_common
>       0.57 ± 10%      -0.3        0.26 ± 21%  perf-profile.self.cycles-pp.kmem_cache_alloc_lru
>       0.89 ± 14%      -0.3        0.59 ± 15%  perf-profile.self.cycles-pp.kmem_cache_free
>       0.19 ± 21%      -0.1        0.06 ± 65%  perf-profile.self.cycles-pp.rcu_segcblist_enqueue
>       0.10 ± 20%      -0.1        0.04 ± 81%  perf-profile.self.cycles-pp.xas_load
>       0.08 ± 19%      -0.0        0.04 ± 61%  perf-profile.self.cycles-pp.asm_sysvec_apic_timer_interrupt
>       0.00            +0.1        0.09 ± 30%  perf-profile.self.cycles-pp.mtree_erase
>       0.00            +0.1        0.10 ± 26%  perf-profile.self.cycles-pp.mtree_alloc_cyclic
>       0.00            +0.1        0.10 ± 27%  perf-profile.self.cycles-pp.mas_wr_end_piv
>       0.00            +0.1        0.12 ± 38%  perf-profile.self.cycles-pp.mas_empty_area
>       0.00            +0.1        0.14 ± 38%  perf-profile.self.cycles-pp.mas_update_gap
>       0.00            +0.1        0.14 ± 20%  perf-profile.self.cycles-pp.mas_wr_append
>       0.00            +0.2        0.16 ± 23%  perf-profile.self.cycles-pp.mas_leaf_max_gap
>       0.00            +0.2        0.18 ± 24%  perf-profile.self.cycles-pp.mtree_range_walk
>       0.00            +0.2        0.18 ± 29%  perf-profile.self.cycles-pp.mas_alloc_cyclic
>       0.00            +0.2        0.22 ± 32%  perf-profile.self.cycles-pp.mas_erase
>       0.00            +0.2        0.24 ± 22%  perf-profile.self.cycles-pp.mas_anode_descend
>       0.00            +0.3        0.27 ± 16%  perf-profile.self.cycles-pp.mas_wr_walk
>       0.00            +0.3        0.34 ± 20%  perf-profile.self.cycles-pp.mas_wr_node_store
> 
> 
> 
> 
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
> 
> 
> -- 
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
> 
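A note for anyone re-deriving the %change column in the quoted table. The rows suggest two conventions. Metrics whose names end in "%" seem to show an absolute difference in percentage points: mpstat.cpu.all.soft% goes from 0.34 to 0.20 and prints as -0.1. All other metrics show a relative change. The Python sketch below recomputes a few rows from the printed means under that inferred convention. The helper names pct_change and pp_change are hypothetical and are not part of lkp-tests. The robot presumably works from unrounded means, so a value recomputed from the rounded numbers can be off in the last digit.

```python
# Sketch only: recompute the "%change" column from the printed means.
# Assumption (inferred from the table, not from lkp-tests source):
#   - absolute metrics show relative change in percent
#   - metrics already in "%" show an absolute difference (percentage points)


def pct_change(base: float, head: float) -> float:
    """Hypothetical helper: relative change in percent (absolute metrics)."""
    return (head - base) / base * 100.0


def pp_change(base: float, head: float) -> float:
    """Hypothetical helper: difference in percentage points ('%' metrics)."""
    return head - base


# (metric, mean on f3f24869a1, mean on a616bc6667, helper); values copied from the table
rows = [
    ("aim9.disk_src.ops_per_sec", 164231, 183678, pct_change),
    ("proc-vmstat.numa_hit", 6611541, 750673, pct_change),
    ("proc-vmstat.pgalloc_normal", 26113963, 1648373, pct_change),
    ("mpstat.cpu.all.soft%", 0.34, 0.20, pp_change),
    ("perf-stat.i.cache-miss-rate%", 15.11, 17.77, pp_change),
]

for name, base, head, fn in rows:
    # One decimal place, with an explicit sign, matching the report's layout
    print(f"{name:32s} {fn(base, head):+.1f}")
```

Each printed value matches the corresponding entry in the quoted table (+11.8, -88.6, -93.7, -0.1, +2.7).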

-- 
Chuck Lever