[linus:master] [maple_tree] 4249f13c11: aim9.page_test.ops_per_sec 3.5% improvement

kernel test robot <oliver.sang@xxxxxxxxx> · Tue, 9 Jan 2024 22:03:56 +0800

Hello,

kernel test robot noticed a 3.5% improvement in aim9.page_test.ops_per_sec on:

commit: 4249f13c11be8b8b7bf93204185e150c3bdc968d ("maple_tree: do not preallocate nodes for slot stores")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: aim9
test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory
parameters:

	testtime: 300s
	test: page_test
	cpufreq_governor: performance

Details are as below:
-------------------------------------------------------------------------------------------------->

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240109/202401091651.a189376-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-ivb-2ep1/page_test/aim9/300s

commit: 
  e2c27b803b ("mm/filemap: avoid buffered read/write race to read inconsistent data")
  4249f13c11 ("maple_tree: do not preallocate nodes for slot stores")

e2c27b803bb66474 4249f13c11be8b8b7bf93204185 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    336518            +3.5%     348367        aim9.page_test.ops_per_sec
  95019000            +3.5%   98364469        aim9.time.minor_page_faults
     25318            +2.3%      25903        proc-vmstat.nr_active_anon
     26605            +2.2%      27197        proc-vmstat.nr_shmem
     25318            +2.3%      25903        proc-vmstat.nr_zone_active_anon
 1.087e+08            +3.3%  1.122e+08        proc-vmstat.numa_hit
 1.085e+08            +3.4%  1.121e+08        proc-vmstat.numa_local
 1.079e+08            +3.5%  1.117e+08        proc-vmstat.pgalloc_normal
  95763046            +3.5%   99109694        proc-vmstat.pgfault
 1.078e+08            +3.5%  1.116e+08        proc-vmstat.pgfree
  56340620            +1.4%   57128415        perf-stat.i.cache-references
   3744535            -7.4%    3468589        perf-stat.i.iTLB-load-misses
    923.85            +8.2%     999.87        perf-stat.i.instructions-per-iTLB-miss
    318120            +3.5%     329244        perf-stat.i.minor-faults
    318120            +3.5%     329244        perf-stat.i.page-faults
     12.48            -0.2       12.32        perf-stat.overall.cache-miss-rate%
    911.69            +8.5%     988.95        perf-stat.overall.instructions-per-iTLB-miss
  56153225            +1.4%   56938073        perf-stat.ps.cache-references
   3731915            -7.4%    3456934        perf-stat.ps.iTLB-load-misses
    317046            +3.5%     328134        perf-stat.ps.minor-faults
    317046            +3.5%     328134        perf-stat.ps.page-faults
      1.54 ± 15%      -0.9        0.61 ± 35%  perf-profile.calltrace.cycles-pp.mas_preallocate.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.56 ± 16%      -0.9        0.67 ± 18%  perf-profile.children.cycles-pp.mas_preallocate
      0.59 ± 18%      -0.5        0.06 ± 66%  perf-profile.children.cycles-pp.mas_destroy
      0.03 ± 84%      +0.1        0.13 ± 26%  perf-profile.children.cycles-pp.anon_vma_interval_tree_insert
      0.18 ± 27%      +0.2        0.42 ± 15%  perf-profile.children.cycles-pp.vma_adjust_trans_huge
      0.28 ± 12%      +0.3        0.57 ± 14%  perf-profile.children.cycles-pp.vma_complete
      0.20 ± 28%      -0.1        0.13 ± 24%  perf-profile.self.cycles-pp.security_mmap_addr
      0.16 ± 23%      -0.1        0.10 ± 17%  perf-profile.self.cycles-pp.__perf_sw_event
      0.17 ± 18%      +0.1        0.27 ± 30%  perf-profile.self.cycles-pp.get_vma_policy
      0.02 ±118%      +0.1        0.13 ± 26%  perf-profile.self.cycles-pp.anon_vma_interval_tree_insert
      0.08 ± 25%      +0.2        0.24 ± 13%  perf-profile.self.cycles-pp.vma_complete
      0.18 ± 28%      +0.2        0.42 ± 15%  perf-profile.self.cycles-pp.vma_adjust_trans_huge
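
For readers unfamiliar with the table layout, here is a minimal sketch of how the
%change column appears to relate to the two value columns. It assumes that plain
counters are reported as a relative change in percent, and that rows which are
already percentages (cache-miss-rate%, perf-profile cycles-pp) are reported as an
absolute difference in percentage points. The helper names pct_change and
point_delta are hypothetical and are not part of lkp-tests; the rounding to one
decimal is also an assumption made to match the figures shown above.

```python
def pct_change(base, new):
    """Relative change from base to new, in percent (assumed 1-decimal rounding)."""
    return round((new - base) / base * 100, 1)


def point_delta(base, new):
    """Absolute difference for rows that are already percentages (assumption)."""
    return round(new - base, 1)


# Values copied from the comparison table above (parent commit, tested commit).
ops = pct_change(336518, 348367)        # aim9.page_test.ops_per_sec
itlb = pct_change(3744535, 3468589)     # perf-stat.i.iTLB-load-misses
prealloc = point_delta(1.56, 0.67)      # perf-profile.children.cycles-pp.mas_preallocate
miss_rate = point_delta(12.48, 12.32)   # perf-stat.overall.cache-miss-rate%

print(f"ops_per_sec:      {ops:+.1f}%")
print(f"iTLB-load-misses: {itlb:+.1f}%")
print(f"mas_preallocate:  {prealloc:+.1f}")
print(f"cache-miss-rate%: {miss_rate:+.1f}")
```

Under these assumptions the computed values agree with the %change column for
the four rows used here.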

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki