Re: [PATCH v3 2/2] kernel/fork: group allocation/free of per-cpu counters for mm struct

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




Hello,

kernel test robot noticed a -8.2% improvement of phoronix-test-suite.osbench.LaunchPrograms.us_per_event on:


commit: 9d32938c115580bfff128a926d704199d2f33ba3 ("[PATCH v3 2/2] kernel/fork: group allocation/free of per-cpu counters for mm struct")
url: https://github.com/intel-lab-lkp/linux/commits/Mateusz-Guzik/pcpcntr-add-group-allocation-free/20230823-130803
base: https://git.kernel.org/cgit/linux/kernel/git/dennis/percpu.git for-next
patch link: https://lore.kernel.org/all/20230823050609.2228718-3-mjguzik@xxxxxxxxx/
patch subject: [PATCH v3 2/2] kernel/fork: group allocation/free of per-cpu counters for mm struct

testcase: phoronix-test-suite
test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory
parameters:

	test: osbench-1.0.2
	option_a: Launch Programs
	cpufreq_governor: performance






Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230906/202309061504.7e645826-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/Launch Programs/debian-x86_64-phoronix/lkp-csl-2sp7/osbench-1.0.2/phoronix-test-suite

commit: 
  1db50472c8 ("pcpcntr: add group allocation/free")
  9d32938c11 ("kernel/fork: group allocation/free of per-cpu counters for mm struct")

1db50472c8bc1d34 9d32938c115580bfff128a926d7 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      3.00           +33.3%       4.00        vmstat.procs.r
     14111            +5.7%      14918        vmstat.system.cs
      2114            +1.1%       2136        turbostat.Bzy_MHz
      1.67            +0.2        1.83        turbostat.C1E%
    121.98            +5.1%     128.24        turbostat.PkgWatt
     98.05            -8.2%      90.02        phoronix-test-suite.osbench.LaunchPrograms.us_per_event
     16246 ±  4%      +6.1%      17243        phoronix-test-suite.time.involuntary_context_switches
   9791476            +9.2%   10689455        phoronix-test-suite.time.minor_page_faults
    311.33            +9.3%     340.33        phoronix-test-suite.time.percent_of_cpu_this_job_got
     83.40 ±  2%      +9.2%      91.07 ±  2%  phoronix-test-suite.time.system_time
    151333            +8.6%     164355        phoronix-test-suite.time.voluntary_context_switches
      3225            -5.5%       3046 ±  5%  proc-vmstat.nr_page_table_pages
   9150454            +8.0%    9884178        proc-vmstat.numa_hit
   9088660            +8.7%    9882518        proc-vmstat.numa_local
   9971116            +8.3%   10802925        proc-vmstat.pgalloc_normal
  10202032            +8.8%   11099649        proc-vmstat.pgfault
   9845338            +8.4%   10676360        proc-vmstat.pgfree
    207049           +10.3%     228380 ±  8%  proc-vmstat.pgreuse
 1.947e+09            +5.0%  2.045e+09        perf-stat.i.branch-instructions
  52304206            +4.4%   54610501        perf-stat.i.branch-misses
      9.06 ±  2%      +0.5        9.52        perf-stat.i.cache-miss-rate%
  19663522 ±  3%     +10.0%   21634645        perf-stat.i.cache-misses
 1.658e+08            +3.6%  1.717e+08        perf-stat.i.cache-references
     14769            +6.2%      15691        perf-stat.i.context-switches
 1.338e+10            +6.2%   1.42e+10        perf-stat.i.cpu-cycles
   3112873 ±  3%     -12.5%    2724690 ±  3%  perf-stat.i.dTLB-load-misses
 2.396e+09            +5.5%  2.528e+09        perf-stat.i.dTLB-loads
      0.11 ±  4%      -0.0        0.10 ±  2%  perf-stat.i.dTLB-store-miss-rate%
   1003394 ±  6%     -14.0%     862768 ±  5%  perf-stat.i.dTLB-store-misses
  1.25e+09            +6.0%  1.325e+09        perf-stat.i.dTLB-stores
     71.16            -1.3       69.88        perf-stat.i.iTLB-load-miss-rate%
   1872082            +8.2%    2025999        perf-stat.i.iTLB-loads
 9.606e+09            +5.4%  1.012e+10        perf-stat.i.instructions
     23.37 ±  5%     +30.6%      30.53 ±  4%  perf-stat.i.major-faults
      0.14            +6.2%       0.15        perf-stat.i.metric.GHz
     59.39            +5.4%      62.61        perf-stat.i.metric.M/sec
    249517           +10.0%     274572        perf-stat.i.minor-faults
   5081285            +6.0%    5385686 ±  4%  perf-stat.i.node-load-misses
    565117 ±  3%      +8.1%     610682 ±  3%  perf-stat.i.node-loads
    249541           +10.0%     274602        perf-stat.i.page-faults
     17.27            -1.7%      16.98        perf-stat.overall.MPKI
     11.85 ±  2%      +0.7       12.59        perf-stat.overall.cache-miss-rate%
      0.13 ±  2%      -0.0        0.11 ±  2%  perf-stat.overall.dTLB-load-miss-rate%
      0.08 ±  7%      -0.0        0.07 ±  4%  perf-stat.overall.dTLB-store-miss-rate%
     67.26            -1.1       66.12        perf-stat.overall.iTLB-load-miss-rate%
 1.895e+09            +5.0%   1.99e+09        perf-stat.ps.branch-instructions
  50921385            +4.4%   53146828        perf-stat.ps.branch-misses
  19140130 ±  3%     +10.0%   21047707        perf-stat.ps.cache-misses
 1.615e+08            +3.5%  1.672e+08        perf-stat.ps.cache-references
     14376            +6.2%      15266        perf-stat.ps.context-switches
 1.303e+10            +6.1%  1.383e+10        perf-stat.ps.cpu-cycles
   3033019 ±  3%     -12.5%    2654269 ±  3%  perf-stat.ps.dTLB-load-misses
 2.332e+09            +5.5%   2.46e+09        perf-stat.ps.dTLB-loads
    976773 ±  6%     -14.1%     839517 ±  5%  perf-stat.ps.dTLB-store-misses
 1.217e+09            +6.0%  1.289e+09        perf-stat.ps.dTLB-stores
   1822198            +8.2%    1971115        perf-stat.ps.iTLB-loads
 9.349e+09            +5.3%  9.846e+09        perf-stat.ps.instructions
     22.75 ±  5%     +30.5%      29.69 ±  4%  perf-stat.ps.major-faults
    242831           +10.0%     267074        perf-stat.ps.minor-faults
   4945101            +5.9%    5238638 ±  4%  perf-stat.ps.node-load-misses
    550029 ±  3%      +8.0%     594116 ±  3%  perf-stat.ps.node-loads
    242854           +10.0%     267104        perf-stat.ps.page-faults
 3.719e+11            +4.4%  3.883e+11        perf-stat.total.instructions




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux