Hello,

kernel test robot noticed a -8.2% improvement of phoronix-test-suite.osbench.LaunchPrograms.us_per_event on:


commit: 9d32938c115580bfff128a926d704199d2f33ba3 ("[PATCH v3 2/2] kernel/fork: group allocation/free of per-cpu counters for mm struct")
url: https://github.com/intel-lab-lkp/linux/commits/Mateusz-Guzik/pcpcntr-add-group-allocation-free/20230823-130803
base: https://git.kernel.org/cgit/linux/kernel/git/dennis/percpu.git for-next
patch link: https://lore.kernel.org/all/20230823050609.2228718-3-mjguzik@xxxxxxxxx/
patch subject: [PATCH v3 2/2] kernel/fork: group allocation/free of per-cpu counters for mm struct

testcase: phoronix-test-suite
test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory
parameters:

	test: osbench-1.0.2
	option_a: Launch Programs
	cpufreq_governor: performance


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230906/202309061504.7e645826-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/Launch Programs/debian-x86_64-phoronix/lkp-csl-2sp7/osbench-1.0.2/phoronix-test-suite

commit:
  1db50472c8 ("pcpcntr: add group allocation/free")
  9d32938c11 ("kernel/fork: group allocation/free of per-cpu counters for mm struct")

1db50472c8bc1d34 9d32938c115580bfff128a926d7
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
      3.00        +33.3%        4.00       vmstat.procs.r
     14111         +5.7%       14918       vmstat.system.cs
      2114         +1.1%        2136       turbostat.Bzy_MHz
      1.67         +0.2         1.83       turbostat.C1E%
    121.98         +5.1%      128.24       turbostat.PkgWatt
     98.05         -8.2%       90.02       phoronix-test-suite.osbench.LaunchPrograms.us_per_event
     16246 ± 4%    +6.1%       17243       phoronix-test-suite.time.involuntary_context_switches
   9791476         +9.2%    10689455       phoronix-test-suite.time.minor_page_faults
    311.33         +9.3%      340.33       phoronix-test-suite.time.percent_of_cpu_this_job_got
     83.40 ± 2%    +9.2%       91.07 ± 2%  phoronix-test-suite.time.system_time
    151333         +8.6%      164355       phoronix-test-suite.time.voluntary_context_switches
      3225         -5.5%        3046 ± 5%  proc-vmstat.nr_page_table_pages
   9150454         +8.0%     9884178       proc-vmstat.numa_hit
   9088660         +8.7%     9882518       proc-vmstat.numa_local
   9971116         +8.3%    10802925       proc-vmstat.pgalloc_normal
  10202032         +8.8%    11099649       proc-vmstat.pgfault
   9845338         +8.4%    10676360       proc-vmstat.pgfree
    207049        +10.3%      228380 ± 8%  proc-vmstat.pgreuse
 1.947e+09         +5.0%   2.045e+09       perf-stat.i.branch-instructions
  52304206         +4.4%    54610501       perf-stat.i.branch-misses
      9.06 ± 2%    +0.5         9.52       perf-stat.i.cache-miss-rate%
  19663522 ± 3%   +10.0%    21634645       perf-stat.i.cache-misses
 1.658e+08         +3.6%   1.717e+08       perf-stat.i.cache-references
     14769         +6.2%       15691       perf-stat.i.context-switches
 1.338e+10         +6.2%    1.42e+10       perf-stat.i.cpu-cycles
   3112873 ± 3%   -12.5%     2724690 ± 3%  perf-stat.i.dTLB-load-misses
 2.396e+09         +5.5%   2.528e+09       perf-stat.i.dTLB-loads
      0.11 ± 4%    -0.0         0.10 ± 2%  perf-stat.i.dTLB-store-miss-rate%
   1003394 ± 6%   -14.0%      862768 ± 5%  perf-stat.i.dTLB-store-misses
  1.25e+09         +6.0%   1.325e+09       perf-stat.i.dTLB-stores
     71.16         -1.3        69.88       perf-stat.i.iTLB-load-miss-rate%
   1872082         +8.2%     2025999       perf-stat.i.iTLB-loads
 9.606e+09         +5.4%   1.012e+10       perf-stat.i.instructions
     23.37 ± 5%   +30.6%       30.53 ± 4%  perf-stat.i.major-faults
      0.14         +6.2%        0.15       perf-stat.i.metric.GHz
     59.39         +5.4%       62.61       perf-stat.i.metric.M/sec
    249517        +10.0%      274572       perf-stat.i.minor-faults
   5081285         +6.0%     5385686 ± 4%  perf-stat.i.node-load-misses
    565117 ± 3%    +8.1%      610682 ± 3%  perf-stat.i.node-loads
    249541        +10.0%      274602       perf-stat.i.page-faults
     17.27         -1.7%       16.98       perf-stat.overall.MPKI
     11.85 ± 2%    +0.7        12.59       perf-stat.overall.cache-miss-rate%
      0.13 ± 2%    -0.0         0.11 ± 2%  perf-stat.overall.dTLB-load-miss-rate%
      0.08 ± 7%    -0.0         0.07 ± 4%  perf-stat.overall.dTLB-store-miss-rate%
     67.26         -1.1        66.12       perf-stat.overall.iTLB-load-miss-rate%
 1.895e+09         +5.0%    1.99e+09       perf-stat.ps.branch-instructions
  50921385         +4.4%    53146828       perf-stat.ps.branch-misses
  19140130 ± 3%   +10.0%    21047707       perf-stat.ps.cache-misses
 1.615e+08         +3.5%   1.672e+08       perf-stat.ps.cache-references
     14376         +6.2%       15266       perf-stat.ps.context-switches
 1.303e+10         +6.1%   1.383e+10       perf-stat.ps.cpu-cycles
   3033019 ± 3%   -12.5%     2654269 ± 3%  perf-stat.ps.dTLB-load-misses
 2.332e+09         +5.5%    2.46e+09       perf-stat.ps.dTLB-loads
    976773 ± 6%   -14.1%      839517 ± 5%  perf-stat.ps.dTLB-store-misses
 1.217e+09         +6.0%   1.289e+09       perf-stat.ps.dTLB-stores
   1822198         +8.2%     1971115       perf-stat.ps.iTLB-loads
 9.349e+09         +5.3%   9.846e+09       perf-stat.ps.instructions
     22.75 ± 5%   +30.5%       29.69 ± 4%  perf-stat.ps.major-faults
    242831        +10.0%      267074       perf-stat.ps.minor-faults
   4945101         +5.9%     5238638 ± 4%  perf-stat.ps.node-load-misses
    550029 ± 3%    +8.0%      594116 ± 3%  perf-stat.ps.node-loads
    242854        +10.0%      267104       perf-stat.ps.page-faults
 3.719e+11         +4.4%   3.883e+11       perf-stat.total.instructions


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki