On Mon, Jul 17, 2023 at 10:41 PM kernel test robot <oliver.sang@xxxxxxxxx> wrote:
>
> Hello,
>
> kernel test robot noticed a -12.5% regression of hackbench.throughput on:
>
> commit: a0fd217e6d6fbd23e91f8796787b621e7d576088 ("[PATCH] [RFC PATCH v2]mm/slub: Optimize slub memory usage")
> url: https://github.com/intel-lab-lkp/linux/commits/Jay-Patel/mm-slub-Optimize-slub-memory-usage/20230628-180050
> base: git://git.kernel.org/cgit/linux/kernel/git/vbabka/slab.git for-next
> patch link: https://lore.kernel.org/all/20230628095740.589893-1-jaypatel@xxxxxxxxxxxxx/
> patch subject: [PATCH] [RFC PATCH v2]mm/slub: Optimize slub memory usage
>
> testcase: hackbench
> test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> parameters:
>
>         nr_threads: 100%
>         iterations: 4
>         mode: process
>         ipc: socket
>         cpufreq_governor: performance
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> | Closes: https://lore.kernel.org/oe-lkp/202307172140.3b34825a-oliver.sang@xxxxxxxxx
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> To reproduce:
>
>         git clone https://github.com/intel/lkp-tests.git
>         cd lkp-tests
>         sudo bin/lkp install job.yaml           # job file is attached in this email
>         bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
>         sudo bin/lkp run generated-yaml-file
>
>         # if come across any failure that blocks the test,
>         # please remove ~/.lkp and /lkp dir to run from a clean state.
>
> =========================================================================================
> compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
>   gcc-12/performance/socket/4/x86_64-rhel-8.3/process/100%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp2/hackbench
>
> commit:
>   7bc162d5cc ("Merge branches 'slab/for-6.5/prandom', 'slab/for-6.5/slab_no_merge' and 'slab/for-6.5/slab-deprecate' into slab/for-next")
>   a0fd217e6d ("mm/slub: Optimize slub memory usage")
>
>        7bc162d5cc4de5c3 a0fd217e6d6fbd23e91f8796787
>        ---------------- ---------------------------
>             %stddev     %change         %stddev
>                 \          |                \
>     222503 ± 86%    +108.7%     464342 ± 58%  numa-meminfo.node1.Active
>     222459 ± 86%    +108.7%     464294 ± 58%  numa-meminfo.node1.Active(anon)
>      55573 ± 85%    +108.0%     115619 ± 58%  numa-vmstat.node1.nr_active_anon
>      55573 ± 85%    +108.0%     115618 ± 58%  numa-vmstat.node1.nr_zone_active_anon

I'm quite baffled while reading this.

How did changing the slab order calculation double the number of active
anon pages? I doubt the two experiments were performed with the same
settings.
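To illustrate why I don't see the connection (a minimal userspace sketch,
loosely modeled on the waste-based loop in calc_slab_order() in mm/slub.c;
pick_order(), SLUB_MAX_ORDER and the object sizes below are mine for
illustration, not the kernel's or the patch's): the order calculation only
decides how many contiguous pages back one slab for a given object size, so
it can shift slab page consumption, but it never touches anonymous memory
accounting.

/*
 * Userspace approximation, NOT the kernel code or the patch under test:
 * pick the smallest page order whose leftover space is at most
 * slab_size / fract_leftover, similar in spirit to calc_slab_order().
 */
#include <stdio.h>

#define PAGE_SIZE      4096u
#define SLUB_MAX_ORDER 3u	/* mirrors the default slub_max_order */

static unsigned int pick_order(unsigned int size, unsigned int fract_leftover)
{
	for (unsigned int order = 0; order <= SLUB_MAX_ORDER; order++) {
		unsigned int slab_size = PAGE_SIZE << order;
		unsigned int rem = slab_size % size;

		/* accept this order if waste is within 1/fract_leftover */
		if (rem <= slab_size / fract_leftover)
			return order;
	}
	return SLUB_MAX_ORDER;	/* the real code retries with more allowed waste */
}

int main(void)
{
	/* object sizes typical of kmalloc caches, chosen for illustration */
	unsigned int sizes[] = { 96, 192, 704, 1024, 2048 };

	for (unsigned int i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
		unsigned int order = pick_order(sizes[i], 16);

		printf("object %4u bytes -> order %u (%u objects per slab)\n",
		       sizes[i], order, (PAGE_SIZE << order) / sizes[i]);
	}
	return 0;
}

Whatever order such a loop picks only scales the slab counters (nr_slab_*);
Active(anon) is driven by the workload's anonymous mappings, which is why
the +108% above (with ±58-86% stddev) looks to me like run-to-run variance
rather than an effect of the patch.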
>    1377834 ±  2%     -10.7%    1230013        sched_debug.cpu.nr_switches.avg
>    1218144 ±  2%     -13.3%    1055659 ±  2%  sched_debug.cpu.nr_switches.min
>    3047631 ±  2%     -13.2%    2646560        vmstat.system.cs
>     561797            -13.8%     484137        vmstat.system.in
>     280976 ± 66%    +122.6%     625459 ± 52%  meminfo.Active
>     280881 ± 66%    +122.6%     625365 ± 52%  meminfo.Active(anon)
>     743351 ±  4%      -9.7%     671534 ±  6%  meminfo.AnonPages
>       1.36             -0.1        1.21        mpstat.cpu.all.irq%
>       0.04 ±  4%       -0.0        0.03 ±  4%  mpstat.cpu.all.soft%
>       5.38             -0.8        4.58        mpstat.cpu.all.usr%
>       0.26            -11.9%       0.23        turbostat.IPC
>     160.93            -19.3      141.61        turbostat.PKG_%
>      60.48             -8.9%      55.10        turbostat.RAMWatt
>      70049 ± 68%    +124.5%     157279 ± 52%  proc-vmstat.nr_active_anon
>     185963 ±  4%      -9.8%     167802 ±  6%  proc-vmstat.nr_anon_pages
>      37302             -1.2%      36837        proc-vmstat.nr_slab_reclaimable
>      70049 ± 68%    +124.5%     157279 ± 52%  proc-vmstat.nr_zone_active_anon
>    1101451            +12.0%    1233638        proc-vmstat.unevictable_pgs_scanned
>     477310            -12.5%     417480        hackbench.throughput
>     464064            -12.0%     408333        hackbench.throughput_avg
>     477310            -12.5%     417480        hackbench.throughput_best
>     435294             -9.5%     394098        hackbench.throughput_worst
>     131.28            +13.4%     148.89        hackbench.time.elapsed_time
>     131.28            +13.4%     148.89        hackbench.time.elapsed_time.max
>   90404617             -5.2%   85662614 ±  2%  hackbench.time.involuntary_context_switches
>      15342            +15.0%      17642        hackbench.time.system_time
>     866.32             -3.2%     838.32        hackbench.time.user_time
>  4.581e+10            -11.2%  4.069e+10        perf-stat.i.branch-instructions
>       0.45             +0.1        0.56        perf-stat.i.branch-miss-rate%
>  2.024e+08            +11.8%  2.263e+08        perf-stat.i.branch-misses
>      21.49             -1.1       20.42        perf-stat.i.cache-miss-rate%
>  4.202e+08            -16.6%  3.505e+08        perf-stat.i.cache-misses
>  1.935e+09            -11.5%  1.711e+09        perf-stat.i.cache-references
>    3115707 ±  2%     -13.9%    2681887        perf-stat.i.context-switches
>       1.31            +13.2%       1.48        perf-stat.i.cpi
>     375155 ±  3%     -16.3%     314001 ±  2%  perf-stat.i.cpu-migrations
>  6.727e+10            -11.2%  5.972e+10        perf-stat.i.dTLB-loads
>  4.169e+10            -12.2%  3.661e+10        perf-stat.i.dTLB-stores
>  2.465e+11            -11.4%  2.185e+11        perf-stat.i.instructions
>       0.77            -11.8%       0.68        perf-stat.i.ipc
>     818.18 ±  5%     +61.8%       1323 ±  2%  perf-stat.i.metric.K/sec
>       1225            -11.6%       1083        perf-stat.i.metric.M/sec
>      11341 ±  4%     -12.6%       9916 ±  4%  perf-stat.i.minor-faults
>   1.27e+08            -13.2%  1.102e+08        perf-stat.i.node-load-misses
>    3376198            -15.4%    2855906        perf-stat.i.node-loads
>   72756698            -22.9%   56082330        perf-stat.i.node-store-misses
>    4118986 ±  2%     -19.3%    3322276        perf-stat.i.node-stores
>      11432 ±  3%     -12.6%       9991 ±  4%  perf-stat.i.page-faults
>       0.44             +0.1        0.56        perf-stat.overall.branch-miss-rate%
>      21.76             -1.3       20.49        perf-stat.overall.cache-miss-rate%
>       1.29            +13.5%       1.47        perf-stat.overall.cpi
>     755.39            +21.1%     914.82        perf-stat.overall.cycles-between-cache-misses
>       0.77            -11.9%       0.68        perf-stat.overall.ipc
>  4.546e+10            -11.0%  4.046e+10        perf-stat.ps.branch-instructions
>  2.006e+08            +12.0%  2.246e+08        perf-stat.ps.branch-misses
>  4.183e+08            -16.8%   3.48e+08        perf-stat.ps.cache-misses
>  1.923e+09            -11.7%  1.699e+09        perf-stat.ps.cache-references
>    3073921 ±  2%     -13.9%    2647497        perf-stat.ps.context-switches
>     367849 ±  3%     -16.1%     308496 ±  2%  perf-stat.ps.cpu-migrations
>  6.683e+10            -11.2%  5.938e+10        perf-stat.ps.dTLB-loads
>  4.144e+10            -12.2%  3.639e+10        perf-stat.ps.dTLB-stores
>  2.447e+11            -11.2%  2.172e+11        perf-stat.ps.instructions
>      10654 ±  4%     -11.5%       9428 ±  4%  perf-stat.ps.minor-faults
>  1.266e+08            -13.5%  1.095e+08        perf-stat.ps.node-load-misses
>    3361116            -15.6%    2836863        perf-stat.ps.node-loads
>   72294146            -23.1%   55573600        perf-stat.ps.node-store-misses
>    4043240 ±  2%     -19.4%    3258771        perf-stat.ps.node-stores
>      10734 ±  4%     -11.6%       9494 ±  4%  perf-stat.ps.page-faults

<...>

> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>