Hello,

kernel test robot noticed a -33.6% improvement of autonuma-benchmark.numa02.seconds on:

commit: af46f3c9ca2d16485912f8b9c896ef48bbfe1388 ("[RFC PATCH V1 5/6] sched/numa: Allow recently accessed VMAs to be scanned")
url: https://github.com/intel-lab-lkp/linux/commits/Raghavendra-K-T/sched-numa-Move-up-the-access-pid-reset-logic/20230829-141007
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 2f88c8e802c8b128a155976631f4eb2ce4f3c805
patch link: https://lore.kernel.org/all/109ca1ea59b9dd6f2daf7b7fbc74e83ae074fbdf.1693287931.git.raghavendra.kt@xxxxxxx/
patch subject: [RFC PATCH V1 5/6] sched/numa: Allow recently accessed VMAs to be scanned

testcase: autonuma-benchmark
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
parameters:

    iterations: 4x
    test: numa01_THREAD_ALLOC
    cpufreq_governor: performance

Details are as below:
-------------------------------------------------------------------------------------------------->

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230910/202309102311.84b42068-oliver.sang@xxxxxxxxx
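A note on reading the comparison table that follows: the %change column is the relative difference between the per-commit means (base commit 167773d1dd on the left, patched commit af46f3c9ca on the right), and %stddev is the run-to-run scatter of the 4 iterations relative to their mean. The short Python sketch below only illustrates that arithmetic; it is not part of the lkp tooling, the helper names are made up, and the per-iteration samples are hypothetical -- only the two means are taken from the autonuma-benchmark.numa02.seconds row.

    # Sketch only: how the %change / %stddev columns can be derived (assumption:
    # %stddev is the standard deviation relative to the mean of the 4 runs).
    # The per-iteration sample values are made up; only the two means come from
    # the autonuma-benchmark.numa02.seconds row in the table below.
    from statistics import mean, stdev

    def pct_change(old_mean, new_mean):
        # %change column: relative difference of the means, base -> patched
        return (new_mean - old_mean) / old_mean * 100.0

    def pct_stddev(samples):
        # %stddev column: run-to-run scatter relative to the mean
        return stdev(samples) / mean(samples) * 100.0

    base_mean, patched_mean = 21.17, 14.05        # numa02.seconds means
    print(f"{pct_change(base_mean, patched_mean):+.1f}%")    # -> -33.6%

    hypothetical_runs = [20.1, 22.4, 19.8, 22.4]  # 4x iterations (made up)
    print(f"± {pct_stddev(hypothetical_runs):.0f}%")         # -> ± 7%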
=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
  gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp6/numa01_THREAD_ALLOC/autonuma-benchmark

commit:
  167773d1dd ("sched/numa: Increase tasks' access history")
  af46f3c9ca ("sched/numa: Allow recently accessed VMAs to be scanned")

167773d1ddb5ffdd af46f3c9ca2d16485912f8b9c89
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
 2.534e+10 ± 10%  -13.0%  2.204e+10 ± 7%  cpuidle..time
 26431366 ± 10%  -13.2%  22948978 ± 7%  cpuidle..usage
 0.15 ± 4%  -0.0  0.12 ± 3%  mpstat.cpu.all.soft%
 2.92 ± 3%  +0.4  3.32 ± 4%  mpstat.cpu.all.sys%
 2243 ± 2%  -12.7%  1957 ± 3%  uptime.boot
 29811 ± 8%  -11.1%  26507 ± 6%  uptime.idle
 5.32 ± 79%  -64.2%  1.91 ± 60%  perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
 2.70 ± 18%  +37.8%  3.72 ± 9%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
 0.64 ±137%  +26644.2%  169.91 ±220%  perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
 0.08 ± 20%  +0.0  0.12 ± 10%  perf-profile.children.cycles-pp.terminate_walk
 0.10 ± 25%  +0.0  0.14 ± 10%  perf-profile.children.cycles-pp.wake_up_q
 0.06 ± 50%  +0.0  0.10 ± 10%  perf-profile.children.cycles-pp.vfs_readlink
 0.15 ± 36%  +0.1  0.22 ± 13%  perf-profile.children.cycles-pp.readlink
 1.31 ± 19%  +0.4  1.69 ± 12%  perf-profile.children.cycles-pp.unmap_vmas
 2.46 ± 19%  +0.5  2.99 ± 4%  perf-profile.children.cycles-pp.exit_mmap
 311653 ± 10%  -23.7%  237884 ± 9%  turbostat.C1E
 26018024 ± 10%  -13.1%  22597563 ± 7%  turbostat.C6
 6.41 ± 9%  -13.6%  5.54 ± 8%  turbostat.CPU%c1
 2.47 ± 11%  +36.0%  3.36 ± 6%  turbostat.CPU%c6
 2.881e+08 ± 2%  -12.8%  2.513e+08 ± 3%  turbostat.IRQ
 212.86  +2.8%  218.84  turbostat.RAMWatt
 341.49  -4.1%  327.42 ± 2%  autonuma-benchmark.numa01.seconds
 186.67 ± 6%  -27.1%  136.12 ± 7%  autonuma-benchmark.numa01_THREAD_ALLOC.seconds
 21.17 ± 7%  -33.6%  14.05  autonuma-benchmark.numa02.seconds
 2200 ± 2%  -13.0%  1913 ± 3%  autonuma-benchmark.time.elapsed_time
 2200 ± 2%  -13.0%  1913 ± 3%  autonuma-benchmark.time.elapsed_time.max
 1159380 ± 2%  -12.0%  1019969 ± 3%  autonuma-benchmark.time.involuntary_context_switches
 3363550  -5.0%  3194802  autonuma-benchmark.time.minor_page_faults
 243046 ± 2%  -13.3%  210725 ± 3%  autonuma-benchmark.time.user_time
 7494239  -6.8%  6984234  proc-vmstat.numa_hit
 118829 ± 6%  +13.7%  135136 ± 6%  proc-vmstat.numa_huge_pte_updates
 6207618  -8.4%  5686795 ± 2%  proc-vmstat.numa_local
 8834573 ± 3%  +20.2%  10616944 ± 4%  proc-vmstat.numa_pages_migrated
 61094857 ± 6%  +13.6%  69409875 ± 6%  proc-vmstat.numa_pte_updates
 8602789  -9.0%  7827793 ± 2%  proc-vmstat.pgfault
 8834573 ± 3%  +20.2%  10616944 ± 4%  proc-vmstat.pgmigrate_success
 371818  -10.1%  334391 ± 2%  proc-vmstat.pgreuse
 17200 ± 3%  +20.3%  20686 ± 4%  proc-vmstat.thp_migration_success
 16401792 ± 2%  -12.7%  14322816 ± 3%  proc-vmstat.unevictable_pgs_scanned
 1.606e+08 ± 2%  -13.8%  1.385e+08 ± 3%  sched_debug.cfs_rq:/.avg_vruntime.avg
 1.666e+08 ± 2%  -14.0%  1.433e+08 ± 3%  sched_debug.cfs_rq:/.avg_vruntime.max
 1.364e+08 ± 2%  -11.7%  1.204e+08 ± 3%  sched_debug.cfs_rq:/.avg_vruntime.min
 4795327 ± 7%  -17.5%  3956991 ± 7%  sched_debug.cfs_rq:/.avg_vruntime.stddev
 1.606e+08 ± 2%  -13.8%  1.385e+08 ± 3%  sched_debug.cfs_rq:/.min_vruntime.avg
 1.666e+08 ± 2%  -14.0%  1.433e+08 ± 3%  sched_debug.cfs_rq:/.min_vruntime.max
 1.364e+08 ± 2%  -11.7%  1.204e+08 ± 3%  sched_debug.cfs_rq:/.min_vruntime.min
 4795327 ± 7%  -17.5%  3956991 ± 7%  sched_debug.cfs_rq:/.min_vruntime.stddev
 364.96 ± 6%  +16.6%  425.70 ± 5%  sched_debug.cfs_rq:/.util_est_enqueued.avg
 1099114  -13.0%  956021 ± 2%  sched_debug.cpu.clock.avg
 1099477  -13.0%  956344 ± 2%  sched_debug.cpu.clock.max
 1098702  -13.0%  955643 ± 2%  sched_debug.cpu.clock.min
 1080712  -13.0%  940415 ± 2%  sched_debug.cpu.clock_task.avg
 1085309  -13.1%  943557 ± 2%  sched_debug.cpu.clock_task.max
 1064613  -13.0%  925993 ± 2%  sched_debug.cpu.clock_task.min
 28890 ± 3%  -11.7%  25504 ± 3%  sched_debug.cpu.curr->pid.avg
 35200  -11.0%  31344  sched_debug.cpu.curr->pid.max
 862245 ± 3%  -8.7%  786984  sched_debug.cpu.max_idle_balance_cost.max
 74019 ± 9%  -28.2%  53158 ± 7%  sched_debug.cpu.max_idle_balance_cost.stddev
 15507  -11.9%  13667 ± 2%  sched_debug.cpu.nr_switches.avg
 57616 ± 6%  -19.0%  46642 ± 8%  sched_debug.cpu.nr_switches.max
 8460 ± 6%  -12.9%  7368 ± 5%  sched_debug.cpu.nr_switches.stddev
 1098689  -13.0%  955631 ± 2%  sched_debug.cpu_clk
 1097964  -13.0%  954907 ± 2%  sched_debug.ktime
 0.00  +15.0%  0.00 ± 2%  sched_debug.rt_rq:.rt_nr_migratory.avg
 0.03  +15.0%  0.03 ± 2%  sched_debug.rt_rq:.rt_nr_migratory.max
 0.00  +15.0%  0.00 ± 2%  sched_debug.rt_rq:.rt_nr_migratory.stddev
 0.00  +15.0%  0.00 ± 2%  sched_debug.rt_rq:.rt_nr_running.avg
 0.03  +15.0%  0.03 ± 2%  sched_debug.rt_rq:.rt_nr_running.max
 0.00  +15.0%  0.00 ± 2%  sched_debug.rt_rq:.rt_nr_running.stddev
 1099511  -13.0%  956501 ± 2%  sched_debug.sched_clk
 1162 ± 2%  +15.2%  1339 ± 3%  perf-stat.i.MPKI
 1.656e+08  +3.6%  1.716e+08  perf-stat.i.branch-instructions
 0.95 ± 4%  +0.1  1.03  perf-stat.i.branch-miss-rate%
 1538367 ± 6%  +11.0%  1707146 ± 2%  perf-stat.i.branch-misses
 6.327e+08 ± 3%  +18.7%  7.513e+08 ± 4%  perf-stat.i.cache-misses
 8.282e+08 ± 2%  +15.2%  9.542e+08 ± 3%  perf-stat.i.cache-references
 658.12 ± 3%  -11.4%  582.98 ± 6%  perf-stat.i.cycles-between-cache-misses
 2.201e+08  +2.8%  2.263e+08  perf-stat.i.dTLB-loads
 579771  +0.9%  584915  perf-stat.i.dTLB-store-misses
 1.122e+08  +1.4%  1.138e+08  perf-stat.i.dTLB-stores
 8.278e+08  +3.1%  8.538e+08  perf-stat.i.instructions
 13.98 ± 2%  +14.3%  15.98 ± 3%  perf-stat.i.metric.M/sec
 3797  +4.3%  3958  perf-stat.i.minor-faults
 258749  +8.0%  279391 ± 2%  perf-stat.i.node-load-misses
 261169 ± 2%  +7.4%  280417 ± 5%  perf-stat.i.node-loads
 40.91 ± 3%  -3.0  37.89 ± 3%  perf-stat.i.node-store-miss-rate%
 3.841e+08 ± 6%  +27.6%  4.902e+08 ± 7%  perf-stat.i.node-stores
 3797  +4.3%  3958  perf-stat.i.page-faults
 998.24 ± 2%  +11.8%  1116 ± 2%  perf-stat.overall.MPKI
 463.91  -3.2%  448.99  perf-stat.overall.cpi
 604.23 ± 3%  -15.9%  508.08 ± 4%  perf-stat.overall.cycles-between-cache-misses
 0.00  +3.3%  0.00  perf-stat.overall.ipc
 39.20 ± 5%  -4.5  34.70 ± 6%  perf-stat.overall.node-store-miss-rate%
 1.636e+08  +3.8%  1.698e+08  perf-stat.ps.branch-instructions
 1499760 ± 6%  +11.1%  1665855 ± 2%  perf-stat.ps.branch-misses
 6.296e+08 ± 3%  +19.0%  7.489e+08 ± 4%  perf-stat.ps.cache-misses
 8.178e+08 ± 2%  +15.5%  9.447e+08 ± 3%  perf-stat.ps.cache-references
 2.18e+08  +2.9%  2.244e+08  perf-stat.ps.dTLB-loads
 578148  +0.9%  583328  perf-stat.ps.dTLB-store-misses
 1.117e+08  +1.4%  1.132e+08  perf-stat.ps.dTLB-stores
 8.192e+08  +3.3%  8.46e+08  perf-stat.ps.instructions
 3744  +4.3%  3906  perf-stat.ps.minor-faults
 255974  +8.2%  276924 ± 2%  perf-stat.ps.node-load-misses
 263796 ± 2%  +7.7%  284110 ± 5%  perf-stat.ps.node-loads
 3.82e+08 ± 6%  +27.7%  4.879e+08 ± 7%  perf-stat.ps.node-stores
 3744  +4.3%  3906  perf-stat.ps.page-faults
 1.805e+12 ± 2%  -10.1%  1.622e+12 ± 2%  perf-stat.total.instructions

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki