On Mon, Jan 30, 2023 at 12:15:09PM +0800, Matthew Wilcox wrote:
> On Mon, Jan 30, 2023 at 10:32:56AM +0800, kernel test robot wrote:
> > FYI, we noticed a -19.2% regression of unixbench.score due to commit:
> >
> > commit: f1a7941243c102a44e8847e3b94ff4ff3ec56f25 ("mm: convert mm's rss stats into percpu_counter")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > in testcase: unixbench
> > on test machine: 128 threads 4 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> > with following parameters:
> >
> >         runtime: 300s
> >         nr_task: 30%
> >         test: spawn
> >         cpufreq_governor: performance
>
> ...
>
> > 9cd6ffa60256e931 f1a7941243c102a44e8847e3b94
> > ---------------- ---------------------------
> >          %stddev     %change         %stddev
> >              \          |                \
> >      11110           -19.2%       8974        unixbench.score
> >    1090843           -12.2%     957314        unixbench.time.involuntary_context_switches
> >    4243909 ±  6%     -32.4%    2867136 ±  5%  unixbench.time.major_page_faults
> >      10547           -12.6%       9216        unixbench.time.maximum_resident_set_size
> >  9.913e+08           -19.6%  7.969e+08        unixbench.time.minor_page_faults
> >       5638           +19.1%       6714        unixbench.time.system_time
> >       5502           -20.7%       4363        unixbench.time.user_time
>
> So we're spending a lot more time in the kernel and correspondingly less
> time in userspace.
>
> >   67991885           -16.9%   56507507        unixbench.time.voluntary_context_switches
> >   46198768           -19.1%   37355723        unixbench.workload
> >  1.365e+08           -12.5%  1.195e+08 ±  7%  cpuidle..usage
> >    1220612 ±  4%     -38.0%     757009 ± 28%  meminfo.Active
> >    1220354 ±  4%     -38.0%     756754 ± 28%  meminfo.Active(anon)
> >       0.50 ±  2%       -0.1        0.45 ±  4%  mpstat.cpu.all.soft%
> >       1.73              -0.2        1.52 ±  2%  mpstat.cpu.all.usr%
> >     532266           -18.4%     434559        vmstat.system.cs
> >     495826           -12.2%     435455 ±  8%  vmstat.system.in
> >   1.36e+08           -13.2%   1.18e+08 ±  9%  turbostat.C1
> >      68.80             +0.8       69.60        turbostat.C1%
> >  1.663e+08           -12.1%  1.462e+08 ±  8%  turbostat.IRQ
> >      15.54 ± 20%     -49.0%       7.93 ± 24%  sched_debug.cfs_rq:/.runnable_avg.min
> >      13.26 ± 19%     -46.6%       7.08 ± 29%  sched_debug.cfs_rq:/.util_avg.min
> >      48.96 ±  8%     +51.5%      74.20 ± 13%  sched_debug.cfs_rq:/.util_est_enqueued.avg
> >     138.00 ±  5%     +28.9%     177.87 ±  7%  sched_debug.cfs_rq:/.util_est_enqueued.stddev
> >     228060 ±  3%     +13.3%     258413 ±  4%  sched_debug.cpu.avg_idle.stddev
> >     432533 ±  5%     -16.4%     361517 ±  4%  sched_debug.cpu.nr_switches.min
> >  2.665e+08           -18.9%  2.162e+08        numa-numastat.node0.local_node
> >  2.666e+08           -18.9%  2.163e+08        numa-numastat.node0.numa_hit
> >  2.746e+08           -20.9%  2.172e+08        numa-numastat.node1.local_node
> >  2.747e+08           -20.9%  2.172e+08        numa-numastat.node1.numa_hit
> >  2.602e+08           -17.4%  2.149e+08        numa-numastat.node2.local_node
> >  2.603e+08           -17.4%  2.149e+08        numa-numastat.node2.numa_hit
> >  2.423e+08           -15.0%   2.06e+08        numa-numastat.node3.local_node
> >  2.424e+08           -15.0%  2.061e+08        numa-numastat.node3.numa_hit
>
> So we're going off-node a lot more for ... something.
>
> >  2.666e+08           -18.9%  2.163e+08        numa-vmstat.node0.numa_hit
> >  2.665e+08           -18.9%  2.162e+08        numa-vmstat.node0.numa_local
> >  2.747e+08           -20.9%  2.172e+08        numa-vmstat.node1.numa_hit
> >  2.746e+08           -20.9%  2.172e+08        numa-vmstat.node1.numa_local
> >  2.603e+08           -17.4%  2.149e+08        numa-vmstat.node2.numa_hit
> >  2.602e+08           -17.4%  2.149e+08        numa-vmstat.node2.numa_local
> >  2.424e+08           -15.0%  2.061e+08        numa-vmstat.node3.numa_hit
> >  2.423e+08           -15.0%   2.06e+08        numa-vmstat.node3.numa_local
> >     304947 ±  4%     -38.0%     189144 ± 28%  proc-vmstat.nr_active_anon
>
> Umm.  Are we running vmstat a lot during this test?  The commit says:

Thanks for the hint. Right.

IIUC, 0Day normally will run 'vmstat 1' and collect the output, and also
read /proc/vmstat and /sys/devices/system/node/nodexx/numastat once per
second.

> At the
> moment the readers are either procfs interface, oom_killer and memory
> reclaim which I think are not performance critical and should be ok with
> slow read. However I think we can make that change in a separate patch.
>
> This would explain the increased cross-NUMA references (we're going to
> the other nodes to collect the stats), and the general slowdown.  But I
> don't think it reflects a real workload; it's reflecting that the
> monitoring of this workload that we're doing is now more accurate and
> more expensive.
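Just to make the read-side cost concrete, here is my rough understanding as
a userspace toy model rather than the actual kernel code (the toy_* names
and the fixed NR_CPUS are made up for illustration): with rss now kept in
percpu_counters, an update only touches the local CPU's slot, while a
precise read has to walk one slot per CPU, so each monitoring read on this
128-CPU, 4-node box ends up touching memory on every node.

/*
 * Userspace toy model (not the actual kernel code) of the percpu_counter
 * read/write asymmetry.  All names and the fixed CPU count are made up
 * for illustration only.
 */
#include <stdio.h>

#define NR_CPUS 128                     /* matches the 128-thread test box */

struct toy_percpu_counter {
        long long count;                /* central part of the counter */
        int counters[NR_CPUS];          /* one per-CPU slot */
};

/* Update path (e.g. a page fault): only the local CPU's slot is touched. */
static void toy_add(struct toy_percpu_counter *c, int cpu, int delta)
{
        c->counters[cpu] += delta;
}

/* Read path (e.g. a /proc reader): walk every CPU's slot for a precise sum. */
static long long toy_sum(const struct toy_percpu_counter *c)
{
        long long ret = c->count;
        int cpu;

        for (cpu = 0; cpu < NR_CPUS; cpu++)
                ret += c->counters[cpu];
        return ret;
}

int main(void)
{
        struct toy_percpu_counter rss = { 0 };

        toy_add(&rss, 3, 1);            /* fault handled on CPU 3 */
        toy_add(&rss, 70, 1);           /* fault handled on CPU 70, another node */
        printf("rss = %lld pages\n", toy_sum(&rss));
        return 0;
}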
We tried to disable the 3 types of 'vmstat/numastat' monitors mentioned
above, and the regression is still there at a similar rate, around 19%.

Thanks,
Feng