On Mon, Apr 25, 2022 at 04:06:04PM +0200, Christoph Lameter wrote:
> On Mon, 25 Apr 2022, Peter Zijlstra wrote:
>
> > > Folding the vmstat diffs *always* when entering idle prevents unnecessary
> > > wakeups and processing in the future and also provides more accurate
> > > counters for the VM allowing better decision to be made on reclaim.
> >
> > I'm thinking you're going to find a ton of regressions if you try it
> > though; some workloads go idle *very* shortly, doing all this accounting
> > is going to be counter-productive.
>
> Well there is usually not much to do in terms of accounting.

static int refresh_cpu_vm_stats(bool do_pagesets)
{
	struct pglist_data *pgdat;
	struct zone *zone;
	int i;
	int global_zone_diff[NR_VM_ZONE_STAT_ITEMS] = { 0, };
	int global_node_diff[NR_VM_NODE_STAT_ITEMS] = { 0, };
	int changes = 0;

	for_each_populated_zone(zone) {
		struct per_cpu_zonestat __percpu *pzstats = zone->per_cpu_zonestats;
#ifdef CONFIG_NUMA
		struct per_cpu_pages __percpu *pcp = zone->per_cpu_pageset;
#endif

		for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) {
			int v;

			v = this_cpu_xchg(pzstats->vm_stat_diff[i], 0);
			if (v) {

This loop is quite heavy. Maybe reducing the data necessary to be read
to a couple of cachelines would improve it considerably.

> If there are
> a lot of updates then it is worthwhile because if the numbers are off too
> much then the VM has trouble assessing its own situation.
>
> It may depend though on how long the idle periods are. Do we have
> statistics on the duration? Always folding the vmstat deltas may also
> increase the length of the idle periods.

"Products such as the Intel® Optane™ SSD DC P4800X series have a read
and write latency of 10 microseconds, compared with a write latency of
about 220 microseconds for a typical NAND flash SSD."