This series attempts to address shortcomings in today's approach to memcg
stats flushing, namely occasionally stale or expensive stat reads. It does so
by making the threshold that decides whether to trigger a flush per-memcg
instead of global (patch 3), and then making the flush itself per-memcg (i.e.
subtree flushes) instead of global (patch 5).

Patches 3 and 5 are the core of the series, and they include more details and
testing results. The rest are either cleanups or prep work.

This series replaces the "memcg: more sophisticated stats flushing" series
[1], which itself replaced another series, in a long list of attempts to
improve memcg stats flushing. It is not a new version of the same patchset,
as it takes a completely different approach. It is based on feedback collected
from the lkml discussions of all previous attempts. Hopefully, this is the
final attempt.

There was a reported regression in v2 [2] for the will-it-scale::fallocate
benchmark. I believe this regression should not affect production workloads.
This specific benchmark allocates and frees memory (using
fallocate/ftruncate) at a rate that is far too fast to make actual use of the
memory. Testing this series on 100+ machines running production workloads did
not show any practical regressions in page fault latency or allocation
latency, but it showed great improvements in stats read time. I do not have
numbers for the exact improvements from this series alone, but combined with
another optimization for cgroup v1 [3] we see 5-10x improvements. A
significant chunk of that comes from the cgroup v1 optimization, but this
series also made an improvement, as reported by Domenico [4].
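
For readers new to the problem, the idea behind patches 3 and 5 can be
illustrated with a toy userspace model. This is only a sketch under stated
assumptions, not the kernel code: every name here (toy_group, toy_update(),
toy_read(), TOY_FLUSH_THRESHOLD) is hypothetical, there is no percpu state or
locking, and the threshold value is arbitrary. It shows a reader flushing only
when its own group has accumulated enough pending updates, and only flushing
that group's subtree.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_MAX_CHILDREN 8
#define TOY_FLUSH_THRESHOLD 64 /* hypothetical; not the kernel's value */

/* Hypothetical stand-in for a memcg; not a kernel structure. */
struct toy_group {
	struct toy_group *parent;
	struct toy_group *children[TOY_MAX_CHILDREN];
	size_t nr_children;
	long pending; /* unflushed delta queued at this group */
	long total;   /* hierarchical total as of the last flush */
	long updates; /* magnitude of unflushed updates in this subtree */
};

static void toy_group_init(struct toy_group *g, struct toy_group *parent)
{
	g->parent = parent;
	g->nr_children = 0;
	g->pending = 0;
	g->total = 0;
	g->updates = 0;
	if (parent && parent->nr_children < TOY_MAX_CHILDREN)
		parent->children[parent->nr_children++] = g;
}

/* Record a stat change and bump the update counter of every ancestor. */
static void toy_update(struct toy_group *g, long delta)
{
	struct toy_group *p;

	g->pending += delta;
	for (p = g; p; p = p->parent)
		p->updates += labs(delta);
}

/* Fold pending deltas of g's subtree into totals; return the net delta. */
static long toy_flush_subtree(struct toy_group *g)
{
	long delta = g->pending;
	size_t i;

	g->pending = 0;
	for (i = 0; i < g->nr_children; i++)
		delta += toy_flush_subtree(g->children[i]);
	g->total += delta;
	g->updates = 0;
	return delta;
}

/* Read a group's total, flushing only its subtree and only if needed. */
static long toy_read(struct toy_group *g, bool *flushed)
{
	*flushed = g->updates >= TOY_FLUSH_THRESHOLD;
	if (*flushed) {
		long delta = toy_flush_subtree(g);

		/* Queue the net delta at the parent so ancestors see it later. */
		if (g->parent)
			g->parent->pending += delta;
	}
	return g->total;
}
```

In this model a read of a quiet group never pays for updates made in a busy
sibling, and a read of a busy group walks only its own subtree. The price is
that a group below the threshold may return a slightly stale total.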

[1]https://lore.kernel.org/lkml/20230913073846.1528938-1-yosryahmed@xxxxxxxxxx/
[2]https://lore.kernel.org/lkml/202310202303.c68e7639-oliver.sang@xxxxxxxxx/
[3]https://lore.kernel.org/lkml/20230803185046.1385770-1-yosryahmed@xxxxxxxxxx/
[4]https://lore.kernel.org/lkml/CAFYChMv_kv_KXOMRkrmTN-7MrfgBHMcK3YXv0dPYEL7nK77e2A@xxxxxxxxxxxxxx/

v2 -> v3:
- Rebased on top of v6.7-rc1.
- Updated commit messages based on discussions in previous versions.
- Reset percpu stats_updates in mem_cgroup_css_rstat_flush().
- Added a mem_cgroup_disabled() check to mem_cgroup_flush_stats().

v2: https://lore.kernel.org/lkml/20231010032117.1577496-1-yosryahmed@xxxxxxxxxx/

Yosry Ahmed (5):
  mm: memcg: change flush_next_time to flush_last_time
  mm: memcg: move vmstats structs definition above flushing code
  mm: memcg: make stats flushing threshold per-memcg
  mm: workingset: move the stats flush into workingset_test_recent()
  mm: memcg: restore subtree stats flushing

 include/linux/memcontrol.h |   8 +-
 mm/memcontrol.c            | 272 +++++++++++++++++++++----------------
 mm/vmscan.c                |   2 +-
 mm/workingset.c            |  42 ++++--
 4 files changed, 188 insertions(+), 136 deletions(-)

-- 
2.43.0.rc0.421.g78406f8d94-goog