On Wed, 14 Aug 2024 17:42:27 +0000 kaiyang2@xxxxxxxxxx wrote:

> From: Kaiyang Zhao <kaiyang2@xxxxxxxxxx>
>
> The ability to observe the demotion and promotion decisions made by the
> kernel on a per-cgroup basis is important for monitoring and tuning
> containerized workloads on either NUMA machines or machines
> equipped with tiered memory.
>
> Different containers in the system may experience drastically different
> memory tiering actions that cannot be distinguished from the global
> counters alone.
>
> For example, a container running a workload that has a much hotter
> memory accesses will likely see more promotions and fewer demotions,
> potentially depriving a colocated container of top tier memory to such
> an extent that its performance degrades unacceptably.
>
> For another example, some containers may exhibit longer periods between
> data reuse, causing much more numa_hint_faults than numa_pages_migrated.
> In this case, tuning hot_threshold_ms may be appropriate, but the signal
> can easily be lost if only global counters are available.
>
> This patch set adds seven counters to memory.stat in a cgroup:
> numa_pages_migrated, numa_pte_updates, numa_hint_faults, pgdemote_kswapd,
> pgdemote_khugepaged, pgdemote_direct and pgpromote_success. pgdemote_*
> and pgpromote_success are also available in memory.numa_stat.
>
> count_memcg_events_mm() is added to count multiple event occurrences at
> once, and get_mem_cgroup_from_folio() is added because we need to get a
> reference to the memcg of a folio before it's migrated to track
> numa_pages_migrated. The accounting of PGDEMOTE_* is moved to
> shrink_inactive_list() before being changed to per-cgroup.
>
> ...
>
> @@ -1383,6 +1412,13 @@ static const struct memory_stat memory_stats[] = {
>  	{ "workingset_restore_anon",	WORKINGSET_RESTORE_ANON	},
>  	{ "workingset_restore_file",	WORKINGSET_RESTORE_FILE	},
>  	{ "workingset_nodereclaim",	WORKINGSET_NODERECLAIM	},
> +
> +	{ "pgdemote_kswapd",		PGDEMOTE_KSWAPD		},
> +	{ "pgdemote_direct",		PGDEMOTE_DIRECT		},
> +	{ "pgdemote_khugepaged",	PGDEMOTE_KHUGEPAGED	},
> +#ifdef CONFIG_NUMA_BALANCING
> +	{ "pgpromote_success",		PGPROMOTE_SUCCESS	},
> +#endif
>  };

Please document these in Documentation/admin-guide/cgroup-v2.rst
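
For anyone wanting to consume the new entries from userspace, a minimal
sketch follows. It assumes the counters show up in memory.stat as plain
"name value" lines, like the existing entries in that file. The helper
names memcg_stat_lookup() and memcg_stat_read() are hypothetical and are
not part of this patch set; the cgroup path passed in is up to the caller.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up one "name value" pair in memory.stat-style text.
 * Returns 0 and stores the value in *out on success, or -1 if the
 * counter is absent (for example on a kernel without these counters).
 * Hypothetical helper, for illustration only.
 */
int memcg_stat_lookup(const char *text, const char *name,
		      unsigned long long *out)
{
	size_t len = strlen(name);
	const char *line = text;

	while (line && *line) {
		/* Require a space after the key so that "pgdemote" does
		 * not match "pgdemote_kswapd". */
		if (strncmp(line, name, len) == 0 && line[len] == ' ') {
			const char *num = line + len + 1;
			char *end;
			unsigned long long v = strtoull(num, &end, 10);

			if (end == num)
				return -1;	/* no digits after the key */
			*out = v;
			return 0;
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/*
 * Read one counter from a memory.stat file, e.g. a path such as
 * /sys/fs/cgroup/<group>/memory.stat (the group name is the caller's).
 * Returns 0 on success, -1 if the file or the counter is missing.
 */
int memcg_stat_read(const char *path, const char *name,
		    unsigned long long *out)
{
	char line[256];
	FILE *f = fopen(path, "r");
	int ret = -1;

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (memcg_stat_lookup(line, name, out) == 0) {
			ret = 0;
			break;
		}
	}
	fclose(f);
	return ret;
}
```

Comparing pgpromote_success and the pgdemote_* values between two
colocated cgroups this way is the kind of per-container signal the
changelog describes; a missing key simply returns -1, so the same code
keeps working on kernels that do not export the counters.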