On Wed, 14 Aug 2024 23:51:22 +0000 kaiyang2@xxxxxxxxxx wrote:

> From: Kaiyang Zhao <kaiyang2@xxxxxxxxxx>
>
> The ability to observe the demotion and promotion decisions made by the
> kernel on a per-cgroup basis is important for monitoring and tuning
> containerized workloads on machines equipped with tiered memory.
>
> Different containers in the system may experience drastically different
> memory tiering actions that cannot be distinguished from the global
> counters alone.
>
> For example, a container running a workload with much hotter memory
> accesses will likely see more promotions and fewer demotions,
> potentially depriving a colocated container of top tier memory to such
> an extent that its performance degrades unacceptably.
>
> For another example, some containers may exhibit longer periods between
> data reuse, causing far more numa_hint_faults than numa_pages_migrated.
> In this case, tuning hot_threshold_ms may be appropriate, but the signal
> can easily be lost if only global counters are available.
>
> In the long term, we hope to introduce per-cgroup control of promotion
> and demotion actions to implement memory placement policies in tiering.
>
> This patch set adds seven counters to memory.stat in a cgroup:
> numa_pages_migrated, numa_pte_updates, numa_hint_faults, pgdemote_kswapd,
> pgdemote_khugepaged, pgdemote_direct and pgpromote_success. pgdemote_*
> and pgpromote_success are also available in memory.numa_stat.
>
> count_memcg_events_mm() is added to count multiple event occurrences at
> once, and get_mem_cgroup_from_folio() is added because we need to get a
> reference to the memcg of a folio before it's migrated to track
> numa_pages_migrated. The accounting of PGDEMOTE_* is moved to
> shrink_inactive_list() before being changed to per-cgroup.

There appears to have been little reviewer interest in this one?
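To illustrate how a monitoring agent might consume the proposed counters, here is a minimal userspace sketch. It assumes cgroup v2, where memory.stat is a flat file of "key value" lines, and uses the counter names listed in the cover letter; they will only be present on a kernel carrying this patch set. The helper names (memstat_lookup, migrated_per_hint_fault, read_small_file) are hypothetical and not part of any kernel or library API. The ratio helper targets the cover letter's second example (far more numa_hint_faults than numa_pages_migrated); it is a raw ratio, not a fraction of faults, since one migration event may move more than one page.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Look up one "key value" line in memory.stat-formatted text.
 * Returns 0 and stores the value in *out, or -1 if the key is absent. */
static int memstat_lookup(const char *text, const char *key,
                          unsigned long long *out)
{
    size_t klen;
    const char *line = text;

    assert(text != NULL && key != NULL && key[0] != '\0');
    klen = strlen(key);

    while (line != NULL && *line != '\0') {
        const char *eol = strchr(line, '\n');

        /* Exact key match: the name must be followed by a space, so
         * "numa" does not match "numa_pages_migrated". */
        if (strncmp(line, key, klen) == 0 && line[klen] == ' ') {
            char *end;
            unsigned long long v = strtoull(line + klen + 1, &end, 10);

            if (end != line + klen + 1) {
                *out = v;
                return 0;
            }
        }
        line = eol ? eol + 1 : NULL;
    }
    return -1;
}

/* Ratio of numa_pages_migrated to numa_hint_faults for one cgroup.
 * Returns -1.0 if either counter is missing or no hint faults occurred. */
static double migrated_per_hint_fault(const char *text)
{
    unsigned long long migrated, faults;

    if (memstat_lookup(text, "numa_pages_migrated", &migrated) != 0 ||
        memstat_lookup(text, "numa_hint_faults", &faults) != 0 ||
        faults == 0)
        return -1.0;
    return (double)migrated / (double)faults;
}

/* Read a small pseudo-file such as memory.stat into a NUL-terminated
 * heap buffer. Returns NULL on error; the caller frees the result. */
static char *read_small_file(const char *path)
{
    FILE *f = fopen(path, "r");
    size_t cap = 4096, len = 0, n;
    char *buf;

    if (f == NULL)
        return NULL;
    buf = malloc(cap);
    if (buf == NULL) {
        fclose(f);
        return NULL;
    }
    while ((n = fread(buf + len, 1, cap - len - 1, f)) > 0) {
        len += n;
        if (cap - len - 1 == 0) {          /* buffer full: grow it */
            char *tmp = realloc(buf, cap * 2);

            if (tmp == NULL) {
                free(buf);
                fclose(f);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    fclose(f);
    buf[len] = '\0';
    return buf;
}
```

A caller would pass a path such as /sys/fs/cgroup/<cgroup>/memory.stat (assuming the usual cgroup v2 mount point) to read_small_file(), hand the buffer to the lookup helpers, and free() it afterwards. The parser handles only memory.stat; memory.numa_stat uses per-node "key N0=... N1=..." lines and would need a different parser.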