From: Kaiyang Zhao <kaiyang2@xxxxxxxxxx>

Currently in Linux, there is no concept of fairness in memory tiering.
Depending on the memory usage and access patterns of other colocated
applications, an application cannot be sure how much memory it will get
in each tier, or how much its performance will suffer or benefit as a
result.

Fairness is, however, important in a multi-tenant system. For example,
an application may need to meet a certain tail latency requirement,
which can be difficult to satisfy without a certain amount of its
frequently accessed pages in top-tier memory. Similarly, an application
may want to declare a minimum throughput when running on a system for
capacity planning purposes, but without fairness controls in memory
tiering its throughput can fluctuate wildly as other applications come
and go on the system.

In this proposal, we amend the memory.low control in memcg to protect a
cgroup’s memory usage in top-tier memory. The low protection for
top-tier memory is memory.low scaled by the ratio of top-tier memory to
total memory on the system. The protection is then applied to reclaim
for top-tier memory. Promotion by NUMA balancing is also throttled,
through a reduced scanning window, when top-tier memory is contended and
the cgroup is over its protection.

Our experiments with microbenchmarks exhibiting a range of memory access
patterns and memory sizes confirmed that when top-tier memory is
contended, the system moves towards a stable memory distribution where
each cgroup’s memory usage in local DRAM converges to its protected
amount.

One notable missing part in the patches is determining which NUMA nodes
have top-tier memory; currently they hardcode node 0 as top-tier memory
and node 1 as a CPU-less node backed by CXL memory. We’re working on
removing this artifact and applying the protection correctly to the
top-tier nodes in the system.

Your feedback is greatly appreciated!
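To make the scaling rule concrete, here is a minimal sketch in plain C of how a top-tier protection could be derived from memory.low and the tier sizes. The helper names (scaled_top_tier_low, top_tier_usage_protected) and their parameters are hypothetical and are not the functions added by the patches; the sketch also assumes all quantities are page counts and that total_pages is below 2^32 so the intermediate product fits in 64 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative only: hypothetical helpers, not code from the series.
 *
 * Scale a cgroup's memory.low (in pages) by the share of system memory
 * that is top-tier, i.e. low * top_tier / total, rounded down.
 */
static inline uint64_t scaled_top_tier_low(uint64_t low_pages,
					   uint64_t top_tier_pages,
					   uint64_t total_pages)
{
	if (total_pages == 0)
		return 0;
	if (top_tier_pages >= total_pages)
		return low_pages;	/* all memory is top-tier */

	/*
	 * Split low into quotient and remainder by total so the
	 * multiplication stays small (assumes total_pages < 2^32).
	 */
	return (low_pages / total_pages) * top_tier_pages +
	       (low_pages % total_pages) * top_tier_pages / total_pages;
}

/*
 * A cgroup's top-tier usage counts as protected while it does not
 * exceed the scaled protection.
 */
static inline bool top_tier_usage_protected(uint64_t top_tier_usage_pages,
					    uint64_t low_pages,
					    uint64_t top_tier_pages,
					    uint64_t total_pages)
{
	return top_tier_usage_pages <=
	       scaled_top_tier_low(low_pages, top_tier_pages, total_pages);
}
```

As a made-up example, a cgroup with memory.low of 1000 pages on a system where a quarter of memory is top-tier would get 250 pages of top-tier protection under this sketch; usage of 200 pages would be within the protection and usage of 300 pages would be over it.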
Kaiyang Zhao (4):
  Add get_cgroup_local_usage for estimating the top-tier memory usage
  calculate memory.low for the local node and track its usage
  use memory.low local node protection for local node reclaim
  reduce NUMA balancing scan size of cgroups over their local memory.low

 include/linux/memcontrol.h   | 25 ++++++++-----
 include/linux/page_counter.h | 16 ++++++---
 kernel/sched/fair.c          | 54 +++++++++++++++++++++++++---
 mm/hugetlb_cgroup.c          |  4 +--
 mm/memcontrol.c              | 68 ++++++++++++++++++++++++++++++------
 mm/page_counter.c            | 52 +++++++++++++++++++++------
 mm/vmscan.c                  | 19 +++++++---
 7 files changed, 192 insertions(+), 46 deletions(-)

--
2.43.0