From: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>

Tiered memory accounting and management
------------------------------------------------------------

Traditionally, all RAM is DRAM. Some DRAM might be closer/faster than
others, but a byte of media has about the same cost whether it is close
or far. With new memory tiers such as High-Bandwidth Memory or
Persistent Memory, however, there is a choice between fast/expensive
and slow/cheap.

The current memory cgroups still live in the old model. There is only
one set of limits, and it implies that all memory has the same cost.
We would like to extend memory cgroups to comprehend different memory
tiers to give users a way to choose a mix between fast/expensive and
slow/cheap.

To manage such memory, we will need to account memory usage and impose
limits for each kind of memory. A couple of approaches to partitioning
the memory between cgroups have been discussed previously; they are
listed below. We would like to use the LSF/MM session to come to a
consensus on the approach to take.

1. Per NUMA node limit and accounting for each cgroup.
   We can assign higher limits on better performing memory nodes to
   higher priority cgroups.

   There are some loose ends here that warrant further discussion:
   (1) A user friendly interface for such limits. Would a proportional
       weight for the cgroup that translates to an actual absolute
       limit be more suitable?
   (2) Memory misconfigurations can occur more easily, as the admin has
       a much larger number of limits spread among the cgroups to
       manage. Over-restrictive limits can lead to under-utilized and
       wasted memory and hurt performance.
   (3) OOM behavior when a cgroup hits its limit.

2. Per memory tier limit and accounting for each cgroup.
   We can assign higher limits on memory in a better performing memory
   tier to higher priority cgroups. I previously prototyped a soft
   limit based implementation to demonstrate the tiered limit idea.

   There are also a number of issues here:
   (1) The advantage is that we have fewer limits to deal with, which
       simplifies configuration. However, a number of people have
       raised doubts about whether we can really classify the NUMA
       nodes into memory tiers properly. There could still be
       significant performance differences between NUMA nodes even for
       the same kind of memory. We would also not have the fine-grained
       control and flexibility that come with a per NUMA node limit.
   (2) Would a memory hierarchy defined by the promotion/demotion
       relationship between memory nodes be a viable approach for
       defining memory tiers?

These issues related to the management of systems with multiple kinds
of memory can be ironed out in this session.
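
To make the proportional-weight question a bit more concrete, here is a
minimal sketch in Python of how per-cgroup weights could be translated
into absolute per-tier limits. Everything in it is hypothetical: the
tier names, the capacities, the weights and the tier_limits() helper
are made up for illustration and do not correspond to any existing
cgroup interface or to the soft limit prototype mentioned above.

```python
def tier_limits(tier_capacity, weights):
    """Split each tier's capacity among cgroups in proportion to weight.

    tier_capacity: dict of tier name -> capacity in bytes (hypothetical)
    weights:       dict of cgroup name -> dict of tier name -> weight
    Returns a dict of cgroup name -> dict of tier name -> limit in bytes.
    """
    limits = {cg: {} for cg in weights}
    for tier, capacity in tier_capacity.items():
        # Sum of all cgroup weights for this tier; missing entries count as 0.
        total = sum(w.get(tier, 0) for w in weights.values())
        for cg, w in weights.items():
            share = w.get(tier, 0)
            # Integer division: any rounding remainder is left unassigned.
            limits[cg][tier] = capacity * share // total if total else 0
    return limits


GiB = 1 << 30

# Assumed machine: a small fast tier and a large slow tier.
capacity = {"fast": 64 * GiB, "slow": 512 * GiB}

# Assumed policy: the high priority cgroup gets most of the fast tier.
weights = {
    "high_prio": {"fast": 3, "slow": 1},
    "low_prio": {"fast": 1, "slow": 3},
}

limits = tier_limits(capacity, weights)
for cg, per_tier in limits.items():
    print(cg, {tier: value // GiB for tier, value in per_tier.items()})
```

With per-tier weights the admin sets one value per tier per cgroup. The
same scheme keyed by NUMA node would need one value per node per cgroup,
which is the larger configuration burden raised in item 1 (2).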