Hello, There were a few attempts to use DAMON at tiered memory management, from the pretty early days of DAMON[1]. I also wanted to dive deep on this topic[2] but didn't get time for this, unfortunately. Meanwhile, a few people thankfully shared their own approaches and results with me, sometimes in public, and sometimes in private. They used varying approaches and the results were also very different. Some folks achieved nice results, while it was only waste of time for someone. Nonetheless, what I commonly heard about from such grateful sharing was the difficulty of DAMON tuning[3,4]. My proposal for the tuning difficulty wa auto-tuning DAMOS aggressiveness based on feedbacks, which could be provided by user, or DAMON can collect on its own. It is still not done, but an early version of the implementation has recently shared. I'd like to share my humble concept level idea of using the auto-tuning feature for somewhat reasonable or at least worthy to try DAMOS-based tiered memory management. Background ========== Please read DAMOS auto-tuning RFC patchset[1]'s coverletter and the first patch for the overall concept and the implementation detail. In short, the feature allow users define aimed system status (e.g., 0.05% memory PSI and/or 50% free memory ratios), and let DAMOS control its aggressiveness on its own to achieve the goals. If it is under-achieving the goals, DAMOS increases its aggressiveness, and vice versa. Under the limited aggressiveness, DAMOS applies the action to pages that more prioritized for the action. For example, colder pages are more prioritized for 'pageout' action. DAMOS-based Tiered-Memory Management Idea ========================================= The idea is to set DAMOS schemes for each tiered memory node, like below. 1. If the node has a lower node, demote cold pages of the node to the lower node using DAMOS, colder pages first. Let DAMOS auto-tune the aggressiveness of the demotion scheme aiming small amount of (e.g., 5%) free memory of the current node. 2. If the node has a higher node, promote hot pages in the node to the upper node using DAMOS, more hot pages first. Let DAMOS auto-tune the aggressiveness of the promotion scheme aiming high utilization rate (e.g., 96%) of the upper node. The aims of the demotion scheme and the promotion scheme are set to conflict a little bit, like above example. Discussion ========== The simple scheme can be easily extended to multiple tiered memory nodes. Higher node will keep high utilization with relatively hot pages, while lower nodes will have only relatively cold pages that cannot be accommodated in higher nodes due to out of the space. Because the utilization goal and the free memory goal overlap, DAMOS will continue moving cold pages down and hot pages up. Since the demotion is for coldest pages of the node, and the aggressiveness auto-tuning will make the aggressiveness under the small overlap small, the overhead will be only modest. If the system needs to being memory frugal, we may apply Access/Contiguity-aware Memory Auto-Scalaing (ACMA)[6] to the lowest node. Note that ACMA removes unnecessary memory blocks one by one and add those back as needed. As a result, the system will have only necessary nodes. Request For Comments ==================== This is in very early stage. No enough survey of related works is done, and no implementation of the demote/promote DAMOS action are made. An early version of the aim-oriented DAMOS aggressive auto tuning is available, though. I even have no good test setup for tiered memory system. That said, I hope to get any comments or concern about this humble early idea if someone have. That's not to only make it success as is, but rather to learn from you and develop it together, or even fail fast. Example Operation Scenario ========================== Let's suppose a system is having a DRAM and a CXL-based slower node, each can accommodate 10 pages. And the system is configured as the idea is proposed above. The free memory ratio and memory utilization ratio goals are set as 5% and 96%. That is, we have the demote DAMOS scheme for DRAM node, and promote DAMOS scheme for CXL node. Let's also represent the hotness of each page in five level, from 0 (cold) to 4 (hot). And let's represent free pages as 'F'. Then a state of the system may represented like below: Node 0 (DRAM): 43210 43210 (100% util, 0% free) Node 1 (CXL): FFFFF FFFFF ( 0% util, 100% free) Cold Pages-only Pingpong ------------------------ Since DRAM node is having 0% free memory ratio, which is under-achieving the goal of the demote scheme, demote scheme is activated. Meanwhile, since DRAM node utilization ratio is 100%, which is higher than the goal of the promote scheme (96%), the promote scheme does nothing. Hence, cold pages in Node 0 are demoted, colder one first. Node 0 (DRAM): 4321F 43210 ( 95% util, 5% free) Node 1 (CXL): FFFF0 FFFFF ( 5% util, 95% free) The goal of demote scheme is met, so demote scheme stops. The DRAM utilization ratio is below the promote scheme's goal (96%) now. So the scheme is activated and promote hot pages of the node, more hot one first. It promotes the coldest one in this case, though, since that's the only one page that can be promoted. Node 0 (DRAM): 43210 43210 (100% util, 0% free) Node 1 (CXL): FFFFF FFFFF ( 0% util, 100% free) Then promote scheme again deactivated, and demote scheme activated. Coldest page demoted. Node 0 (DRAM): 43210 4321F ( 95% util, 5% free) Node 1 (CXL): FFFFF FFFF0 ( 5% util, 95% free) Then promote scheme again activated, and demote scheme deactivated. The hottest page in CXL memory promoted. The hottest page is the colest one in CXL node this time again, though. Node 0 (DRAM): 43210 43210 (100% util, 0% free) Node 1 (CXL): FFFFF FFFFF ( 0% util, 100% free) In this way, DRAM node keeps high utilization ratio with only hot pages, while only cold pages move back and forth between the two nodes. Handling of Hotness Changed Pages --------------------------------- Let's assume the two cold pages are in CXL node. Node 0 (DRAM): 4321F 4321F ( 90% util, 10% free) Node 1 (CXL): FFFF0 FFFF0 ( 10% util, 90% free) And let's assume the demoted pages become hot. Node 0 (DRAM): 4321F 4321F ( 90% util, 10% free) Node 1 (CXL): FFFF3 FFFF2 ( 10% util, 90% free) The promotion scheme will promote the hottest page. Node 0 (DRAM): 43213 4321F ( 95% util, 5% free) Node 1 (CXL): FFFFF FFFF2 ( 5% util, 95% free) The other page in CXL also get promoted following the goals. Node 0 (DRAM): 43213 43212 (100% util, 0% free) Node 1 (CXL): FFFFF FFFFF ( 0% util, 100% free) Now, the demotion scheme demotes cold pages of the DRAM node, not the now-hot just promoted pages. Node 0 (DRAM): 432F3 43212 ( 95% util, 5% free) Node 1 (CXL): FFF1F FFFFF ( 5% util, 95% free) Unless the now-coldest pages be hot, only they will continue being promoted and demoted. Extension --------- The schemes can be extended for more than two nodes scenario. For example, below system can be configured. Let's assume CXL2 node is slower than CXL node. Node 0 (DRAM): 44444 4443F ( 95% util, 5% free) Node 1 (CXL): 33333 3321F ( 95% util, 5% free) Node 2 (CXL2): 11111 100FF ( 90% util, 19% free) A demote scheme for Node 0, which aims 5% free memory rate of Node 0, is configured. For Node 1, two schemes are made. One promote scheme which aims 96% utilization of Node 0, and one demote scheme which aims 5% free memory of Node 1. And Node 2 also has one promote scheme which aims 96% utilization of Node 1. In the way, higher nodes get high utilization with hot pages. If working set is enough to be accommodated in 95% of highest node, all working set will be placed in the node. [1] https://lore.kernel.org/linux-mm/cover.1640171137.git.baolin.wang@xxxxxxxxxxxxxxxxx/ [2] https://lore.kernel.org/linux-mm/20230105221109.53398-1-sj@xxxxxxxxxx/ [3] https://lore.kernel.org/linux-mm/20230219203138.4873-1-sj@xxxxxxxxxx/ [4] https://lwn.net/Articles/931812/ [5] https://lore.kernel.org/damon/20231112194607.61399-1-sj@xxxxxxxxxx/ [6] https://lore.kernel.org/damon/20231112195114.61474-1-sj@xxxxxxxxxx/