Hi Yu,

On Tue, 4 Jan 2022 13:22:19 -0700 Yu Zhao <yuzhao@xxxxxxxxxx> wrote:

> TLDR
> ====
> The current page reclaim is too expensive in terms of CPU usage and it
> often makes poor choices about what to evict. This patchset offers an
> alternative solution that is performant, versatile and
> straightforward.
>
[...]
> Summery
> =======
> The facts are:
> 1. The independent lab results and the real-world applications
>    indicate substantial improvements; there are no known regressions.

So impressive results!

> 2. Thrashing prevention, working set estimation and proactive reclaim
>    work out of the box; there are no equivalent solutions.

I think similar features are already available out of the box in the
latest mainline tree, though they might be suboptimal in some cases.

First, you can do thrashing prevention using DAMON-based Operation
Schemes (DAMOS)[1] with the MADV_COLD action. Second, for working set
estimation, you can either use DAMOS again with the statistics action,
or use the damon_aggregated tracepoint[2]. The DAMON user space tool[3]
helps with analysis and visualization of the tracepoint output.
Finally, for proactive reclaim, you can again use DAMOS with the
MADV_PAGEOUT action, or simply use the DAMON-based proactive reclaim
module (DAMON_RECLAIM)[4].

Nevertheless, as noted above, the current DAMON-based solutions might
be suboptimal for some cases. First of all, DAMON currently doesn't
provide page granularity monitoring. Though its monitoring results have
been useful for our users' production usages, there could be different
requirements and situations. Secondly, the DAMON-based thrashing
prevention wouldn't reduce the CPU usage of the reclamation logic's
access scanning.

So, to me, the MGLRU patchset looks like it provides something that
DAMON doesn't provide, but also something that DAMON already provides.
Specifically, the efficient page granularity access scanning is what
DAMON doesn't provide for now.
However, the utilization of the access information for LRU list
manipulation (thrashing prevention) and proactive reclamation is
similar to what DAMON (specifically, DAMOS) provides. Also, this
patchset reduces the reclamation logic's CPU usage by using the
efficient page granularity access scanning.

IMHO, we might be able to reduce the duplication by integrating MGLRU
into DAMON. What I'm saying is, we could 1) introduce the efficient
page granularity access scanning, 2) reduce the reclamation logic's CPU
usage by making it use the efficient page granularity access scanning,
and 3) extend DAMON for page granularity monitoring with the efficient
access scanning[5]. Then, users could get the benefit of MGLRU by using
DAMOS but setting it to use your efficient page granularity access
scanning. To make it simpler, we could extend the existing kernel logic
to use DAMON in that way, or implement a new kernel module.

Additional advantages of this approach would be 1) reducing the changes
to the existing code, and 2) making the efficient page granularity
access information usable for more general cases.

Of course, the integration might not be as simple as it seems to me
now. We could also keep DAMON and MGLRU separate as they are for now,
and let users select what they really want. I think it's up to you.

I haven't read this patchset thoroughly yet, so I might be missing many
things. If so, please feel free to let me know.

[1] https://docs.kernel.org/admin-guide/mm/damon/usage.html#schemes
[2] https://docs.kernel.org/admin-guide/mm/damon/usage.html#tracepoint-for-monitoring-results
[3] https://github.com/awslabs/damo
[4] https://docs.kernel.org/admin-guide/mm/damon/reclaim.html
[5] https://docs.kernel.org/vm/damon/design.html#configurable-layers


Thanks,
SJ

[...]