Hello,

On Fri, 9 Sep 2022 17:38:56 +0000 SeongJae Park <sj@xxxxxxxxxx> wrote:

[...]
> So, our next DAMON Beer/Coffee/Tea Chat series will be held in LPC2022, in
> person.

We had the in-person DAMON community meetup last Wednesday, as announced. In
the meeting, I met Alex, who recently posted the THP shrinker patch[1], and had
a very interesting discussion about the use of DAMON for his work. I'm leaving
a summary of the discussion here.

TL;DR: DAMON cannot be used for Alex' work as is. But the goal of the work can
be achieved using DAMON, though the internal mechanism would be slightly
different. Also, with some work, DAMON can be directly used for Alex' work.

The idea of Alex' work is to measure how many sub-pages in THPs are actually
accessed, to know how much memory we are wasting due to THP-internal
fragmentation, and to split THPs having low utilization into regular pages. So
the imaginable DAMON usage here would be using DAMON for the THP utilization
measurement. Unfortunately, DAMON cannot be used for that purpose for now,
because the current implementation of DAMON uses PTE Accessed bits. Hence,
once a THP is collapsed, DAMON checks accesses to it at THP granularity, not at
page granularity.

That said, we have an experimental implementation of DAMON-based THP
improvement[2], which is integrated in the DAMON performance tests suite[3].
It aims to achieve a THP improvement similar to Alex' one, though the detailed
mechanism is slightly different. The idea of the DAMON-based approach is to
find >=2MB virtual memory regions showing high access frequency and do
'madvise(MADV_HUGEPAGE)' for them, while finding memory regions showing no
access for a time and doing 'madvise(MADV_NOHUGEPAGE)' for them. This reduces
the memory footprint increase due to THP-internal fragmentation while keeping
the performance improvement.
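To make the policy above a bit more concrete, here is a rough Python sketch of
the decision step only. It is not the code of the experimental
implementation[2]: the 'Region' fields, the function name, and both threshold
values are assumptions made up for illustration, and the real scheme runs
inside the kernel and applies the hints itself instead of returning strings.

```python
from dataclasses import dataclass
from typing import List, Tuple

HUGEPAGE_SIZE = 2 * 1024 * 1024  # 2 MiB, the ">=2MB" bound mentioned above


@dataclass
class Region:
    """A monitored virtual address range (hypothetical representation)."""
    start: int          # inclusive start address
    end: int            # exclusive end address
    access_rate: float  # fraction of samples that saw an access, 0.0 to 1.0
    idle_secs: float    # how long no access has been observed


def plan_thp_hints(
    regions: List[Region],
    hot_rate: float = 0.05,  # assumed threshold for "high access frequency"
    cold_secs: float = 7.0,  # assumed threshold for "no access for a time"
) -> List[Tuple[int, int, str]]:
    """Return (start, end, hint) for each region that should get a hint."""
    hints = []
    for r in regions:
        size = r.end - r.start
        if size >= HUGEPAGE_SIZE and r.access_rate >= hot_rate:
            # Large and hot: a candidate for collapsing into THPs.
            hints.append((r.start, r.end, "MADV_HUGEPAGE"))
        elif r.access_rate == 0.0 and r.idle_secs >= cold_secs:
            # Not accessed for a while: a candidate for splitting.
            hints.append((r.start, r.end, "MADV_NOHUGEPAGE"))
        # Everything else is left alone.
    return hints
```

Regions that are neither large-and-hot nor idle long enough get no hint at all,
which is why the thresholds can be chosen conservatively.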

So the main difference between Alex' work and the experimental DAMON-based
approach is that Alex' work enables THP always first, then finds
under-utilized THPs and splits those, while the DAMON-based approach finds
memory regions that could benefit from THP and collapses those, while
splitting THPs that show no opportunity for performance benefit. According to
the test results[4], the DAMON-based THP improvement removes 80.3% of THP
memory waste while preserving 30.79% of THP speedup. I'm planning to make a
kernel module doing this work with conservatively chosen parameter values, and
then automate the parameter tuning based on some system metrics. The timeline
is not clear at the moment, though.

We can make the DAMON-based approach more similar to Alex' one by enabling THP
always and using DAMON only for splitting cold pages. A THP being cold doesn't
mean it is under-utilized, so this is still not strictly the same as Alex'
idea. But given that one important goal of THP is TLB miss reduction,
splitting cold THPs would make some sense.

There is still a way to use DAMON for Alex' approach as he designed it, though
some work is needed. DAMON cannot directly be used for Alex' work as is
because it uses an access check mechanism based on PTE Accessed bits. But
DAMON allows multiple access check mechanisms to be implemented and configured
for use. Therefore, we can extend DAMON to use some access check mechanism
that is THP-independent and use that for Alex' work. For example, AMD's
Instruction-Based Sampling[5] can be imagined. Because it checks accesses at
byte granularity, it should be THP-independent and therefore usable for
checking accesses to THP-internal sub-pages. Maybe Alex' THP sub-page access
check mechanism could also be used.

If I'm missing something or saying something wrong, please let me know.

[1] https://lwn.net/Articles/906511/
[2] https://github.com/awslabs/damon-tests/tree/13d1850b79a2/perf/schemes/ethp.damos
[3] https://github.com/awslabs/damon-tests/tree/13d1850b79a2/perf
[4] https://damonitor.github.io/doc/html/v34-damos/vm/damon/eval.html#efficient-thp
[5] https://developer.amd.com/wordpress/media/2012/10/AMD_IBS_paper_EN.pdf

[...]
> For people who cannot join in person there, I will schedule next virtual
> instance of the chat series in the Monday of the LPC's next week. That is, the
> next virtual instance of this chat series will be in
>
>     2022-09-19 18:00 PDT (https://meet.google.com/ndx-evoc-gbu)

And, maybe too late, but a reminder that the next virtual instance of the chat
series is today at 6 PM PDT, as above.


Thanks,
SJ

[..]