On Fri, 18 Mar 2022 18:01:19 +0800 Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> wrote:

>
> On 3/18/2022 5:40 PM, sj@xxxxxxxxxx wrote:
> > Hi Baolin,
> >
> > On Fri, 18 Mar 2022 17:23:13 +0800 Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> wrote:
> >
> >> When I try to sample the physical address with DAMON to migrate pages
> >> on tiered memory system, I found it will demote some cold regions mistakenly.
> >> Now we will choose a physical address in the region randomly, but if
> >> its corresponding page is not an online LRU page, we will ignore the
> >> accessing status in this cycle of sampling, and actually will be treated
> >> as a non-accessed region. Suppose a region including some non-LRU pages,
> >> it will be treated as a cold region with a high probability, and may be
> >> merged with adjacent cold regions, but there are some pages may be
> >> accessed we missed.
> >>
> >> So instead of ignoring the access status of this region if we did not find
> >> a valid page according to current sampling address, we can use last valid
> >> sampling address to help to make the sampling more accurate, then we can do
> >> a better decision.
> >
> > Well... Offlined pages are also a valid part of the memory region, so treating
> > those as not accessed and making the memory region containing the offlined
> > pages looks colder seems legal to me. IOW, this approach could make memory
> > regions containing many non-online-LRU pages as hot.
>
> IMO I don't think this is a problem, since if this region containing
> many non-online-LRU pages is treated as hot, which means there are some
> pages are hot, right? We can find them and promote them to fast memory
> (or do other schemes). Meanwhile, for non-online-LRU pages, we can
> filter them and do nothing for them, since we can not get a valid page
> struct for them.

For some of the DAMOS actions that you mentioned, that could make sense.
However, that wouldn't make much sense for some other cases, especially for
manual DAMON-based access pattern profiling.

After all, we already have a mechanism for this case: adaptive regions
adjustment (or, regions split/merge). That mechanism will eventually separate
out the hot online-LRU pages in the memory regions. Before the region is
adjusted, reporting the whole region as hot looks like the right result to me.

Of course, I admit that it could take too much time to converge to the optimal
regions, and there is much room for improvement in the regions adjustment
mechanism. I think we should pursue that direction (improving the regions
adjustment mechanism). FYI, I have some rough ideas for improving the
mechanism, including partitioning regions into more than 2 sub-regions if we
believe it is not making good progress.

Nevertheless, I'd like to first make a methodology for evaluating the current
accuracy. For that, I am planning to implement page-granularity access
monitoring.


Thanks,
SJ

[...]
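
For illustration only, the regions split/merge idea discussed above can be
modeled with a toy user-space sketch. This is not the kernel implementation:
the `Region` class, `merge_regions()`, `split_regions()`, and the `threshold`,
`nr_pieces` and `min_size` parameters are all hypothetical names chosen for
this example. Merging joins adjacent regions with similar access counts, and
splitting cuts each region at random page-aligned points so that later
sampling can tell hot and cold sub-regions apart. A `nr_pieces` value above 2
corresponds to the "more than 2 sub-regions" idea mentioned above.

```python
import random
from dataclasses import dataclass


@dataclass
class Region:
    start: int        # inclusive start address
    end: int          # exclusive end address
    nr_accesses: int  # access count seen in the last aggregation interval


def merge_regions(regions, threshold):
    """Merge adjacent regions whose access counts differ by <= threshold."""
    merged = []
    for r in regions:
        last = merged[-1] if merged else None
        if (last is not None and last.end == r.start
                and abs(last.nr_accesses - r.nr_accesses) <= threshold):
            # Size-weighted average keeps the merged count representative.
            size_last = last.end - last.start
            size_r = r.end - r.start
            last.nr_accesses = (
                last.nr_accesses * size_last + r.nr_accesses * size_r
            ) // (size_last + size_r)
            last.end = r.end
        else:
            # Copy so the caller's regions are never mutated.
            merged.append(Region(r.start, r.end, r.nr_accesses))
    return merged


def split_regions(regions, rng, nr_pieces=2, min_size=4096):
    """Split each region into nr_pieces parts at random page-aligned points."""
    out = []
    for r in regions:
        pages = (r.end - r.start) // min_size
        if pages < nr_pieces:
            # Too small to split into nr_pieces page-sized parts; keep as is.
            out.append(Region(r.start, r.end, r.nr_accesses))
            continue
        # Pick nr_pieces - 1 distinct cut points, in units of pages.
        cuts = sorted(rng.sample(range(1, pages), nr_pieces - 1))
        bounds = [r.start] + [r.start + c * min_size for c in cuts] + [r.end]
        for lo, hi in zip(bounds, bounds[1:]):
            # Sub-regions inherit the parent's count until re-sampled.
            out.append(Region(lo, hi, r.nr_accesses))
    return out
```

In this model, one adjustment round would call `merge_regions()` on the
current list and then `split_regions()` on the result. How quickly such a
loop isolates the hot online-LRU pages depends on where the random cut points
fall, which is the convergence concern raised above.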