Re: [PATCH] mm/damon: Make the sampling more accurate

sj@xxxxxxxxxx · Fri, 18 Mar 2022 10:49:48 +0000

On Fri, 18 Mar 2022 18:01:19 +0800 Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> wrote:

> 
> On 3/18/2022 5:40 PM, sj@xxxxxxxxxx wrote:
> > Hi Baolin,
> > 
> > On Fri, 18 Mar 2022 17:23:13 +0800 Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> wrote:
> > 
> >> When I try to sample physical addresses with DAMON to migrate pages
> >> on a tiered memory system, I found it will demote some cold regions mistakenly.
> >> Now we choose a physical address in the region randomly, but if
> >> its corresponding page is not an online LRU page, we ignore the
> >> access status in this sampling cycle, and the region is actually treated
> >> as non-accessed. Suppose a region includes some non-LRU pages:
> >> it will be treated as a cold region with high probability, and may be
> >> merged with adjacent cold regions, even though some of its pages may
> >> have been accessed and we missed them.
> >>
> >> So instead of ignoring the access status of this region if we did not find
> >> a valid page at the current sampling address, we can use the last valid
> >> sampling address to help make the sampling more accurate, so that we can
> >> make a better decision.
> > 
> > Well...  Offlined pages are also a valid part of the memory region, so treating
> > those as not accessed and making the memory region containing the offlined
> > pages look colder seems legitimate to me.  IOW, this approach could make memory
> > regions containing many non-online-LRU pages look hot.
> 
> IMO I don't think this is a problem, since if this region containing
> many non-online-LRU pages is treated as hot, that means there are some
> pages that are hot, right? We can find them and promote them to fast memory
> (or apply other schemes). Meanwhile, for non-online-LRU pages, we can
> filter them out and do nothing for them, since we cannot get a valid page
> struct for them.

For some of the DAMOS actions that you mentioned, that could make sense.  However,
that wouldn't make much sense for some other cases, especially for manual
DAMON-based access pattern profiling.

After all, we already have a mechanism for this case: adaptive regions
adjustment (or, regions split/merge).  That mechanism will eventually separate
out the hot online-LRU pages in the memory regions.  Before the region is
adjusted, reporting the whole region as hot looks like the right result to me.
Of course, I admit that it could take too much time to converge to the optimal
regions, and there is much room for improvement in the regions adjustment
mechanism.  I think we should pursue that direction (improving the regions
adjustment mechanism).
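
To illustrate the split/merge idea, below is a toy user-space sketch in Python.
It is not the kernel code.  The names (Region, merge_regions, split_regions),
the 4096-byte unit, the size-weighted averaging, and the threshold value are
all assumptions made only for this illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Region:
    start: int        # inclusive start address
    end: int          # exclusive end address
    nr_accesses: int  # access samples seen in the last aggregation interval


def merge_regions(regions, threshold):
    """Merge adjacent regions whose access counts differ by <= threshold."""
    merged = []
    for r in regions:
        prev = merged[-1] if merged else None
        if (prev is not None and prev.end == r.start
                and abs(prev.nr_accesses - r.nr_accesses) <= threshold):
            prev_sz = prev.end - prev.start
            cur_sz = r.end - r.start
            # Size-weighted average of the two counts (an assumption of
            # this sketch).
            avg = (prev.nr_accesses * prev_sz
                   + r.nr_accesses * cur_sz) // (prev_sz + cur_sz)
            merged[-1] = Region(prev.start, r.end, avg)
        else:
            merged.append(Region(r.start, r.end, r.nr_accesses))
    return merged


def split_regions(regions, rng, unit=4096):
    """Split each region in two at a random unit-aligned offset."""
    out = []
    for r in regions:
        nr_units = (r.end - r.start) // unit
        if nr_units < 2:
            # Too small to split; keep the region as it is.
            out.append(Region(r.start, r.end, r.nr_accesses))
            continue
        cut = r.start + rng.randint(1, nr_units - 1) * unit
        # Both halves inherit the parent's count until the next sampling.
        out.append(Region(r.start, cut, r.nr_accesses))
        out.append(Region(cut, r.end, r.nr_accesses))
    return out


regions = [Region(0, 8192, 0), Region(8192, 16384, 1),
           Region(16384, 32768, 10)]
merged = merge_regions(regions, threshold=1)
parts = split_regions(merged, random.Random(0))
```

In this toy model, a hot page hidden in a mostly non-LRU region gets isolated
only after enough random splits happen to land next to it, which is why the
convergence can be slow.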

FYI, I have some rough ideas for improving the mechanism, including partitioning
regions into more than 2 sub-regions if we believe it is not making good
progress.  Nevertheless, I'd like to first establish a methodology for evaluating
the current accuracy.  For that, I am planning to implement page-granularity
access monitoring.
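
As a rough illustration of the "more than 2 sub-regions" idea only, a toy
Python sketch could look like the one below.  Everything in it is hypothetical:
the function names, the 4096-byte unit, the default piece counts, and in
particular the "number of regions did not change" test used as a stand-in for
"not making good progress".

```python
def pick_nr_subregions(prev_nr_regions, cur_nr_regions, base=2, boosted=4):
    """Return how many pieces to split each region into.

    Hypothetical heuristic: if the number of regions did not change since
    the last adjustment, assume no progress and split more aggressively.
    """
    return boosted if cur_nr_regions == prev_nr_regions else base


def split_evenly(start, end, nr_subregions, unit=4096):
    """Split [start, end) into up to nr_subregions unit-aligned pieces."""
    nr_units = (end - start) // unit
    # Never create more pieces than there are whole units in the region.
    nr_pieces = max(1, min(nr_subregions, nr_units))
    bounds = [start + (i * nr_units // nr_pieces) * unit
              for i in range(nr_pieces)]
    bounds.append(end)
    return list(zip(bounds[:-1], bounds[1:]))


nr = pick_nr_subregions(prev_nr_regions=10, cur_nr_regions=10)
pieces = split_evenly(0, 16384, nr)
```

An even split is used here only to keep the example short; a random split
point per piece would work the same way.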

Thanks,
SJ

[...]