Re: [PATCH v29 03/13] mm/damon: Adaptively adjust regions

SeongJae Park <sj38.park@xxxxxxxxx> · Tue, 25 May 2021 15:39:32 +0000

From: SeongJae Park <sjpark@xxxxxxxxx>

Hello Fernand,

Thank you for the questions!

On Tue, 25 May 2021 17:17:05 +0200 <sieberf@xxxxxxxxxx> wrote:

> Hi SeongJae,
> 
> The code looks good. Some questions for this patch:
> 
> The region merge threshold is computed on the access diff. Should the 
> diff threshold be exponential as diffs in low number of access are 
> likely to be more important? I.e if the threshold is 5, a region A with 
> 0 accesses will be merged with a region B with 4 accesses (diff=4), but 
> a region C with 50 access won't be merged with a region D with 60 
> accesses (diff=10), however it seems to me that keeping a good 
> granularity between A and B is more important than between C and D for 
> FPR. What do you think?

That totally makes sense if we are interested only in cold pages.  However,
DAMON is for more general use cases, and in some cases people are interested
in hot pages.  Using an exponential diff might make the region merging more
aggressive and thereby reduce the overhead.  But I think the size of both the
problem and the benefit is unclear for now.  I have not seen the overhead
become problematically high in my tests with production systems.  I think we
could add another option for this later, once we find it becomes a real
problem.
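
To make the trade-off concrete, below is a rough sketch comparing the absolute
diff rule as described in the question with one possible scaled rule.  This is
not code from the patch: merge_abs(), merge_log2() and log2_bucket() are names
made up for illustration, and bucketing by floor(log2(n + 1)) is only one
assumed way to make the threshold "exponential".

```c
#include <stdbool.h>

/* Absolute rule as described in the question: merge if |a - b| <= thres. */
static bool merge_abs(unsigned int a, unsigned int b, unsigned int thres)
{
	unsigned int diff = a > b ? a - b : b - a;

	return diff <= thres;
}

/*
 * Hypothetical helper: floor(log2(n + 1)), so 0 -> 0, 1..2 -> 1, 3..6 -> 2,
 * and so on.  The wider type avoids overflow when n is UINT_MAX.
 */
static unsigned int log2_bucket(unsigned int n)
{
	unsigned long long v = (unsigned long long)n + 1;
	unsigned int bucket = 0;

	for (; v > 1; v >>= 1)
		bucket++;
	return bucket;
}

/*
 * Hypothetical scaled rule: merge only if both access counts fall into the
 * same log2 bucket.  Small counts get fine granularity, large counts coarse.
 */
static bool merge_log2(unsigned int a, unsigned int b)
{
	return log2_bucket(a) == log2_bucket(b);
}
```

With the numbers from the question, merge_abs(0, 4, 5) merges A and B while
merge_abs(50, 60, 5) keeps C and D apart.  The scaled rule does the opposite:
merge_log2(0, 4) keeps A and B apart and merge_log2(50, 60) merges C and D.
That favors cold-page granularity at the cost of hot-page granularity, which
is why I would rather make it an option than the default.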

> 
> When the number of regions is less than half max region, region split 
> kicks in and doubles the number of region. This means that the number of 
> region will grow close to max region, then slowly decay as region 
> merges, until it reaches half max regions, then double again. This seems 
> to create a non-uniform region number distribution over time, with large 
> cycles. Also we do a lot of work when we double and no work otherwise. 
> Not sure what's the impact on measurement quality but intuitively seems 
> like keeping the number of regions constant over time would yield more 
> consistent metrics? How about we rather always split regions at each 
> iteration, and for each region we give a split probability?

Agreed, I think this makes sense.  I am also planning to make the probability
change adaptively, based on the current monitoring results, in the future.
Nevertheless, I want to keep the logic as simple as possible for now, unless we
see a clear problem and benefit there.
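
For reference, the per-region split probability idea could look roughly like
the sketch below.  Again, this is not part of the patch: sketch_rand(),
split_percent() and split_pass() are hypothetical names, and the linear
probability policy is only an assumption for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper: small xorshift PRNG so the sketch is self-contained.
 * The state should be seeded with a non-zero value.
 */
static uint32_t sketch_rand(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/*
 * Assumed policy: the split probability (in percent) shrinks linearly as
 * nr_regions approaches max_nr_regions, and is zero at or above the limit.
 */
static unsigned int split_percent(unsigned int nr_regions,
				  unsigned int max_nr_regions)
{
	if (max_nr_regions == 0 || nr_regions >= max_nr_regions)
		return 0;
	return (unsigned int)((uint64_t)(max_nr_regions - nr_regions) * 100 /
			      max_nr_regions);
}

/*
 * One pass over the regions: each region is split in two with the
 * probability above.  Only the resulting region count is tracked here, and
 * it never exceeds max_nr_regions.
 */
static unsigned int split_pass(unsigned int nr_regions,
			       unsigned int max_nr_regions, uint32_t *rng)
{
	unsigned int percent = split_percent(nr_regions, max_nr_regions);
	unsigned int result = nr_regions;
	unsigned int i;

	for (i = 0; i < nr_regions && result < max_nr_regions; i++) {
		if (sketch_rand(rng) % 100 < percent)
			result++;
	}
	return result;
}
```

Running split_pass() on every iteration would spread the splitting work over
time instead of doubling the regions in one step.  Whether that actually gives
more consistent results would need to be measured.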

Thanks,
SeongJae Park

> 
> Kind regards,
> 
> --Fernand