On Wed, 26 Jun 2024 09:47:53 -0700 SeongJae Park <sj@xxxxxxxxxx> wrote:

> DAMON's merge mechanism has two thresholds, namely those for access
> frequency and size.  The access frequency threshold avoids merging two
> adjacent regions that having pretty different access frequency.
>
> The size threshold is calculated as total size of regions divided by
> min_nr_regions.  Merging operation skip merging two adjacent regions if
> the resulting region's size can be larger than the threshold.  This is
> for meeting min_nr_regions.
>
> Commit 44fdaf596984 ("mm/damon/core: merge regions aggressively when
> max_nr_regions is unmet") of mm-unstable, however, ignores the
> min_nr_regions by increasing not only access frequency threshold but
> also the size threshold.
>
> The commit also has one more problem.  User could set DAMON target
> regions with more than max_nr_regions discrete regions.  Because DAMON
> cannot merge non-adjacent regions, the number of regions will never be
> lower than max_nr_regions regardless of the increased thresholds.  As a
> result, the function can infinitely repeat the loop.
>
> Increase only access frequency threshold, up to only possible maximum
> value.
>
> Fixes: 44fdaf596984 ("mm/damon/core: merge regions aggressively when max_nr_regions is unmet") # mm-unstable
> Signed-off-by: SeongJae Park <sj@xxxxxxxxxx>
> ---
>  mm/damon/core.c | 19 ++++++++++---------
>  1 file changed, 10 insertions(+), 9 deletions(-)
>
> diff --git a/mm/damon/core.c b/mm/damon/core.c
> index e6598c44b53c..dac27b949403 100644
> --- a/mm/damon/core.c
> +++ b/mm/damon/core.c
[...]
> +	max_thres = c->attrs.aggr_interval /
> +		(c->attrs.sample_interval ? c->attrs.sample_interval : 1);
>  	do {
>  		nr_regions = 0;
>  		damon_for_each_target(t, c) {
> @@ -1716,8 +1717,8 @@ static void kdamond_merge_regions(struct damon_ctx *c, unsigned int threshold,
>  			nr_regions += damon_nr_regions(t);
>  		}
>  		threshold = max(1, threshold * 2);
> -		sz_limit = max(1, sz_limit * 2);
> -	} while (nr_regions > c->attrs.max_nr_regions);
> +	} while (nr_regions > c->attrs.max_nr_regions &&
> +			threshold <= max_thres);

This code makes kdamond_merge_regions() stop the repeated merge attempts
as soon as the threshold, after being increased for the next attempt,
exceeds the maximum possible threshold.

Because each next threshold is the larger of one and twice the last-used
threshold, the thresholds tried double on every attempt.  The attempt
with the maximum threshold is therefore made only if that value happens
to lie on the doubling sequence, e.g., when both the maximum threshold
and the threshold being increased are powers of two.  In the worst case
(e.g., adjacent regions alternate between 100% and 0% access frequency),
only an attempt with a threshold of at least the maximum can merge such
regions, so the attempts cannot resolve the max_nr_regions violation.

This can be fixed by stopping the repeated attempts only once the
last-used threshold is equal to or higher than the maximum possible
threshold, as in the diff below.  I'll send this fix for the fix as a
formal patch soon.

FYI, the original fix should definitely be merged into stable kernels,
but in my opinion it is not urgent, since the problematic case is
uncommon and the behavior has been the same since the beginning of
DAMON.  Andrew, if you feel the original fix is not stable yet, please
feel free to delay moving it to hotfix-stable for a week or two.
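
The termination difference can be made concrete with the minimal user-space sketch in C below, which is not kernel code.  It assumes the worst case described above, where only an attempt with a threshold of at least max_thres can merge the regions, and it ignores the size limit and the real region bookkeeping.  merge_attempts_succeed() and max_uint() are hypothetical helpers for illustration only.

```c
#include <stdbool.h>

/*
 * Hypothetical user-space model of the threshold schedule in
 * kdamond_merge_regions(); it is not kernel code.  Each loop iteration
 * stands for one merge attempt with 'threshold'.  Assumed worst case:
 * adjacent regions alternate between 0 and max_thres accesses, so only
 * an attempt with threshold >= max_thres can merge them.  The size
 * limit and real region bookkeeping are ignored.
 */
static unsigned int max_uint(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/*
 * Returns true if some attempt used a threshold high enough to merge
 * the alternating regions.  'fixed' selects the loop condition: false
 * models "threshold <= max_thres", true models
 * "threshold / 2 < max_thres".
 */
static bool merge_attempts_succeed(unsigned int threshold,
				   unsigned int max_thres, bool fixed)
{
	bool satisfied;

	do {
		/* Merge attempt with the current threshold. */
		satisfied = threshold >= max_thres;
		/* Same update as the kernel's max(1, threshold * 2). */
		threshold = max_uint(1, threshold * 2);
	} while (!satisfied &&
		 (fixed ? threshold / 2 < max_thres : threshold <= max_thres));
	return satisfied;
}
```

For example, take max_thres = 20 (an aggregation interval 20 times the sampling interval) and an initial threshold of 2.  The old condition tries 2, 4, 8 and 16, then gives up and returns false.  The new condition makes one more attempt with 32 and returns true.  With max_thres = 16, both conditions reach 16 and succeed.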

Thanks,
SJ

================================= >8 ==========================================
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -1773,7 +1773,7 @@ static void kdamond_merge_regions(struct damon_ctx *c, unsigned int threshold,
 		}
 		threshold = max(1, threshold * 2);
 	} while (nr_regions > c->attrs.max_nr_regions &&
-			threshold <= max_thres);
+			threshold / 2 < max_thres);
 }
 
 /*