Patch "mm/damon/core: handle zero {aggregation,ops_update} intervals" has been added to the 6.6-stable tree

Sasha Levin <sashal@xxxxxxxxxx> · Mon, 18 Nov 2024 09:36:44 -0500

This is a note to let you know that I've just added the patch titled

    mm/damon/core: handle zero {aggregation,ops_update} intervals

to the 6.6-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     mm-damon-core-handle-zero-aggregation-ops_update-int.patch
and it can be found in the queue-6.6 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 5bd7fe8c7e9559fbf24792d78e6f8d3b40b62ed5
Author: SeongJae Park <sj@xxxxxxxxxx>
Date:   Thu Oct 31 11:37:56 2024 -0700

    mm/damon/core: handle zero {aggregation,ops_update} intervals
    
    [ Upstream commit 3488af0970445ff5532c7e8dc5e6456b877aee5e ]
    
    Patch series "mm/damon/core: fix handling of zero non-sampling intervals".
    
    DAMON's internal interval accounting logic does not correctly handle
    zero-valued non-sampling intervals, because of a wrong assumption.  This
    could cause unexpected monitoring behavior, and even result in an infinite
    hang of DAMON sysfs interface user threads in the case of a zero
    aggregation interval.  Fix those by updating the interval accounting
    logic.  For details of the root cause and solutions, please refer to the
    commit messages of the fixes.
    
    This patch (of 2):
    
    DAMON's logic for determining whether it is time to do aggregation and ops
    update assumes next_{aggregation,ops_update}_sis are always set larger
    than the current passed_sample_intervals.  It therefore further assumes
    that incrementing passed_sample_intervals every sampling interval will
    eventually make it reach next_{aggregation,ops_update}_sis.  The logic
    therefore takes the action and updates
    next_{aggregation,ops_update}_sis only if passed_sample_intervals is
    equal to the respective count.
    
    If the aggregation interval or the ops update interval is zero, however,
    next_aggregation_sis or next_ops_update_sis is set equal to the current
    passed_sample_intervals, respectively.  And passed_sample_intervals is
    incremented before the next_{aggregation,ops_update}_sis check.
    Hence, passed_sample_intervals becomes larger than
    next_{aggregation,ops_update}_sis, and the logic decides it is not the
    time to do the action and update next_{aggregation,ops_update}_sis, and
    keeps deciding so until an overflow happens.  In other words, DAMON
    effectively stops doing aggregations or ops updates forever, and users
    cannot get monitoring results.
    
    Based on the documentation and common sense, a reasonable behavior for
    such inputs is to do an aggregation and an ops update every sampling
    interval.  Handle the case by removing the assumption.
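
The difference between the old and new comparisons can be sketched with a small userspace simulation. This is not kernel code: struct sim and count_aggregations() are hypothetical names introduced only for illustration, and the loop models only the counter bookkeeping described above (increment first, then compare, then advance the deadline).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two counters discussed above; not kernel code. */
struct sim {
	unsigned long passed_sample_intervals;
	unsigned long next_aggregation_sis;
};

/*
 * Simulate `steps` sampling intervals and return how many aggregations fire.
 * `aggr_sis` models aggr_interval / sample_interval, so 0 models a zero
 * aggregation interval.  `use_ge` selects the patched (>=) comparison;
 * otherwise the original (==) comparison is used.
 */
static unsigned long count_aggregations(unsigned long aggr_sis,
					unsigned long steps, bool use_ge)
{
	/* Initially the deadline is the current count plus the interval. */
	struct sim s = {
		.passed_sample_intervals = 0,
		.next_aggregation_sis = aggr_sis,
	};
	unsigned long fired = 0;

	for (unsigned long i = 0; i < steps; i++) {
		/* The counter is incremented before the deadline check. */
		s.passed_sample_intervals++;

		bool due = use_ge ?
			s.passed_sample_intervals >= s.next_aggregation_sis :
			s.passed_sample_intervals == s.next_aggregation_sis;

		if (due) {
			fired++;
			/* Advance the deadline by one aggregation interval. */
			s.next_aggregation_sis += aggr_sis;
		}
	}
	return fired;
}
```

Under these assumptions, a zero interval never fires with the equality check but fires on every step with the patched check, while a non-zero interval behaves the same with both.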
    
    Note that this could cause a real issue for DAMON sysfs interface users
    in the case of a zero aggregation interval.  When a user starts DAMON with
    a zero aggregation interval and requests online DAMON parameter tuning via
    the DAMON sysfs interface, the request is handled by the aggregation
    callback.  Until the callback finishes the work, the user who requested
    the online tuning just waits.  Hence, the user will be stuck until
    passed_sample_intervals overflows.
    
    Link: https://lkml.kernel.org/r/20241031183757.49610-1-sj@xxxxxxxxxx
    Link: https://lkml.kernel.org/r/20241031183757.49610-2-sj@xxxxxxxxxx
    Fixes: 4472edf63d66 ("mm/damon/core: use number of passed access sampling as a timer")
    Signed-off-by: SeongJae Park <sj@xxxxxxxxxx>
    Cc: <stable@xxxxxxxxxxxxxxx>    [6.7.x]
    Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/mm/damon/core.c b/mm/damon/core.c
index a29390fd55935..d0441e24a8ed5 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -1454,7 +1454,7 @@ static int kdamond_fn(void *data)
 		if (ctx->ops.check_accesses)
 			max_nr_accesses = ctx->ops.check_accesses(ctx);
 
-		if (ctx->passed_sample_intervals == next_aggregation_sis) {
+		if (ctx->passed_sample_intervals >= next_aggregation_sis) {
 			kdamond_merge_regions(ctx,
 					max_nr_accesses / 10,
 					sz_limit);
@@ -1472,7 +1472,7 @@ static int kdamond_fn(void *data)
 
 		sample_interval = ctx->attrs.sample_interval ?
 			ctx->attrs.sample_interval : 1;
-		if (ctx->passed_sample_intervals == next_aggregation_sis) {
+		if (ctx->passed_sample_intervals >= next_aggregation_sis) {
 			ctx->next_aggregation_sis = next_aggregation_sis +
 				ctx->attrs.aggr_interval / sample_interval;
 
@@ -1482,7 +1482,7 @@ static int kdamond_fn(void *data)
 				ctx->ops.reset_aggregated(ctx);
 		}
 
-		if (ctx->passed_sample_intervals == next_ops_update_sis) {
+		if (ctx->passed_sample_intervals >= next_ops_update_sis) {
 			ctx->next_ops_update_sis = next_ops_update_sis +
 				ctx->attrs.ops_update_interval /
 				sample_interval;