On Sun, 27 Aug 2023 00:37:27 +0000 SeongJae Park <sj@xxxxxxxxxx> wrote: > DAMON sleeps for sampling interval after each sampling, and check if > it's time for further doing aggregation and ops updating using > ktime_get_coarse_ts64() and baseline timestamps for the two periodic > operations. That's for making the operations occur at deterministic > timing. However, it turned out it could still result in indeterministic > and even not-that-intuitive results. > > After all, timer functions, and especially sleep functions that DAMON > uses to wait for specific timing, could contain some errors. Those > errors are legal, so no problem. However, depending on such legal > timing errors, the nr_accesses can be larger than aggregation interval > divided by sampling interval. For example, with the default setting (5 > ms sampling interval and 100 ms aggregation interval) we frequently show > regions having nr_accesses larger than 20. Also, if the execution of a > DAMOS scheme takes a long time, next aggregation could happen before > enough number of samples are collected. > > Since access check sampling is the smallest unit work of DAMON, using > the number of passed sampling intervals as the DAMON-internal timer can > easily avoid these problems. That is, convert aggregation and ops > update intervals to numbers of sampling intervals that need to be passed > before those operations be executed, count the number of passed sampling > intervals, and invoke the operations as soon as the specific amount of > sampling intervals passed. Make the change. > > Signed-off-by: SeongJae Park <sj@xxxxxxxxxx> > --- > include/linux/damon.h | 14 ++++++-- > mm/damon/core.c | 84 +++++++++++++++++++------------------------ > 2 files changed, 48 insertions(+), 50 deletions(-) > > diff --git a/include/linux/damon.h b/include/linux/damon.h > index ab3089de1478..9a32b8fd0bd3 100644 > --- a/include/linux/damon.h > +++ b/include/linux/damon.h > @@ -524,8 +524,18 @@ struct damon_ctx { > struct damon_attrs attrs; > > /* private: internal use only */ > - struct timespec64 last_aggregation; > - struct timespec64 last_ops_update; > + /* number of sample intervals that passed since this context started */ > + unsigned long passed_sample_intervals; > + /* > + * number of sample intervals that should be passed before next > + * aggregation > + */ > + unsigned long next_aggregation_sis; > + /* > + * number of sample intervals that should be passed before next ops > + * update > + */ > + unsigned long next_ops_update_sis; > > /* public: */ > struct task_struct *kdamond; > diff --git a/mm/damon/core.c b/mm/damon/core.c > index 988dc39e44b1..83af336bb0e6 100644 > --- a/mm/damon/core.c > +++ b/mm/damon/core.c > @@ -456,8 +456,11 @@ struct damon_ctx *damon_new_ctx(void) > ctx->attrs.aggr_interval = 100 * 1000; > ctx->attrs.ops_update_interval = 60 * 1000 * 1000; > > - ktime_get_coarse_ts64(&ctx->last_aggregation); > - ctx->last_ops_update = ctx->last_aggregation; > + ctx->passed_sample_intervals = 0; > + ctx->next_aggregation_sis = ctx->attrs.aggr_interval / > + ctx->attrs.sample_interval; > + ctx->next_ops_update_sis = ctx->attrs.ops_update_interval / > + ctx->attrs.sample_interval; > > mutex_init(&ctx->kdamond_lock); > > @@ -577,6 +580,9 @@ static void damon_update_monitoring_results(struct damon_ctx *ctx, > */ > int damon_set_attrs(struct damon_ctx *ctx, struct damon_attrs *attrs) > { > + unsigned long sample_interval; > + unsigned long remaining_interval_us; > + > if (attrs->min_nr_regions < 3) > return -EINVAL; > if (attrs->min_nr_regions > attrs->max_nr_regions) > @@ -584,6 +590,20 @@ int damon_set_attrs(struct damon_ctx *ctx, struct damon_attrs *attrs) > if (attrs->sample_interval > attrs->aggr_interval) > return -EINVAL; > > + sample_interval = attrs->sample_interval ? attrs->sample_interval : 1; > + > + /* adjust next_aggregation_sis */ > + remaining_interval_us = ctx->attrs.sample_interval * > + (ctx->next_aggregation_sis - ctx->passed_sample_intervals); > + ctx->next_aggregation_sis = ctx->passed_sample_intervals + > + remaining_interval_us / sample_interval; > + > + /* adjust next_ops_update_sis */ > + remaining_interval_us = ctx->attrs.sample_interval * > + (ctx->next_ops_update_sis - ctx->passed_sample_intervals); > + ctx->next_ops_update_sis = ctx->passed_sample_intervals + > + remaining_interval_us / sample_interval; > + > damon_update_monitoring_results(ctx, attrs); > ctx->attrs = *attrs; > return 0; > @@ -757,38 +777,6 @@ int damon_stop(struct damon_ctx **ctxs, int nr_ctxs) > return err; > } So, the new timer variables are initialized in damon_new_ctx() as the default (5ms sampling interval, 100ms aggregation interval, and 1s ops update interval) and adjusted in damon_set_attrs(). It means the first intervals will be the default ones always. I will make those initialized at the beginning of kdamond_fn() from the next spin. [...] > @@ -1436,6 +1412,8 @@ static int kdamond_fn(void *data) > sz_limit = damon_region_sz_limit(ctx); As mentioned abovely, the timer variables will be initialized around here (at the beggining of kdamond_fn(), before going into the loop), in the next spin. Thanks, SJ > > while (!kdamond_need_stop(ctx)) { > + unsigned long sample_interval; > + > if (kdamond_wait_activation(ctx)) > break; > > @@ -1446,11 +1424,17 @@ static int kdamond_fn(void *data) > break; > > kdamond_usleep(ctx->attrs.sample_interval); > + ctx->passed_sample_intervals++; > > if (ctx->ops.check_accesses) > max_nr_accesses = ctx->ops.check_accesses(ctx); > > - if (kdamond_aggregate_interval_passed(ctx)) { > + sample_interval = ctx->attrs.sample_interval ? > + ctx->attrs.sample_interval : 1; > + if (ctx->passed_sample_intervals == > + ctx->next_aggregation_sis) { > + ctx->next_aggregation_sis += > + ctx->attrs.aggr_interval / sample_interval; > kdamond_merge_regions(ctx, > max_nr_accesses / 10, > sz_limit); > @@ -1465,7 +1449,11 @@ static int kdamond_fn(void *data) > ctx->ops.reset_aggregated(ctx); > } > > - if (kdamond_need_update_operations(ctx)) { > + if (ctx->passed_sample_intervals == > + ctx->next_ops_update_sis) { > + ctx->next_ops_update_sis += > + ctx->attrs.ops_update_interval / > + sample_interval; > if (ctx->ops.update) > ctx->ops.update(ctx); > sz_limit = damon_region_sz_limit(ctx); > -- > 2.25.1 > >