On 09.06.20 08:53, SeongJae Park wrote:
> From: SeongJae Park <sjpark@xxxxxxxxx>
>
> In many cases, users might use DAMON for simple data access aware
> memory management optimizations such as applying an operation scheme to
> a memory region of a specific size having a specific access frequency
> for a specific time. For example, "page out a memory region larger than
> 100 MiB but having a low access frequency more than 10 minutes", or "Use
> THP for a memory region larger than 2 MiB having a high access frequency
> for more than 2 seconds".
>
> To minimize users from spending their time for implementation of such
> simple data access monitoring-based operation schemes, this commit makes
> DAMON to handle such schemes directly. With this commit, users can
> simply specify their desired schemes to DAMON.

What would be the alternative? What would a solution where these policies
are handled by user space (or inside an application?) look like?

>
> Each of the schemes is composed with conditions for filtering of the
> target memory regions and desired memory management action for the
> target. Specifically, the format is::
>
>     <min/max size> <min/max access frequency> <min/max age> <action>
>
> The filtering conditions are size of memory region, number of accesses
> to the region monitored by DAMON, and the age of the region. The age of
> region is incremented periodically but reset when its addresses or
> access frequency has significantly changed or the action of a scheme was
> applied. For the action, current implementation supports only a few of
> madvise() hints, ``MADV_WILLNEED``, ``MADV_COLD``, ``MADV_PAGEOUT``,
> ``MADV_HUGEPAGE``, and ``MADV_NOHUGEPAGE``.

I am missing some important information. Is this specified for *all*
user space processes? Or how is this configured? What are examples?

E.g., messing with ``MADV_HUGEPAGE`` vs. ``MADV_NOHUGEPAGE`` of random
applications can change the behavior/break these applications.
(e.g., if userfaultfd is getting used and the application explicitly sets
MADV_NOHUGEPAGE).

>
> Signed-off-by: SeongJae Park <sjpark@xxxxxxxxx>
> ---
>  include/linux/damon.h |  50 ++++++++++++++
>  mm/damon.c            | 149 ++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 199 insertions(+)
>
> diff --git a/include/linux/damon.h b/include/linux/damon.h
> index 6a8ff2c63c2a..842a01e80c6e 100644
> --- a/include/linux/damon.h
> +++ b/include/linux/damon.h
> @@ -55,6 +55,52 @@ struct damon_task {
>  	struct list_head list;
>  };
>
> +/**
> + * enum damos_action - Represents an action of a Data Access Monitoring-based
> + * Operation Scheme.
> + *
> + * @DAMOS_WILLNEED: Call ``madvise()`` for the region with MADV_WILLNEED.
> + * @DAMOS_COLD: Call ``madvise()`` for the region with MADV_COLD.
> + * @DAMOS_PAGEOUT: Call ``madvise()`` for the region with MADV_PAGEOUT.
> + * @DAMOS_HUGEPAGE: Call ``madvise()`` for the region with MADV_HUGEPAGE.
> + * @DAMOS_NOHUGEPAGE: Call ``madvise()`` for the region with MADV_NOHUGEPAGE.
> + * @DAMOS_ACTION_LEN: Number of supported actions.
> + */
> +enum damos_action {
> +	DAMOS_WILLNEED,
> +	DAMOS_COLD,
> +	DAMOS_PAGEOUT,
> +	DAMOS_HUGEPAGE,
> +	DAMOS_NOHUGEPAGE,
> +	DAMOS_ACTION_LEN,
> +};
> +
> +/**
> + * struct damos - Represents a Data Access Monitoring-based Operation Scheme.
> + * @min_sz_region: Minimum size of target regions.
> + * @max_sz_region: Maximum size of target regions.
> + * @min_nr_accesses: Minimum ``->nr_accesses`` of target regions.
> + * @max_nr_accesses: Maximum ``->nr_accesses`` of target regions.
> + * @min_age_region: Minimum age of target regions.
> + * @max_age_region: Maximum age of target regions.
> + * @action: &damo_action to be applied to the target regions.
> + * @list: List head for siblings.
> + *
> + * For each aggregation interval, DAMON applies @action to monitoring target
> + * regions fit in the condition and updates the statistics. Note that both
> + * the minimums and the maximums are inclusive.
> + */
> +struct damos {
> +	unsigned int min_sz_region;
> +	unsigned int max_sz_region;
> +	unsigned int min_nr_accesses;
> +	unsigned int max_nr_accesses;
> +	unsigned int min_age_region;
> +	unsigned int max_age_region;
> +	enum damos_action action;
> +	struct list_head list;
> +};
> +
>  /**
>   * struct damon_ctx - Represents a context for each monitoring. This is the
>   * main interface that allows users to set the attributes and get the results
> @@ -98,6 +144,7 @@ struct damon_task {
>   * @kdamond_lock. Accesses to other fields must be protected by themselves.
>   *
>   * @tasks_list: Head of monitoring target tasks (&damon_task) list.
> + * @schemes_list: Head of schemes (&damos) list.
>   *
>   * @sample_cb: Called for each sampling interval.
>   * @aggregate_cb: Called for each aggregation interval.
> @@ -128,6 +175,7 @@ struct damon_ctx {
>  	struct mutex kdamond_lock;
>
>  	struct list_head tasks_list; /* 'damon_task' objects */
> +	struct list_head schemes_list; /* 'damos' objects */
>
>  	/* callbacks */
>  	void (*sample_cb)(struct damon_ctx *context);
> @@ -138,6 +186,8 @@ int damon_set_pids(struct damon_ctx *ctx, int *pids, ssize_t nr_pids);
>  int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
>  		unsigned long aggr_int, unsigned long regions_update_int,
>  		unsigned long min_nr_reg, unsigned long max_nr_reg);
> +int damon_set_schemes(struct damon_ctx *ctx,
> +			struct damos **schemes, ssize_t nr_schemes);
>  int damon_set_recording(struct damon_ctx *ctx,
>  				unsigned int rbuf_len, char *rfile_path);
>  int damon_start(struct damon_ctx *ctx);
> diff --git a/mm/damon.c b/mm/damon.c
> index 17ec5fcc1b96..1ec6fa3dd671 100644
> --- a/mm/damon.c
> +++ b/mm/damon.c
> @@ -22,6 +22,7 @@
>
>  #define CREATE_TRACE_POINTS
>
> +#include <asm-generic/mman-common.h>
>  #include <linux/damon.h>
>  #include <linux/debugfs.h>
>  #include <linux/delay.h>
> @@ -67,6 +68,12 @@
>  #define damon_for_each_task_safe(t, next, ctx) \
>  	list_for_each_entry_safe(t, next, &(ctx)->tasks_list,
list)
>
> +#define damon_for_each_scheme(s, ctx) \
> +	list_for_each_entry(s, &(ctx)->schemes_list, list)
> +
> +#define damon_for_each_scheme_safe(s, next, ctx) \
> +	list_for_each_entry_safe(s, next, &(ctx)->schemes_list, list)
> +
>  #define MAX_RECORD_BUFFER_LEN (4 * 1024 * 1024)
>  #define MAX_RFILE_PATH_LEN 256
>
> @@ -181,6 +188,27 @@ static void damon_destroy_task(struct damon_task *t)
>  	damon_free_task(t);
>  }
>
> +static void damon_add_scheme(struct damon_ctx *ctx, struct damos *s)
> +{
> +	list_add_tail(&s->list, &ctx->schemes_list);
> +}
> +
> +static void damon_del_scheme(struct damos *s)
> +{
> +	list_del(&s->list);
> +}
> +
> +static void damon_free_scheme(struct damos *s)
> +{
> +	kfree(s);
> +}
> +
> +static void damon_destroy_scheme(struct damos *s)
> +{
> +	damon_del_scheme(s);
> +	damon_free_scheme(s);
> +}
> +
>  static unsigned int nr_damon_tasks(struct damon_ctx *ctx)
>  {
>  	struct damon_task *t;
> @@ -779,6 +807,101 @@ static void kdamond_reset_aggregated(struct damon_ctx *c)
>  	}
>  }
>
> +#ifndef CONFIG_ADVISE_SYSCALLS
> +static int damos_madvise(struct damon_task *task, struct damon_region *r,
> +			int behavior)
> +{
> +	return -EINVAL;
> +}
> +#else
> +static int damos_madvise(struct damon_task *task, struct damon_region *r,
> +			int behavior)
> +{
> +	struct task_struct *t;
> +	struct mm_struct *mm;
> +	int ret = -ENOMEM;
> +
> +	t = damon_get_task_struct(task);
> +	if (!t)
> +		goto out;
> +	mm = damon_get_mm(task);
> +	if (!mm)
> +		goto put_task_out;
> +
> +	ret = do_madvise(t, mm, PAGE_ALIGN(r->vm_start),
> +			PAGE_ALIGN(r->vm_end - r->vm_start), behavior);
> +	mmput(mm);
> +put_task_out:
> +	put_task_struct(t);
> +out:
> +	return ret;
> +}
> +#endif /* CONFIG_ADVISE_SYSCALLS */
> +
> +static int damos_do_action(struct damon_task *task, struct damon_region *r,
> +			enum damos_action action)
> +{
> +	int madv_action;
> +
> +	switch (action) {
> +	case DAMOS_WILLNEED:
> +		madv_action = MADV_WILLNEED;
> +		break;
> +	case DAMOS_COLD:
> +
madv_action = MADV_COLD;
> +		break;
> +	case DAMOS_PAGEOUT:
> +		madv_action = MADV_PAGEOUT;
> +		break;
> +	case DAMOS_HUGEPAGE:
> +		madv_action = MADV_HUGEPAGE;
> +		break;
> +	case DAMOS_NOHUGEPAGE:
> +		madv_action = MADV_NOHUGEPAGE;
> +		break;
> +	default:
> +		pr_warn("Wrong action %d\n", action);
> +		return -EINVAL;
> +	}
> +
> +	return damos_madvise(task, r, madv_action);
> +}
> +
> +static void damon_do_apply_schemes(struct damon_ctx *c, struct damon_task *t,
> +				struct damon_region *r)
> +{
> +	struct damos *s;
> +	unsigned long sz;
> +
> +	damon_for_each_scheme(s, c) {
> +		sz = r->vm_end - r->vm_start;
> +		if ((s->min_sz_region && sz < s->min_sz_region) ||
> +				(s->max_sz_region && s->max_sz_region < sz))
> +			continue;
> +		if ((s->min_nr_accesses && r->nr_accesses < s->min_nr_accesses)
> +				|| (s->max_nr_accesses &&
> +					s->max_nr_accesses < r->nr_accesses))
> +			continue;
> +		if ((s->min_age_region && r->age < s->min_age_region) ||
> +				(s->max_age_region &&
> +				 s->max_age_region < r->age))
> +			continue;
> +		damos_do_action(t, r, s->action);
> +		r->age = 0;
> +	}
> +}
> +
> +static void kdamond_apply_schemes(struct damon_ctx *c)
> +{
> +	struct damon_task *t;
> +	struct damon_region *r;
> +
> +	damon_for_each_task(t, c) {
> +		damon_for_each_region(r, t)
> +			damon_do_apply_schemes(c, t, r);
> +	}
> +}
> +
>  #define sz_damon_region(r) (r->vm_end - r->vm_start)
>
>  /*
> @@ -1001,6 +1124,7 @@ static int kdamond_fn(void *data)
>  			kdamond_merge_regions(ctx, max_nr_accesses / 10);
>  			if (ctx->aggregate_cb)
>  				ctx->aggregate_cb(ctx);
> +			kdamond_apply_schemes(ctx);
>  			kdamond_reset_aggregated(ctx);
>  			kdamond_split_regions(ctx);
>  		}
> @@ -1081,6 +1205,30 @@ int damon_stop(struct damon_ctx *ctx)
>  	return -EPERM;
>  }
>
> +/**
> + * damon_set_schemes() - Set data access monitoring based operation schemes.
> + * @ctx: monitoring context
> + * @schemes: array of the schemes
> + * @nr_schemes: number of entries in @schemes
> + *
> + * This function should not be called while the kdamond of the context is
> + * running.
> + *
> + * Return: 0 if success, or negative error code otherwise.
> + */
> +int damon_set_schemes(struct damon_ctx *ctx, struct damos **schemes,
> +			ssize_t nr_schemes)
> +{
> +	struct damos *s, *next;
> +	ssize_t i;
> +
> +	damon_for_each_scheme_safe(s, next, ctx)
> +		damon_destroy_scheme(s);
> +	for (i = 0; i < nr_schemes; i++)
> +		damon_add_scheme(ctx, schemes[i]);
> +	return 0;
> +}
> +
>  /**
>   * damon_set_pids() - Set monitoring target processes.
>   * @ctx: monitoring context
> @@ -1525,6 +1673,7 @@ static int __init damon_init_user_ctx(void)
>  	mutex_init(&ctx->kdamond_lock);
>
>  	INIT_LIST_HEAD(&ctx->tasks_list);
> +	INIT_LIST_HEAD(&ctx->schemes_list);
>
>  	return 0;
>  }
> --
Thanks,

David / dhildenb
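
To make the matching rule in the quoted damon_do_apply_schemes() easier to reason about, here is a minimal user-space sketch in C of the same filter. It assumes only what the quoted hunk shows: each bound is inclusive, and a bound of 0 is treated as "not set". The names scheme_filter, region_info, in_range(), scheme_matches() and scheme_selftest() are hypothetical and not part of the patch, and the numeric values are arbitrary illustrations, not recommended settings.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the six filter fields of the quoted struct damos. */
struct scheme_filter {
	unsigned int min_sz_region, max_sz_region;
	unsigned int min_nr_accesses, max_nr_accesses;
	unsigned int min_age_region, max_age_region;
};

/* Hypothetical stand-in for the region fields the filter reads. */
struct region_info {
	unsigned long vm_start, vm_end;
	unsigned int nr_accesses;
	unsigned int age;
};

/* Inclusive range check; a bound of 0 means "no bound", as in the hunk. */
static bool in_range(unsigned long val, unsigned int min, unsigned int max)
{
	if (min && val < min)
		return false;
	if (max && max < val)
		return false;
	return true;
}

/* True if the region satisfies all three conditions of the scheme. */
static bool scheme_matches(const struct scheme_filter *s,
			   const struct region_info *r)
{
	unsigned long sz = r->vm_end - r->vm_start;

	return in_range(sz, s->min_sz_region, s->max_sz_region) &&
	       in_range(r->nr_accesses, s->min_nr_accesses,
			s->max_nr_accesses) &&
	       in_range(r->age, s->min_age_region, s->max_age_region);
}

/* Small self-check with arbitrary values. */
static void scheme_selftest(void)
{
	/* At least 2 MiB, at least 5 accesses, age of at least 2. */
	struct scheme_filter hot = {
		.min_sz_region = 2u << 20,
		.min_nr_accesses = 5,
		.min_age_region = 2,
	};
	struct region_info r = {
		.vm_start = 0, .vm_end = 4ul << 20,
		.nr_accesses = 5, .age = 2,
	};

	assert(scheme_matches(&hot, &r));	/* bounds are inclusive */

	r.nr_accesses = 4;
	assert(!scheme_matches(&hot, &r));	/* below min_nr_accesses */

	r.nr_accesses = 5;
	r.vm_end = 1ul << 20;
	assert(!scheme_matches(&hot, &r));	/* region smaller than 2 MiB */
}
```

One consequence of the 0-means-unset convention, as far as this sketch reads the hunk: a maximum of 0 cannot be used to select regions with exactly zero accesses, because max_nr_accesses == 0 disables the upper bound instead.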