On Mon, Dec 11, 2023 at 6:04 AM Dan Schatzberg <schatzberg.dan@xxxxxxxxx> wrote: > > Allow proactive reclaimers to submit an additional swappiness=<val> > argument to memory.reclaim. This overrides the global or per-memcg > swappiness setting for that reclaim attempt. > > For example: > > echo "2M swappiness=0" > /sys/fs/cgroup/memory.reclaim > > will perform reclaim on the rootcg with a swappiness setting of 0 (no > swap) regardless of the vm.swappiness sysctl setting. > > Signed-off-by: Dan Schatzberg <schatzberg.dan@xxxxxxxxx> > --- > Documentation/admin-guide/cgroup-v2.rst | 15 ++++++- > include/linux/swap.h | 3 +- > mm/memcontrol.c | 55 ++++++++++++++++++++----- > mm/vmscan.c | 13 +++++- > 4 files changed, 70 insertions(+), 16 deletions(-) > > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst > index 3f85254f3cef..fc2b379dbd0f 100644 > --- a/Documentation/admin-guide/cgroup-v2.rst > +++ b/Documentation/admin-guide/cgroup-v2.rst > @@ -1282,8 +1282,8 @@ PAGE_SIZE multiple when read back. > This is a simple interface to trigger memory reclaim in the > target cgroup. > > - This file accepts a single key, the number of bytes to reclaim. > - No nested keys are currently supported. > + This file accepts a string which containers thhe number of bytes contains* the* I think this statement was only important because no keys were supported, so I think we can remove it completely and rely on documenting the supported keys below like other interfaces, see my next comment. > + to reclaim. > > Example:: > > @@ -1304,6 +1304,17 @@ PAGE_SIZE multiple when read back. > This means that the networking layer will not adapt based on > reclaim induced by memory.reclaim. > > + This file also allows the user to specify the swappiness value > + to be used for the reclaim. For example: > + > + echo "1G swappiness=60" > memory.reclaim > + > + The above instructs the kernel to perform the reclaim with > + a swappiness value of 60. Note that this has the same semantics > + as the vm.swappiness sysctl - it sets the relative IO cost of > + reclaiming anon vs file memory but does not allow for reclaiming > + specific amounts of anon or file memory. > + Can we instead follow the same format used by other nested-keyed files (e.g. io.max)? This usually involves a table of supported keys and such. > memory.peak > A read-only single value file which exists on non-root > cgroups. [..] > @@ -6902,12 +6913,33 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf, > unsigned int nr_retries = MAX_RECLAIM_RETRIES; > unsigned long nr_to_reclaim, nr_reclaimed = 0; > unsigned int reclaim_options; > - int err; > + char *old_buf, *start; > + substring_t args[MAX_OPT_ARGS]; > + int swappiness = -1; > > buf = strstrip(buf); > - err = page_counter_memparse(buf, "", &nr_to_reclaim); > - if (err) > - return err; > + > + old_buf = buf; > + nr_to_reclaim = memparse(buf, &buf) / PAGE_SIZE; > + if (buf == old_buf) > + return -EINVAL; > + > + buf = strstrip(buf); > + > + while ((start = strsep(&buf, " ")) != NULL) { > + if (!strlen(start)) > + continue; > + switch (match_token(start, if_tokens, args)) { > + case MEMORY_RECLAIM_SWAPPINESS: > + if (match_int(&args[0], &swappiness)) > + return -EINVAL; > + if (swappiness < 0 || swappiness > 200) I am not a fan of extending the hardcoded 0 and 200 values, and now the new -1 value. Maybe it's time to create constants for the min and max swappiness values instead of hardcoding them everywhere? This can be a separate preparatory patch. Then, -1 (or any invalid value) can also be added as a constant with a useful name, instead of passing -1 to all other callers. This should make the code a little bit more readable and easier to extend.