Some applications use memory cgroup limits to scale their own memory
needs. Reading memory.max of the immediate membership cgroup is not
sufficient because of possible ancestral limits. The application could
traverse upwards to figure out the tightest limit, but this would not
work in a cgroup namespace, where the view of the cgroup hierarchy is
incomplete and the limit may apply from the outer world.

(cgroup v1 used memory.stat:hierarchical_memory_limit to report the
value, but there is no such counterpart in cgroup v2 memory.stat.)

Introduce a new memcg attribute file that contains the effective value
of the memory limit for a given cgroup (following the
cpuset.cpus.effective pattern).

Signed-off-by: Jan Kratochvil (Azul) <jkratochvil@xxxxxxxx>
[ mkoutny: rewrite commit message, split out memory.swap.max]
Signed-off-by: Michal Koutný <mkoutny@xxxxxxxx>
---
 Documentation/admin-guide/cgroup-v2.rst |  6 ++++++
 mm/memcontrol.c                         | 18 ++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 8fbb0519d556..988f26264054 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1293,6 +1293,12 @@ PAGE_SIZE multiple when read back.
 	Caller could retry them differently, return into userspace
 	as -ENOMEM or silently ignore in cases like disk readahead.
 
+  memory.max.effective
+	A read-only file that provides effective value of cgroup's hard usage
+	limit. It incorporates limits of all ancestors, even those not visible
+	in cgroupns. The value change in this file generates a file modified
+	event.
+
   memory.reclaim
 	A write-only nested-keyed file which exists for all cgroups.
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7fad15b2290c..86bcec84fe7b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7065,6 +7065,19 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
 	return nbytes;
 }
 
+static int memory_max_effective_show(struct seq_file *m, void *v)
+{
+	unsigned long memory;
+	struct mem_cgroup *mi;
+
+	/* Hierarchical information */
+	memory = PAGE_COUNTER_MAX;
+	for (mi = mem_cgroup_from_seq(m); mi; mi = parent_mem_cgroup(mi))
+		memory = min(memory, READ_ONCE(mi->memory.max));
+
+	return seq_puts_memcg_tunable(m, memory);
+}
+
 /*
  * Note: don't forget to update the 'samples/cgroup/memcg_event_listener'
  * if any new events become available.
@@ -7259,6 +7272,11 @@ static struct cftype memory_files[] = {
 		.seq_show = memory_max_show,
 		.write = memory_max_write,
 	},
+	{
+		.name = "max.effective",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = memory_max_effective_show,
+	},
 	{
 		.name = "events",
 		.flags = CFTYPE_NOT_ON_ROOT,
-- 
2.45.1
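
For illustration only, here is a minimal userspace sketch of how an
application might consume the new file. It is not part of the patch. The
helper names (memcg_parse_limit, memcg_read_limit, MEMCG_NO_LIMIT) and the
caller-supplied cgroup directory are hypothetical; the sketch assumes the
file uses the same format as memory.max, that is, either the literal "max"
or a decimal byte count followed by a newline. It falls back to memory.max
when memory.max.effective does not exist.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sentinel: the file contained "max", i.e. no limit applies. */
#define MEMCG_NO_LIMIT UINT64_MAX

/*
 * Parse the content of a memcg limit file: "max" or a decimal byte count,
 * optionally terminated by a newline. Returns 0 on success, -1 on
 * malformed input.
 */
static int memcg_parse_limit(const char *text, uint64_t *limit)
{
	unsigned long long val;
	char *end;

	assert(text != NULL && limit != NULL);

	if (strcmp(text, "max") == 0 || strcmp(text, "max\n") == 0) {
		*limit = MEMCG_NO_LIMIT;
		return 0;
	}

	/* Reject signs, whitespace and empty input that strtoull() accepts. */
	if (text[0] < '0' || text[0] > '9')
		return -1;

	errno = 0;
	val = strtoull(text, &end, 10);
	if (errno != 0)
		return -1;
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -1;

	*limit = val;
	return 0;
}

/*
 * Read the limit of the cgroup mounted at cgroup_dir (assumption: the
 * caller already resolved this path, e.g. from /proc/self/cgroup).
 * Prefers memory.max.effective and falls back to memory.max when the
 * former cannot be opened. Returns 0 on success, -1 on failure.
 */
static int memcg_read_limit(const char *cgroup_dir, uint64_t *limit)
{
	static const char *const names[] = {
		"memory.max.effective",
		"memory.max",
	};
	char path[4096], buf[64];
	size_t i;

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		FILE *f;
		int n = snprintf(path, sizeof(path), "%s/%s", cgroup_dir, names[i]);

		if (n < 0 || (size_t)n >= sizeof(path))
			return -1;
		f = fopen(path, "r");
		if (!f)
			continue;	/* try the next candidate file */
		if (!fgets(buf, sizeof(buf), f)) {
			fclose(f);
			return -1;
		}
		fclose(f);
		return memcg_parse_limit(buf, limit);
	}
	return -1;
}
```

A caller would pass the directory of its own cgroup and treat
MEMCG_NO_LIMIT as "size from physical memory instead". Note that the
fallback to memory.max only reports the local limit, which is exactly the
gap the patch describes.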