Currently, the memory.min|low implementation requires every level of the
hierarchy to have the setting configured; otherwise the protection is
broken.

Our hierarchy looks like this (memory.min value in brackets):

               root
                |
             docker(0)
              /    \
         c1(max)   c2(0)

Note that "docker" doesn't set memory.min. When kswapd runs,
mem_cgroup_protected() yields a "0" emin for "c1" because the
@parent_emin of "docker" is "0", and as a result "c1" gets reclaimed.

But it's hard to maintain the parent's memory.min when the set of
protected children is uncertain, because only some important types of
containers need the protection. Furthermore, control tasks belonging to
the parent constantly produce trivial memory which should not be
protected at all. It makes sense to ignore an unprotected parent in this
scenario to gain flexibility.

In order not to break the previous hierarchical behaviour, only ignore
the parent when there is no protected ancestor further up the hierarchy.

Signed-off-by: Xunlei Pang <xlpang@xxxxxxxxxxxxxxxxx>
---
 include/linux/page_counter.h |  2 ++
 mm/memcontrol.c              |  5 +++++
 mm/page_counter.c            | 24 ++++++++++++++++++++++++
 3 files changed, 31 insertions(+)

diff --git a/include/linux/page_counter.h b/include/linux/page_counter.h
index bab7e57f659b..aed7ed28b458 100644
--- a/include/linux/page_counter.h
+++ b/include/linux/page_counter.h
@@ -55,6 +55,8 @@ bool page_counter_try_charge(struct page_counter *counter,
 void page_counter_uncharge(struct page_counter *counter, unsigned long nr_pages);
 void page_counter_set_min(struct page_counter *counter, unsigned long nr_pages);
 void page_counter_set_low(struct page_counter *counter, unsigned long nr_pages);
+bool page_counter_has_min(struct page_counter *counter);
+bool page_counter_has_low(struct page_counter *counter);
 int page_counter_set_max(struct page_counter *counter, unsigned long nr_pages);
 int page_counter_memparse(const char *buf, const char *max,
 			  unsigned long *nr_pages);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ca0bc6e6be13..f1dfa651f55d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5917,6 +5917,8 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
 	if (parent == root)
 		goto exit;
 
+	if (!page_counter_has_min(&parent->memory))
+		goto elow;
 	parent_emin = READ_ONCE(parent->memory.emin);
 	emin = min(emin, parent_emin);
 	if (emin && parent_emin) {
@@ -5931,6 +5933,9 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
 				   siblings_min_usage);
 	}
 
+elow:
+	if (!page_counter_has_low(&parent->memory))
+		goto exit;
 	parent_elow = READ_ONCE(parent->memory.elow);
 	elow = min(elow, parent_elow);
 	if (elow && parent_elow) {
diff --git a/mm/page_counter.c b/mm/page_counter.c
index de31470655f6..8c668eae2af5 100644
--- a/mm/page_counter.c
+++ b/mm/page_counter.c
@@ -202,6 +202,30 @@ int page_counter_set_max(struct page_counter *counter, unsigned long nr_pages)
 	}
 }
 
+bool page_counter_has_min(struct page_counter *counter)
+{
+	struct page_counter *c;
+
+	for (c = counter; c; c = c->parent) {
+		if (c->min)
+			return true;
+	}
+
+	return false;
+}
+
+bool page_counter_has_low(struct page_counter *counter)
+{
+	struct page_counter *c;
+
+	for (c = counter; c; c = c->parent) {
+		if (c->low)
+			return true;
+	}
+
+	return false;
+}
+
 /**
  * page_counter_set_min - set the amount of protected memory
  * @counter: counter
-- 
2.14.4.44.g2045bb6
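
For illustration only, the ancestor walk that page_counter_has_min() is
meant to perform can be modelled in userspace. This is a minimal sketch
and not kernel code: struct counter_model, MODEL_MIN_MAX and
model_has_min() are hypothetical names, and only the configured min value
and the parent pointer are modelled.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory.min value of "max". */
#define MODEL_MIN_MAX ULONG_MAX

/* Simplified userspace stand-in for struct page_counter (assumption). */
struct counter_model {
	unsigned long min;             /* configured memory.min, in pages */
	struct counter_model *parent;  /* NULL for the root */
};

/*
 * Return true if @counter itself or any ancestor above it has a
 * non-zero min, i.e. a protected level exists at or above @counter.
 */
static bool model_has_min(const struct counter_model *counter)
{
	const struct counter_model *c;

	for (c = counter; c; c = c->parent) {
		if (c->min)
			return true;
	}

	return false;
}
```

With the hierarchy from the description (root and "docker" at 0), the
walk starting at "docker" finds no protected level, so the parent clamp
would be skipped for "c1". If any level at or above "docker" had a
non-zero min, the walk would return true and the existing hierarchical
calculation would apply unchanged.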