On Fri, Dec 27, 2019 at 07:43:53AM -0500, Yafang Shao wrote:
> memory.{emin, elow} are set in mem_cgroup_protected(), and the values of
> them won't be changed until next recalculation in this function. After
> either or both of them are set, the next reclaimer to relcaim this memcg
> may be a different reclaimer, e.g. this memcg is also the root memcg of
> the new reclaimer, and then in mem_cgroup_protection() in get_scan_count()
> the old values of them will be used to calculate scan count, that is not
> proper. We should reset them to zero in this case.
>
> Here's an example of this issue.
>
>     root_mem_cgroup
>          /
>         A   memory.max=1024M memory.min=512M memory.current=800M
>
> Once kswapd is waked up, it will try to scan all MEMCGs, including
> this A, and it will assign memory.emin of A with 512M.
> After that, A may reach its hard limit(memory.max), and then it will
> do memcg reclaim. Because A is the root of this reclaimer, so it will
> not calculate its memory.emin. So the memory.emin is the old value
> 512M, and then this old value will be used in
> mem_cgroup_protection() in get_scan_count() to get the scan count.
> That is not proper.
>
> Fixes: 9783aa9917f8 ("mm, memcg: proportional memory.{low,min} reclaim")
> Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
> Cc: Chris Down <chris@xxxxxxxxxxxxxx>
> Cc: Roman Gushchin <guro@xxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> ---
>  mm/memcontrol.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 601405b..bb3925d 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6287,8 +6287,17 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
>
>  	if (!root)
>  		root = root_mem_cgroup;
> -	if (memcg == root)
> +	if (memcg == root) {
> +		/*
> +		 * Reset memory.(emin, elow) for reclaiming the memcg
> +		 * itself.
> +		 */
> +		if (memcg != root_mem_cgroup) {
> +			memcg->memory.emin = 0;
> +			memcg->memory.elow = 0;
> +		}

I'm sorry I didn't bring this up from the start, but I doubt that zeroing
the effective protection is correct.

Imagine a simple config: a large cgroup subtree with memory.max set on the
top level. Reaching this limit doesn't mean that all protection
configuration inside the tree can be ignored. Instead we should respect
the memory.min/low set by a user on this level (look at the parent == root
case), maybe clamped by memory.high/max.

Thanks!
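
To make the difference concrete, here is a toy user-space model of the
example from the changelog (A with memory.max=1024M, memory.min=512M,
memory.current=800M). It is only a sketch, not kernel code: the struct,
all toy_* names, the MiB units, and the "usage minus effective min" figure
are assumptions made for illustration, and the clamping rule is just one
possible reading of "clamped by memory.high/max".

```c
#include <assert.h>

/* Toy model, NOT kernel code. All sizes are in MiB; every name is hypothetical. */
struct toy_memcg {
	unsigned long min;   /* memory.min as configured by the user */
	unsigned long max;   /* memory.max (hard limit) */
	unsigned long usage; /* memory.current */
	unsigned long emin;  /* effective min cached by the last reclaim pass */
};

unsigned long toy_min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* A reclaimer rooted above the memcg caches the configured value
 * (a simplified stand-in for the parent == root case). */
void toy_reclaim_from_above(struct toy_memcg *cg)
{
	cg->emin = cg->min;
}

/* Reported behaviour: the memcg is the reclaim root, emin is not
 * recalculated, so whatever the previous pass left behind is used. */
unsigned long toy_emin_stale(const struct toy_memcg *cg)
{
	return cg->emin;
}

/* Quoted patch: drop the protection when the memcg reclaims itself. */
unsigned long toy_emin_zeroed(const struct toy_memcg *cg)
{
	(void)cg;
	return 0;
}

/* One reading of the suggestion: use the user's setting, clamped by the limit. */
unsigned long toy_emin_clamped(const struct toy_memcg *cg)
{
	return toy_min_ul(cg->min, cg->max);
}

/* Part of the usage that is not covered by the effective protection. */
unsigned long toy_unprotected(unsigned long usage, unsigned long emin)
{
	return usage > emin ? usage - emin : 0;
}

/* Walk through the changelog example with the three policies. */
void toy_selfcheck(void)
{
	struct toy_memcg a = { .min = 512, .max = 1024, .usage = 800, .emin = 0 };

	toy_reclaim_from_above(&a);                 /* kswapd pass */
	assert(toy_emin_stale(&a) == 512);
	assert(toy_unprotected(a.usage, toy_emin_zeroed(&a)) == 800);
	assert(toy_unprotected(a.usage, toy_emin_clamped(&a)) == 288);

	a.emin = 256;                               /* pretend an older pass left this */
	assert(toy_emin_stale(&a) == 256);          /* depends on history */
	assert(toy_emin_clamped(&a) == 512);        /* depends only on configuration */
}
```

Calling toy_selfcheck() from a main() exercises all three policies. In this
model, zeroing leaves the whole 800M unprotected, while deriving the value
from the user's configuration keeps 512M protected without depending on
what an earlier reclaim pass happened to cache.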