On Fri, Apr 24, 2020 at 06:21:03PM +0200, Michal Hocko wrote:
> On Fri 24-04-20 11:10:13, Johannes Weiner wrote:
> > On Fri, Apr 24, 2020 at 04:29:58PM +0200, Michal Hocko wrote:
> > > On Fri 24-04-20 09:14:50, Johannes Weiner wrote:
> > > > On Thu, Apr 23, 2020 at 02:16:29AM -0400, Yafang Shao wrote:
> > > > > This patch is an improvement of a previous version[1], as the previous
> > > > > version is not easy to understand.
> > > > > This issue persists in the newest kernel, I have to resend the fix. As
> > > > > the implementation is changed, I drop Roman's ack from the previous
> > > > > version.
> > > >
> > > > Now that I understand the problem, I much prefer the previous version.
> > > >
> > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > > index 745697906ce3..2bf91ae1e640 100644
> > > > --- a/mm/memcontrol.c
> > > > +++ b/mm/memcontrol.c
> > > > @@ -6332,8 +6332,19 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
> > > >
> > > >  	if (!root)
> > > >  		root = root_mem_cgroup;
> > > > -	if (memcg == root)
> > > > +	if (memcg == root) {
> > > > +		/*
> > > > +		 * The cgroup is the reclaim root in this reclaim
> > > > +		 * cycle, and therefore not protected. But it may have
> > > > +		 * stale effective protection values from previous
> > > > +		 * cycles in which it was not the reclaim root - for
> > > > +		 * example, global reclaim followed by limit reclaim.
> > > > +		 * Reset these values for mem_cgroup_protection().
> > > > +		 */
> > > > +		memcg->memory.emin = 0;
> > > > +		memcg->memory.elow = 0;
> > > >  		return MEMCG_PROT_NONE;
> > > > +	}
> > >
> > > Could you be more specific why you prefer this over the
> > > mem_cgroup_protection which doesn't change the effective value?
> > > Isn't it easier to simply ignore effective value for the reclaim roots?
> >
> > Because now both mem_cgroup_protection() and mem_cgroup_protected()
> > have to know about the reclaim root semantics, instead of just the one
> > central place.
>
> Yes this is true but it is also potentially overwriting the state with
> a parallel reclaim which can lead to surprising results

Checking in mem_cgroup_protection() doesn't avoid the fundamental race:

  root
     `- A (low=2G, elow=2G, max=3G)
        `- A1 (low=2G, elow=2G)

If A does limit reclaim while global reclaim races, the memcg == root
check in mem_cgroup_protection() will reliably calculate the "right"
scan value for A, which has no pages, and the wrong scan value for A1
where the memory actually is.

I'm okay with fixing the case where a really old left-over value is
used by target reclaim. I don't see a point in special casing this one
instance of a fundamental race condition at the expense of less robust
code.
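
For illustration only, here is a rough userspace sketch of the
sequential case (global reclaim followed by limit reclaim on A), which
is the stale left-over value I'm okay with fixing. It does not model
the concurrent race. Everything in it is hypothetical: struct toy_memcg,
toy_protected(), toy_protection() and run_scenario() are made-up
stand-ins, the elow calculation is reduced to "own low, capped by the
parent's elow", and toy_protection() only looks at elow. It is not the
kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a memory cgroup; NOT the kernel's struct mem_cgroup. */
struct toy_memcg {
	struct toy_memcg *parent;
	unsigned long low_mb;	/* configured memory.low, in MiB */
	unsigned long elow_mb;	/* cached effective low protection, in MiB */
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Toy version of mem_cgroup_protected(): refresh the cached elow of one
 * cgroup for a reclaim cycle rooted at @root. With @reset_on_root set,
 * the reclaim root's cached value is cleared, as in the quoted diff.
 */
static void toy_protected(struct toy_memcg *root, struct toy_memcg *memcg,
			  int reset_on_root)
{
	if (memcg == root) {
		if (reset_on_root)
			memcg->elow_mb = 0;
		return;
	}
	assert(memcg->parent != NULL);
	if (memcg->parent == root)
		memcg->elow_mb = memcg->low_mb;
	else
		memcg->elow_mb = min_ul(memcg->low_mb, memcg->parent->elow_mb);
}

/* Toy version of mem_cgroup_protection(): report the cached value only. */
static unsigned long toy_protection(const struct toy_memcg *memcg)
{
	return memcg->elow_mb;
}

/*
 * Global reclaim followed by limit reclaim on A. Returns the protection
 * that the limit reclaim pass sees for A itself, in MiB.
 */
static unsigned long run_scenario(int reset_on_root)
{
	struct toy_memcg root = { NULL, 0, 0 };
	struct toy_memcg a = { &root, 2048, 0 };
	struct toy_memcg a1 = { &a, 2048, 0 };

	/* Global reclaim: the reclaim root is the hierarchy root. */
	toy_protected(&root, &root, reset_on_root);
	toy_protected(&root, &a, reset_on_root);
	toy_protected(&root, &a1, reset_on_root);

	/* Limit reclaim on A: the reclaim root is now A. */
	toy_protected(&a, &a, reset_on_root);
	toy_protected(&a, &a1, reset_on_root);

	return toy_protection(&a);
}
```

In this toy model, run_scenario(0) leaves A with the 2048 MiB it cached
during global reclaim, while run_scenario(1) clears it to 0 in the one
place that already knows A is the reclaim root.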