On Tue, Aug 20, 2019 at 10:01 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote: > > On Tue, Aug 20, 2019 at 9:40 AM Roman Gushchin <guro@xxxxxx> wrote: > > > > On Tue, Aug 20, 2019 at 09:16:01AM +0800, Yafang Shao wrote: > > > On Tue, Aug 20, 2019 at 5:12 AM Roman Gushchin <guro@xxxxxx> wrote: > > > > > > > > On Sun, Aug 18, 2019 at 09:18:06PM -0400, Yafang Shao wrote: > > > > > In the current memory.min design, the system is going to do OOM instead > > > > > of reclaiming the reclaimable pages protected by memory.min if the > > > > > system is lack of free memory. While under this condition, the OOM > > > > > killer may kill the processes in the memcg protected by memory.min. > > > > > This behavior is very weird. > > > > > In order to make it more reasonable, I make some changes in the OOM > > > > > killer. In this patch, the OOM killer will do two-round scan. It will > > > > > skip the processes under memcg protection at the first scan, and if it > > > > > can't kill any processes it will rescan all the processes. > > > > > > > > > > Regarding the overhead this change may takes, I don't think it will be a > > > > > problem because this only happens under system memory pressure and > > > > > the OOM killer can't find any proper victims which are not under memcg > > > > > protection. > > > > > > > > Hi Yafang! > > > > > > > > The idea makes sense at the first glance, but actually I'm worried > > > > about mixing per-memcg and per-process characteristics. > > > > Actually, it raises many questions: > > > > 1) if we do respect memory.min, why not memory.low too? > > > > > > memroy.low is different with memory.min, as the OOM killer will not be > > > invoked when it is reached. > > > If memory.low should be considered as well, we can use > > > mem_cgroup_protected() here to repclace task_under_memcg_protection() > > > here. > > > > > > > 2) if the task is 200Gb large, does 10Mb memory protection make any > > > > difference? if so, why would we respect it? > > > > > > Same with above, only consider it when the proctecion is enabled. > > > > Right, but memory.min is a number, not a boolean flag. It defines > > how much memory is protected. You're using it as an on-off knob, > > which is sub-optimal from my point of view. > > > > I mean using mem_cgroup_protected(), sam with memory.min is > implementad in the global reclaim path. > > > > > > > > 3) if it works for global OOMs, why not memcg-level OOMs? > > > > > > memcg OOM is when the memory limit is reached and it can't find > > > something to relcaim in the memcg and have to kill processes in this > > > memcg. > > > That is different with global OOM, because the global OOM can chose > > > processes outside the memcg but the memcg OOM can't. > > > > Imagine the following hierarchy: > > / > > | > > A memory.max = 10G, memory.min = 2G > > / \ > > B C memory.min = 1G, memory.min = 0 > > > > Say, you have memcg OOM in A, why B's memory min is not respected? > > How it's different to the system-wide OOM? > > > > Ah, this should be considered as well. Thanks for pointing out. > > > > > > > > 4) if the task is prioritized to be killed by OOM (via oom_score_adj), > > > > why even small memory.protection prevents it completely? > > > > > > Would you pls. show me some examples that when we will set both > > > memory.min(meaning the porcesses in this memcg is very important) and > > > higher oom_score_adj(meaning the porcesses in this memcg is not > > > improtant at all) ? > > > Note that the memory.min don't know which processes is important, > > > while it only knows is if this process in this memcg. > > > > For instance, to prefer a specific process to be killed in case > > of memcg OOM. > > Also, memory.min can be used mostly to preserve the pagecache, > > and an OOM kill means nothing but some anon memory leak. > > In this case, it makes no sense to protect the leaked task. > > > > But actually what memory.min protected is the memory usage, instead of > pagecache, > e.g. if the anon memory is higher than memory.min, then memroy.min > can't protect file memory when swap is off. > > Even there is no anon memory leak, the OOM killer can also be invoked > due to excess use of memroy. > Plus, the memory.min can also protect the leaked anon memroy in > current implementation. > BTW, if there are two different memcgs open the same file, the memcg proection will not work if one memcg is protected while another memcg is not protected. But that may be a rare case. > > > > > > > 5) if there are two tasks similar in size and both protected, > > > > should we prefer one with the smaller protection? > > > > etc. > > > > > > Same with the answer in 1). > > > > So the problem is not that your patch is incorrect (or the idea is bad), > > but you're defining a new policy, which will be impossible or very hard > > to change further (as any other policy). > > > > So it's important to define it very well. Using the memory.min > > number as a binary flag for selecting tasks seems a bit limited. > > > > > > Thanks!