On Tue 20-08-19 17:26:49, Yafang Shao wrote:
> On Tue, Aug 20, 2019 at 5:17 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
[...]
> > > As there's a memory.oom.group option to select killing all processes
> > > in a memcg, why not introduce a memcg level memcg.oom.score_adj?
> >
> > Because the oom selection is process based as already mentioned. There
> > was a long discussion about memcg based oom victim selection last year
> > but no consensus has been achieved.
> >
> > > Then we can set different scores to different memcgs.
> > > Because we always deploy lots of containers on a single host, when OOM
> > > occurs it will be better to prefer killing the low priority containers
> > > (with higher memcg.oom.score_adj) first.
> >
> > How would you define low priority container with score_adj?
> >
>
> For example, Container-A is high priority and Container-B is low priority.
> When OOM killer happens we prefer to kill all processes in Container-B
> and prevent Container-A from being killed.
> So we set memory.oom.score_adj with -1000 to Container-A and +1000
> to Container-B, both containers with memory.oom.group set.
> When we set memory.oom.score_adj to a container, all processes
> belonging to this container will be set this value to their own
> oom_score_adj.

I hope you can see that this on/off mechanism doesn't scale and thus it
is a dubious interface. Just think of multiple containers.
-- 
Michal Hocko
SUSE Labs