In the current memory.min design, when the system runs short of free memory it
will go OOM instead of reclaiming the reclaimable pages protected by
memory.min. Under this condition the OOM killer may then kill processes in the
very memcg that memory.min is protecting, which is rather odd behavior.

To make this more reasonable, this patch changes the OOM killer to do a
two-round scan: it skips processes under memcg protection in the first scan,
and if no victim can be found there it rescans all processes.

Regarding the overhead this change may add, I don't think it will be a
problem, because the second scan only happens when the system is under memory
pressure and the OOM killer cannot find any suitable victim outside of memcg
protection.

Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
Cc: Roman Gushchin <guro@xxxxxx>
Cc: Randy Dunlap <rdunlap@xxxxxxxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxx>
Cc: Vladimir Davydov <vdavydov.dev@xxxxxxxxx>
Cc: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Cc: Souptick Joarder <jrdr.linux@xxxxxxxxx>
Cc: Yafang Shao <shaoyafang@xxxxxxxxxxxxxx>
---
 include/linux/memcontrol.h |  6 ++++++
 mm/memcontrol.c            | 16 ++++++++++++++++
 mm/oom_kill.c              | 23 +++++++++++++++++++++--
 3 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 44c4146..534fe92 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -337,6 +337,7 @@ static inline bool mem_cgroup_disabled(void)
 
 enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
                                                 struct mem_cgroup *memcg);
+bool task_under_memcg_protection(struct task_struct *p);
 
 int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
                           gfp_t gfp_mask, struct mem_cgroup **memcgp,
@@ -813,6 +814,11 @@ static inline enum mem_cgroup_protection mem_cgroup_protected(
         return MEMCG_PROT_NONE;
 }
 
+static inline bool task_under_memcg_protection(struct task_struct *p)
+{
+        return false;
+}
+
 static inline int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
                                         gfp_t gfp_mask,
                                         struct mem_cgroup **memcgp,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index cdbb7a8..8df3d88 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6030,6 +6030,22 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
         return MEMCG_PROT_NONE;
 }
 
+bool task_under_memcg_protection(struct task_struct *p)
+{
+        struct mem_cgroup *memcg;
+        bool protected;
+
+        rcu_read_lock();
+        memcg = mem_cgroup_from_task(p);
+        if (memcg != root_mem_cgroup && memcg->memory.min)
+                protected = true;
+        else
+                protected = false;
+        rcu_read_unlock();
+
+        return protected;
+}
+
 /**
  * mem_cgroup_try_charge - try charging a page
  * @page: page to charge
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index eda2e2a..e9bdad3 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -367,12 +367,31 @@ static void select_bad_process(struct oom_control *oc)
         if (is_memcg_oom(oc))
                 mem_cgroup_scan_tasks(oc->memcg, oom_evaluate_task, oc);
         else {
+                bool memcg_check = false;
+                bool memcg_skip = false;
                 struct task_struct *p;
+                int selected = 0;
 
                 rcu_read_lock();
-                for_each_process(p)
-                        if (oom_evaluate_task(p, oc))
+retry:
+                for_each_process(p) {
+                        if (!memcg_check && task_under_memcg_protection(p)) {
+                                memcg_skip = true;
+                                continue;
+                        }
+                        selected = oom_evaluate_task(p, oc);
+                        if (selected)
                                 break;
+                }
+
+                if (!selected) {
+                        if (memcg_skip) {
+                                if (!oc->chosen || oc->chosen == (void *)-1UL) {
+                                        memcg_check = true;
+                                        goto retry;
+                                }
+                        }
+                }
                 rcu_read_unlock();
         }
 }
-- 
1.8.3.1
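
For readers skimming the diff, the following is a minimal userspace sketch of the
two-round selection flow that the mm/oom_kill.c hunk implements, not part of the
patch itself. The `struct task` fields, `select_victim()` and the badness scoring
are hypothetical stand-ins for the kernel's task list, task_under_memcg_protection()
and oom_evaluate_task(), and the oc->chosen/-1UL (abort) special case is left out;
only the skip-protected-then-retry logic is illustrated.

/*
 * Sketch of the two-round OOM victim selection: round one skips tasks
 * marked as memcg-protected; if nothing is chosen and at least one task
 * was skipped, round two rescans everything.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct task {
	const char *name;
	int badness;		/* stand-in for the task's oom score */
	bool memcg_protected;	/* stand-in for task_under_memcg_protection() */
};

static const struct task *select_victim(const struct task *tasks, size_t n)
{
	const struct task *chosen = NULL;
	bool check_protected = false;	/* first round: skip protected tasks */
	bool skipped = false;

retry:
	for (size_t i = 0; i < n; i++) {
		if (!check_protected && tasks[i].memcg_protected) {
			skipped = true;
			continue;
		}
		if (!chosen || tasks[i].badness > chosen->badness)
			chosen = &tasks[i];
	}

	/* Second round only if nothing was found and something was skipped. */
	if (!chosen && skipped && !check_protected) {
		check_protected = true;
		goto retry;
	}
	return chosen;
}

int main(void)
{
	const struct task tasks[] = {
		{ "protected-db",    900, true },
		{ "protected-cache", 800, true },
	};
	const struct task *victim = select_victim(tasks, 2);

	/* Every task is protected, so the second round picks one anyway. */
	printf("victim: %s\n", victim ? victim->name : "(none)");
	return 0;
}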