In the current memory.min design, when the system runs short of free
memory it will invoke the OOM killer instead of reclaiming the
reclaimable pages protected by memory.min. Worse, the OOM killer may
then kill processes in the very memcgs that memory.min is protecting,
which defeats the purpose of the protection.

To make this behavior more reasonable, this patch changes the OOM
killer to do a two-round scan. The first scan skips processes under
memcg protection; if no victim can be chosen, all processes are
rescanned. The overhead of the extra scan should be acceptable,
because it only occurs when the system is under memory pressure and
the OOM killer cannot find any suitable victim outside the protected
memcgs.

Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
Cc: Roman Gushchin <guro@xxxxxx>
Cc: Randy Dunlap <rdunlap@xxxxxxxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxx>
Cc: Vladimir Davydov <vdavydov.dev@xxxxxxxxx>
Cc: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
---
 include/linux/memcontrol.h |  6 ++++++
 mm/memcontrol.c            | 16 ++++++++++++++++
 mm/oom_kill.c              | 23 +++++++++++++++++++++--
 3 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 44c4146..58bd86b 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -337,6 +337,7 @@ static inline bool mem_cgroup_disabled(void)
 
 enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
 						struct mem_cgroup *memcg);
+int task_under_memcg_protection(struct task_struct *p);
 
 int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
 			  gfp_t gfp_mask, struct mem_cgroup **memcgp,
@@ -813,6 +814,11 @@ static inline enum mem_cgroup_protection mem_cgroup_protected(
 	return MEMCG_PROT_NONE;
 }
 
+static inline int task_under_memcg_protection(struct task_struct *p)
+{
+	return 0;
+}
+
 static inline int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
 					gfp_t gfp_mask,
 					struct mem_cgroup **memcgp,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index cdbb7a8..c4d8e53 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6030,6 +6030,22 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
 	return MEMCG_PROT_NONE;
 }
 
+int task_under_memcg_protection(struct task_struct *p)
+{
+	struct mem_cgroup *memcg;
+	int protected;
+
+	rcu_read_lock();
+	memcg = mem_cgroup_from_task(p);
+	if (memcg != root_mem_cgroup && memcg->memory.min)
+		protected = 1;
+	else
+		protected = 0;
+	rcu_read_unlock();
+
+	return protected;
+}
+
 /**
  * mem_cgroup_try_charge - try charging a page
  * @page: page to charge
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index eda2e2a..259dd2c 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -368,11 +368,30 @@ static void select_bad_process(struct oom_control *oc)
 		mem_cgroup_scan_tasks(oc->memcg, oom_evaluate_task, oc);
 	else {
 		struct task_struct *p;
+		int memcg_check = 0;
+		int memcg_skip = 0;
+		int selected = 0;
 
 		rcu_read_lock();
-		for_each_process(p)
-			if (oom_evaluate_task(p, oc))
+retry:
+		for_each_process(p) {
+			if (!memcg_check && task_under_memcg_protection(p)) {
+				memcg_skip = 1;
+				continue;
+			}
+			selected = oom_evaluate_task(p, oc);
+			if (selected)
 				break;
+		}
+
+		if (!selected) {
+			if (memcg_skip) {
+				if (!oc->chosen || oc->chosen == (void *)-1UL) {
+					memcg_check = 1;
+					goto retry;
+				}
+			}
+		}
 		rcu_read_unlock();
 	}
 }
-- 
1.8.3.1
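
P.S. For reviewers who want the control flow at a glance, below is a
minimal userspace sketch of the two-round scan pattern the diff above
implements. The struct candidate array, its protected flag, and the
pick_victim() helper are hypothetical stand-ins for the kernel's task
list, task_under_memcg_protection(), and select_bad_process(); it is an
illustration of the pattern, not kernel code.

#include <stdio.h>

/* Hypothetical stand-ins for kernel structures: each candidate has a
 * badness score and a flag saying whether its memcg is min-protected. */
struct candidate {
	const char *name;
	int badness;
	int protected;
};

/* Two-round scan: round one skips protected candidates; if nothing was
 * chosen, round two rescans everything, mirroring the retry above. */
static struct candidate *pick_victim(struct candidate *c, int n)
{
	struct candidate *chosen = NULL;
	int check_protected = 0;
	int i;

retry:
	for (i = 0; i < n; i++) {
		if (!check_protected && c[i].protected)
			continue;	/* first round: honor memory.min */
		if (!chosen || c[i].badness > chosen->badness)
			chosen = &c[i];
	}
	if (!chosen && !check_protected) {
		check_protected = 1;	/* no victim found: drop protection */
		goto retry;
	}
	return chosen;
}

int main(void)
{
	struct candidate tasks[] = {
		{ "db",  90, 1 },	/* both tasks protected by memory.min */
		{ "web", 50, 1 },
	};
	struct candidate *v = pick_victim(tasks, 2);

	/* Round one chooses nothing, so round two picks "db". */
	printf("victim: %s\n", v ? v->name : "none");
	return 0;
}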