On Sun, Aug 18, 2019 at 9:55 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
>
> In the current memory.min design, if the system runs short of free
> memory it will invoke the OOM killer instead of reclaiming the
> reclaimable pages protected by memory.min. Under this condition, the
> OOM killer may then kill processes in the very memcg that memory.min
> is supposed to protect, which is inconsistent.
> To make this more reasonable, this patch changes the OOM killer to do
> a two-round scan: it skips processes under memcg protection on the
> first scan, and if it cannot kill any process it rescans all
> processes.
>
> Regarding the overhead this change may introduce, I don't think it
> will be a problem, because the second scan only happens when the
> system is under memory pressure and the OOM killer can't find any
> proper victim outside memcg protection.
>
> Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
> Cc: Roman Gushchin <guro@xxxxxx>
> Cc: Randy Dunlap <rdunlap@xxxxxxxxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxx>
> Cc: Vladimir Davydov <vdavydov.dev@xxxxxxxxx>
> Cc: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> ---
>  include/linux/memcontrol.h |  6 ++++++
>  mm/memcontrol.c            | 16 ++++++++++++++++
>  mm/oom_kill.c              | 23 +++++++++++++++++++++--
>  3 files changed, 43 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 44c4146..58bd86b 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -337,6 +337,7 @@ static inline bool mem_cgroup_disabled(void)
>
>  enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
>                                                 struct mem_cgroup *memcg);
> +int task_under_memcg_protection(struct task_struct *p);
>
>  int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
>                           gfp_t gfp_mask, struct mem_cgroup **memcgp,
> @@ -813,6 +814,11 @@ static inline enum mem_cgroup_protection mem_cgroup_protected(
>         return MEMCG_PROT_NONE;
>  }
>
> +int task_under_memcg_protection(struct task_struct *p)
> +{
> +       return 0;
> +}
> +
>  static inline int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
>                                         gfp_t gfp_mask,
>                                         struct mem_cgroup **memcgp,
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index cdbb7a8..c4d8e53 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6030,6 +6030,22 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
>         return MEMCG_PROT_NONE;
>  }
>
> +int task_under_memcg_protection(struct task_struct *p)
> +{
> +       struct mem_cgroup *memcg;
> +       int protected;
> +
> +       rcu_read_lock();
> +       memcg = mem_cgroup_from_task(p);
> +       if (memcg != root_mem_cgroup && memcg->memory.min)
> +               protected = 1;
> +       else
> +               protected = 0;
> +       rcu_read_unlock();
> +
> +       return protected;

I think returning a bool type would be more appropriate.

> +}
> +
>  /**
>   * mem_cgroup_try_charge - try charging a page
>   * @page: page to charge
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index eda2e2a..259dd2c 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -368,11 +368,30 @@ static void select_bad_process(struct oom_control *oc)
>                 mem_cgroup_scan_tasks(oc->memcg, oom_evaluate_task, oc);
>         else {
>                 struct task_struct *p;
> +               int memcg_check = 0;
> +               int memcg_skip = 0;
> +               int selected = 0;
>
>                 rcu_read_lock();
> -               for_each_process(p)
> -                       if (oom_evaluate_task(p, oc))
> +retry:
> +               for_each_process(p) {
> +                       if (!memcg_check && task_under_memcg_protection(p)) {
> +                               memcg_skip = 1;
> +                               continue;
> +                       }
> +                       selected = oom_evaluate_task(p, oc);
> +                       if (selected)
>                                 break;
> +               }
> +
> +               if (!selected) {
> +                       if (memcg_skip) {
> +                               if (!oc->chosen || oc->chosen == (void *)-1UL) {
> +                                       memcg_check = 1;
> +                                       goto retry;
> +                               }
> +                       }
> +               }
>                 rcu_read_unlock();
>         }
>  }
> --
> 1.8.3.1
>
>