On Wed, 11 Mar 2020, Tetsuo Handa wrote:

> >>> diff --git a/mm/vmscan.c b/mm/vmscan.c
> >>> --- a/mm/vmscan.c
> >>> +++ b/mm/vmscan.c
> >>> @@ -2637,6 +2637,8 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
> >>>  		unsigned long reclaimed;
> >>>  		unsigned long scanned;
> >>>
> >>> +		cond_resched();
> >>> +
> >>
> >> Is this safe for CONFIG_PREEMPTION case? If current thread has realtime priority,
> >> can we guarantee that the OOM victim (well, the OOM reaper kernel thread rather
> >> than the OOM victim ?) gets scheduled?
> >>
> >
> > I think it's the best we can do that immediately solves the issue unless
> > you have another idea in mind?
>
> "schedule_timeout_killable(1) outside of oom_lock" or "the OOM reaper grabs oom_lock
> so that allocating threads guarantee that the OOM reaper gets scheduled" or "direct OOM
> reaping so that allocating threads guarantee that some memory is reclaimed".
>

The cond_resched() here is needed if the iteration is lengthy depending on
the number of descendant memcgs already.

schedule_timeout_killable(1) does not make any guarantees that current will
be scheduled after the victim or oom_reaper on UP systems.

If you have an alternate patch to try, we can test it.  But since this
cond_resched() is needed anyway, I'm not sure it will change the result.

> >
> >>> 	switch (mem_cgroup_protected(target_memcg, memcg)) {
> >>> 	case MEMCG_PROT_MIN:
> >>> 		/*
> >>>
> >>
> >
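
[Editor's note: for readers following the thread, below is a minimal sketch of
the memcg iteration the patch touches, loosely based on shrink_node_memcgs()
in mm/vmscan.c around the v5.6 era. The per-memcg reclaim body is elided and
details are simplified; this is not the actual kernel source, only an
illustration of where the added cond_resched() sits in the loop.]

	/*
	 * Simplified sketch (not verbatim kernel code): the loop walks every
	 * descendant memcg of the reclaim target, so a long walk with no
	 * scheduling point can keep other runnable tasks off the CPU on
	 * !CONFIG_PREEMPTION kernels.  The patch adds a cond_resched() at the
	 * top of each iteration.
	 */
	static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
	{
		struct mem_cgroup *target_memcg = sc->target_mem_cgroup;
		struct mem_cgroup *memcg;

		memcg = mem_cgroup_iter(target_memcg, NULL, NULL);
		do {
			unsigned long reclaimed;
			unsigned long scanned;

			cond_resched();	/* voluntary scheduling point added by the patch */

			/*
			 * ... mem_cgroup_protected() checks, shrink_lruvec(),
			 * shrink_slab(), and the reclaimed/scanned accounting
			 * are elided here ...
			 */
		} while ((memcg = mem_cgroup_iter(target_memcg, memcg, NULL)));
	}

[cond_resched() is essentially free when no reschedule is pending, but under
!CONFIG_PREEMPTION it is the only point in this loop where another task can
be scheduled, which is why it addresses the stall being discussed.]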