On 2020/03/12 4:38, David Rientjes wrote: > On Wed, 11 Mar 2020, Tetsuo Handa wrote: > >>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c >>>>> --- a/mm/vmscan.c >>>>> +++ b/mm/vmscan.c >>>>> @@ -2637,6 +2637,8 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc) >>>>> unsigned long reclaimed; >>>>> unsigned long scanned; >>>>> >>>>> + cond_resched(); >>>>> + >>>> >>>> Is this safe for CONFIG_PREEMPTION case? If current thread has realtime priority, >>>> can we guarantee that the OOM victim (well, the OOM reaper kernel thread rather >>>> than the OOM victim ?) gets scheduled? >>>> >>> >>> I think it's the best we can do that immediately solves the issue unless >>> you have another idea in mind? >> >> "schedule_timeout_killable(1) outside of oom_lock" or "the OOM reaper grabs oom_lock >> so that allocating threads guarantee that the OOM reaper gets scheduled" or "direct OOM >> reaping so that allocating threads guarantee that some memory is reclaimed". >> > > The cond_resched() here is needed if the iteration is lengthy depending on > the number of descendant memcgs already. No. cond_resched() here will become no-op if CONFIG_PREEMPTION=y and current thread has realtime priority. > > schedule_timeout_killable(1) does not make any guarantees that current > will be scheduled after the victim or oom_reaper on UP systems. The point of schedule_timeout_*(1) is to guarantee that current thread will yield CPU to other threads even if CONFIG_PREEMPTION=y and current thread has realtime priority case. There is no guarantee that current thread will be rescheduled immediately after a sleep is irrelevant. > > If you have an alternate patch to try, we can test it. But since this > cond_resched() is needed anyway, I'm not sure it will change the result. schedule_timeout_killable(1) is an alternate patch to try; I don't think that this cond_resched() is needed anyway. > >>> >>>>> switch (mem_cgroup_protected(target_memcg, memcg)) { >>>>> case MEMCG_PROT_MIN: >>>>> /* >>>>> >>>> >> >>