On Fri 25-05-18 20:46:21, Tetsuo Handa wrote: > Michal Hocko wrote: > > On Fri 25-05-18 19:57:32, Tetsuo Handa wrote: > > > Michal Hocko wrote: > > > > What is wrong with the folliwing? should_reclaim_retry should be a > > > > natural reschedule point. PF_WQ_WORKER is a special case which needs a > > > > stronger rescheduling policy. Doing that unconditionally seems more > > > > straightforward than depending on a zone being a good candidate for a > > > > further reclaim. > > > > > > Where is schedule_timeout_uninterruptible(1) for !PF_KTHREAD threads? > > > > Re-read what I've said. > > Please show me as a complete patch. Then, I will test your patch. So how about we start as simple as the following? If we _really_ need to touch should_reclaim_retry then it should be done in a separate patch with some numbers/tracing data backing that story. --- >From fa3b165c34937bf628f2eee7e6f0483c611167d0 Mon Sep 17 00:00:00 2001 From: Michal Hocko <mhocko@xxxxxxxx> Date: Mon, 28 May 2018 14:30:37 +0200 Subject: [PATCH] mm, oom: remove sleep from under oom_lock Tetsuo has pointed out that since 27ae357fa82b ("mm, oom: fix concurrent munlock and oom reaper unmap, v3") we have a strong synchronization between the oom_killer and victim's exiting because both have to take the oom_lock. Therefore the original heuristic to sleep for a short time in out_of_memory doesn't serve the original purpose. Moreover Tetsuo has noticed that the short sleep can be more harmful than actually useful. Hammering the system with many processes can lead to a starvation when the task holding the oom_lock can block for a long time (minutes) and block any further progress because the oom_reaper depends on the oom_lock as well. Drop the short sleep from out_of_memory when we hold the lock. Keep the sleep when the trylock fails to throttle the concurrent OOM paths a bit. This should be solved in a more reasonable way (e.g. sleep proportional to the time spent in the active reclaiming etc.) but this is much more complex thing to achieve. This is a quick fixup to remove a stale code. Reported-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> Signed-off-by: Michal Hocko <mhocko@xxxxxxxx> --- mm/oom_kill.c | 17 ++--------------- 1 file changed, 2 insertions(+), 15 deletions(-) diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 565e7da55318..7f12e346e477 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -1099,7 +1099,6 @@ bool out_of_memory(struct oom_control *oc) { unsigned long freed = 0; enum oom_constraint constraint = CONSTRAINT_NONE; - bool delay = false; /* if set, delay next allocation attempt */ if (oom_killer_disabled) return false; @@ -1149,10 +1148,8 @@ bool out_of_memory(struct oom_control *oc) return true; } - if (mem_cgroup_select_oom_victim(oc) && oom_kill_memcg_victim(oc)) { - delay = true; + if (mem_cgroup_select_oom_victim(oc) && oom_kill_memcg_victim(oc)) goto out; - } select_bad_process(oc); /* Found nothing?!?! Either we hang forever, or we panic. */ @@ -1160,20 +1157,10 @@ bool out_of_memory(struct oom_control *oc) dump_header(oc, NULL); panic("Out of memory and no killable processes...\n"); } - if (oc->chosen_task && oc->chosen_task != INFLIGHT_VICTIM) { + if (oc->chosen_task && oc->chosen_task != INFLIGHT_VICTIM) oom_kill_process(oc, !is_memcg_oom(oc) ? "Out of memory" : "Memory cgroup out of memory"); - delay = true; - } - out: - /* - * Give the killed process a good chance to exit before trying - * to allocate memory again. - */ - if (delay) - schedule_timeout_killable(1); - return !!oc->chosen_task; } -- 2.17.0 -- Michal Hocko SUSE Labs