David Rientjes wrote: > On Sat, 14 Mar 2020, Tetsuo Handa wrote: > > If current thread is an OOM victim, schedule_timeout_killable(1) will give other > > threads (including the OOM reaper kernel thread) CPU time to run, by leaving > > try_charge() path due to should_force_charge() == true and reaching do_exit() path > > instead of returning to userspace code doing "for (;;);". > > > > Unless the problem is that current thread cannot reach should_force_charge() check, > > schedule_timeout_killable(1) should work. > > > > No need to yield if current is the oom victim, allowing the oom reaper to > run when it may not actually be able to free memory is not required. It > increases the likelihood that some other process schedules and is unable > to yield back due to the memcg oom condition such that the victim doesn't > get a chance to run again. > > This happens because the victim is allowed to overcharge but other > processes attached to an oom memcg hierarchy simply fail the charge. We > are then reliant on all memory chargers in the kernel to yield if their > charges fail due to oom. It's the only way to allow the victim to > eventually run. > > So the only change that I would make to your patch is to do this in > mem_cgroup_out_of_memory() instead: > > if (!fatal_signal_pending(current)) > schedule_timeout_killable(1); > > So we don't have this reliance on all other memory chargers to yield when > their charge fails and there is no delay for victims themselves. I see. You want below functions for environments where current thread can fail to resume execution for long if current thread once reschedules (e.g. UP kernel, many threads contended on one CPU). /* * Give other threads CPU time, unless current thread was already killed. * Used when we prefer killed threads to continue execution (in a hope that * killed threads terminate quickly) over giving other threads CPU time. */ signed long __sched schedule_timeout_killable_expedited(signed long timeout) { if (unlikely(fatal_signal_pending(current))) return timeout; return schedule_timeout_killable(timeout); } /* * Latency reduction via explicit rescheduling in places that are safe, * but becomes no-op if current thread was already killed. Used when we * prefer killed threads to continue execution (in a hope that killed * threads terminate quickly) over giving other threads CPU time. */ int cond_resched_expedited(void) { if (unlikely(fatal_signal_pending(current))) return 0; return cond_resched(); } > > [ I'll still propose my change that adds cond_resched() to > shrink_node_memcgs() because we can see need_resched set for a > prolonged period of time without scheduling. ] As long as there is schedule_timeout_killable(), I'm fine with adding cond_resched() in other places. > > If you agree, I'll propose your patch with a changelog that indicates it > can fix the soft lockup issue for UP and can likely get a tested-by for > it. > Please go ahead.