On Fri, 18 Sep 2015, Christoph Lameter wrote:

> Subject: Allow multiple kills from the OOM killer
>
> The OOM killer currently aborts if it finds a process that already is having
> access to the reserve memory pool for exit processing. This is done so that
> the reserves are not overcommitted but on the other hand this also allows
> only one process being oom killed at the time. That process may be stuck
> in D state.
>
> Signed-off-by: Christoph Lameter <cl@xxxxxxxxx>
>
> Index: linux/mm/oom_kill.c
> ===================================================================
> --- linux.orig/mm/oom_kill.c	2015-09-18 11:58:52.963946782 -0500
> +++ linux/mm/oom_kill.c	2015-09-18 11:59:42.010684778 -0500
> @@ -264,10 +264,9 @@ enum oom_scan_t oom_scan_process_thread(
>  	 * This task already has access to memory reserves and is being killed.
>  	 * Don't allow any other task to have access to the reserves.
>  	 */
> -	if (test_tsk_thread_flag(task, TIF_MEMDIE)) {
> -		if (oc->order != -1)
> -			return OOM_SCAN_ABORT;
> -	}
> +	if (test_tsk_thread_flag(task, TIF_MEMDIE))
> +		return OOM_SCAN_CONTINUE;
> +
>  	if (!task->mm)
>  		return OOM_SCAN_CONTINUE;
>

If this would result in the newly chosen process being guaranteed to exit,
this would be fine. Unfortunately, no such guarantee is possible. If a
thread is holding a contended mutex that the victim(s) require, this serial
oom killer could eventually panic the system if that thread is OOM_DISABLE.

The solution that we have merged internally is described at
http://marc.info/?l=linux-kernel&m=144010444913702 -- we provide access to
memory reserves to processes that find a stalled exit in the oom killer so
that they may allocate. It comes along with a test module that takes a
contended mutex and ensures that forward progress is made as long as memory
reserves are not depleted.
We can't actually guarantee that memory reserves won't be depleted, but we
(1) hope that nobody is actually allocating a lot of memory before dropping
a mutex and (2) want to avoid the alternative, which is a system livelock.

This will address situations such as

	allocator			oom victim
	---------			----------
	mutex_lock(lock)
	alloc_pages(GFP_KERNEL)
					mutex_lock(lock)
					mutex_unlock(lock)
					handle SIGKILL

Without a solution such as mine, this results in a livelock, since the
GFP_KERNEL allocation stalls forever waiting for the oom victim to acquire
the mutex and exit. This also works if the allocator is OOM_DISABLE.

This won't handle other situations where the victim gets wedged in D state
and is not allocating memory, but this is by far the more common occurrence
that we have dealt with.
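
To make the forward-progress argument concrete, here is a rough userspace
model of the allocator/victim interaction described above. It is only a
sketch under stated assumptions, not the internal patch or its test module:
struct sim, sim_alloc(), sim_run() and the tick-based stall_limit are all
hypothetical names invented for illustration, and none of them are kernel
APIs.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct sim {
	int free_pages;		/* pages available to ordinary allocations */
	int reserve_pages;	/* pages usable only with reserve access */
	bool lock_held;		/* allocator currently holds the contended mutex */
	bool victim_exited;	/* victim got the mutex and handled SIGKILL */
	int victim_pages;	/* pages released when the victim exits */
	int stalled_ticks;	/* ticks the victim has failed to exit */
};

/* Try to allocate; only callers with reserve access may dip into reserves. */
static bool sim_alloc(struct sim *s, int pages, bool reserve_access)
{
	if (s->free_pages >= pages) {
		s->free_pages -= pages;
		return true;
	}
	if (reserve_access && s->free_pages + s->reserve_pages >= pages) {
		s->reserve_pages -= pages - s->free_pages;
		s->free_pages = 0;
		return true;
	}
	return false;
}

/*
 * Run the scenario for at most max_ticks. stall_limit is the assumed number
 * of ticks after which a stalled victim exit gives the allocator access to
 * reserves; a negative value disables that policy. Returns true if the
 * victim exits within the tick budget.
 */
static bool sim_run(struct sim *s, int need, int stall_limit, int max_ticks)
{
	s->lock_held = true;	/* allocator takes the mutex first */

	for (int tick = 0; tick < max_ticks; tick++) {
		if (s->lock_held) {
			bool boost = stall_limit >= 0 &&
				     s->stalled_ticks >= stall_limit;

			if (sim_alloc(s, need, boost))
				s->lock_held = false;	/* allocator drops the mutex */
			else
				s->stalled_ticks++;	/* victim still blocked */
		} else {
			/* victim acquires the mutex, exits, frees its memory */
			s->free_pages += s->victim_pages;
			s->victim_exited = true;
			return true;
		}
	}
	return false;	/* no forward progress within the budget */
}
```

With no free pages, sim_run(&s, 4, -1, 1000) never lets the victim exit,
which models the livelock. With a non-negative stall_limit the allocator
eventually dips into the reserves and the victim exits, unless the reserves
are smaller than the request, which is the depletion case mentioned above.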