[patch] oom: give current access to memory reserves if it has been killed

David Rientjes <rientjes@xxxxxxxxxx> · Mon, 29 Mar 2010 13:49:19 -0700 (PDT)

On Mon, 29 Mar 2010, Oleg Nesterov wrote:

> Can't comment, I do not understand these subtleties.
> 
> But I'd like to note that fatal_signal_pending() can be true when the
> process wasn't killed, but another thread does exit_group/exec.
> 

I'm not sure there's a difference between a process that was oom killed 
and received its SIGKILL that way and one whose thread group is exiting 
via exit_group(2), so I don't think we need to test for 
(p->signal->flags & SIGNAL_GROUP_EXIT) here.
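To illustrate why the two cases look the same to the check, here is a small user-space model. It assumes that both an oom kill and exit_group(2) end up queueing SIGKILL in each affected thread's pending set, and that fatal_signal_pending() only tests "a signal is pending and SIGKILL is among them". struct model_task, MODEL_SIGKILL and the model_* functions are hypothetical stand-ins, not kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_SIGKILL 9	/* SIGKILL's signal number on Linux */

/* Hypothetical stand-in for the task_struct fields involved; not kernel code. */
struct model_task {
	bool sigpending;	/* models TIF_SIGPENDING */
	unsigned long pending;	/* models the per-thread pending signal set */
};

/* Bit for signal number sig in the pending mask (signals are 1-based). */
static unsigned long sig_bit(int sig)
{
	return 1UL << (sig - 1);
}

/* Same shape as the kernel helper: a signal is pending and SIGKILL is among them. */
bool model_fatal_signal_pending(const struct model_task *t)
{
	return t->sigpending && (t->pending & sig_bit(MODEL_SIGKILL)) != 0;
}

/* Queue SIGKILL to one thread, as delivering an oom kill would. */
void model_send_sigkill(struct model_task *t)
{
	t->pending |= sig_bit(MODEL_SIGKILL);
	t->sigpending = true;
}

/*
 * exit_group(2) issued by thread `self`: every other thread in the group
 * gets SIGKILL queued, the same bit an oom kill sets.
 */
void model_exit_group(struct model_task *threads, size_t n, size_t self)
{
	for (size_t i = 0; i < n; i++)
		if (i != self)
			model_send_sigkill(&threads[i]);
}
```

After model_exit_group(), every thread except the caller reports a fatal signal exactly as an oom-killed thread would, so the check alone cannot distinguish the two.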

We do need to guarantee that exiting tasks can always get memory, which is 
the responsibility of setting TIF_MEMDIE.  The only thing this patch does 
is defer calling the oom killer when a task has a pending SIGKILL and then 
fail the allocation when it would otherwise repeat.  Rather than take on 
the considerable risk of now failing GFP_KERNEL allocations up to 
PAGE_ALLOC_COSTLY_ORDER, which is typically never done, it may make more 
sense to retry the allocation with TIF_MEMDIE on the second iteration: in 
essence, automatically selecting current for oom kill, regardless of other 
oom killed tasks, if it already has a pending SIGKILL.
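The retry idea can be sketched as a toy allocator. This assumes a single zone with one min watermark, and models TIF_MEMDIE as permission to ignore that watermark, which is roughly what ALLOC_NO_WATERMARKS grants in the real allocator. struct model_zone, model_zone_alloc() and model_alloc_slowpath() are hypothetical. Reclaim and the normal oom path are deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical model of one zone's free pages and its min watermark. */
struct model_zone {
	long free_pages;
	long min_wmark;
};

/*
 * Simplified watermark check: a normal allocation must leave at least
 * min_wmark pages free; a TIF_MEMDIE task may dip into the reserves.
 */
static bool model_zone_alloc(struct model_zone *z, unsigned int order, bool memdie)
{
	long need = 1L << order;
	long floor = memdie ? 0 : z->min_wmark;

	if (z->free_pages - need < floor)
		return false;
	z->free_pages -= need;
	return true;
}

/*
 * Proposed behaviour, as a sketch: if the first attempt fails and the
 * caller already has a pending SIGKILL, grant it TIF_MEMDIE and retry
 * against the reserves instead of failing the allocation.
 * Returns the number of attempts made, or -1 if the allocation failed.
 */
int model_alloc_slowpath(struct model_zone *z, unsigned int order,
			 bool fatal_signal, bool *memdie)
{
	if (model_zone_alloc(z, order, *memdie))
		return 1;
	if (!fatal_signal)
		return -1;	/* would go on to reclaim / oom kill; not modelled */
	*memdie = true;		/* "automatically select current" */
	return model_zone_alloc(z, order, true) ? 2 : -1;
}
```

With 10 free pages and a watermark of 8, an order-2 request fails for an ordinary task but succeeds on the second attempt for a task with a pending SIGKILL.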



oom: give current access to memory reserves if it has been killed

It's possible to livelock the page allocator if a thread holds mm->mmap_sem 
and fails to make forward progress because the oom killer has selected 
another thread sharing the same ->mm, and that thread cannot exit until the 
semaphore is dropped.

The oom killer will not kill multiple tasks at the same time; each oom killed 
task must exit before another task may be killed.  Thus, if one thread is 
holding mm->mmap_sem and cannot allocate memory, all threads sharing the same 
->mm are blocked from exiting as well.  In the oom kill case, that means the
thread holding mm->mmap_sem will never free additional memory since it cannot
get access to memory reserves, while the thread that depends on it, which does
have access to memory reserves, cannot exit because it cannot acquire the
semaphore.  Thus, the page allocator livelocks.
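The livelock can be reproduced with a two-thread toy model. Thread A holds mmap_sem and needs one page; thread B is the oom victim with TIF_MEMDIE and needs mmap_sem to exit. The model assumes A already has a pending SIGKILL but not TIF_MEMDIE, which is the case the patch targets. struct model_state, model_round() and model_run() are hypothetical; a failed allocation by A stands in for a call into the oom killer, which backs off while B is still exiting unless the patched check applies.

```c
#include <stdbool.h>

/* Hypothetical two-thread model of the livelock; not kernel code. */
struct model_state {
	bool a_holds_mmap_sem;	/* thread A holds mm->mmap_sem */
	bool a_memdie;		/* A has access to memory reserves */
	bool a_fatal_signal;	/* A has a pending SIGKILL */
	bool a_done;		/* A finished its allocation */
	bool b_exited;		/* oom victim B (TIF_MEMDIE) has exited */
	long free_above_wmark;	/* pages usable without reserves */
	long reserves;		/* pages only TIF_MEMDIE tasks may use */
};

/* One scheduling round for both threads. */
static void model_round(struct model_state *s, bool patched)
{
	if (!s->a_done) {
		if (s->free_above_wmark > 0) {
			s->free_above_wmark--;
			s->a_done = true;
		} else if (s->a_memdie && s->reserves > 0) {
			s->reserves--;
			s->a_done = true;
		} else if (patched && s->a_fatal_signal) {
			/* The patch: the oom killer selects current itself. */
			s->a_memdie = true;
		}
		/* Unpatched: the oom killer sees B still exiting and backs off. */
		if (s->a_done)
			s->a_holds_mmap_sem = false;
	}

	/* B's exit path needs mmap_sem. */
	if (!s->b_exited && !s->a_holds_mmap_sem)
		s->b_exited = true;
}

/* Returns the round in which B exited, or -1 if it never did (livelock). */
int model_run(struct model_state s, bool patched, int max_rounds)
{
	for (int r = 1; r <= max_rounds; r++) {
		model_round(&s, patched);
		if (s.b_exited)
			return r;
	}
	return -1;
}
```

Starting with no free pages above the watermark, the unpatched model never lets B exit, while the patched model lets A take a reserve page on its retry, drop the semaphore, and unblock B.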

When the oom killer is called and current happens to have a pending SIGKILL,
this patch automatically selects it for kill so that it has access to memory
reserves and the better timeslice.  Upon returning to the page allocator, its
allocation will hopefully succeed so it can quickly exit and free its memory.
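The decision order that the hunk below establishes in out_of_memory() can be modelled outside the kernel. This is a simplified sketch, not the kernel's code: struct model_oom_task, model_out_of_memory(), and the use of rss as a stand-in for the badness() score are hypothetical, and the "back off while a victim still has TIF_MEMDIE" branch only approximates what select_bad_process() does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-task state; not the kernel's task_struct. */
struct model_oom_task {
	bool fatal_signal;	/* SIGKILL pending (fatal_signal_pending()) */
	bool memdie;		/* TIF_MEMDIE: may use memory reserves */
	long rss;		/* stand-in for the badness() score */
};

/*
 * A current with a pending SIGKILL is selected first, regardless of other
 * oom-killed tasks; otherwise, if an earlier victim still holds TIF_MEMDIE,
 * back off; otherwise kill the task with the highest score.
 * Returns the index of the task that now holds TIF_MEMDIE, or -1 when the
 * caller should back off and retry.
 */
int model_out_of_memory(struct model_oom_task *tasks, size_t n, size_t current)
{
	size_t victim = n;

	if (tasks[current].fatal_signal) {
		tasks[current].memdie = true;
		return (int)current;
	}
	for (size_t i = 0; i < n; i++) {
		if (tasks[i].memdie)
			return -1;	/* one oom kill at a time */
		if (victim == n || tasks[i].rss > tasks[victim].rss)
			victim = i;
	}
	if (victim == n)
		return -1;
	tasks[victim].memdie = true;
	tasks[victim].fatal_signal = true;	/* models the SIGKILL sent */
	return (int)victim;
}
```

Checking current first is what lets a dying task jump ahead of the "wait for the previous victim" rule; without a pending SIGKILL the normal selection is unchanged.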

Cc: Mel Gorman <mel@xxxxxxxxx>
Signed-off-by: David Rientjes <rientjes@xxxxxxxxxx>
---
 mm/oom_kill.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -681,6 +681,16 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 	}
 
 	/*
+	 * If current has a pending SIGKILL, then automatically select it.  The
+	 * goal is to allow it to allocate so that it may quickly exit and free
+	 * its memory.
+	 */
+	if (fatal_signal_pending(current)) {
+		__oom_kill_task(current);
+		return;
+	}
+
+	/*
 	 * Check if there were limitations on the allocation (only relevant for
 	 * NUMA) that may require different handling.
 	 */
