On Sun, 6 Jun 2010 15:34:18 -0700 (PDT)
David Rientjes <rientjes@xxxxxxxxxx> wrote:

> It's possible to livelock the page allocator if a thread has mm->mmap_sem

What is the state of this thread?  Trying to allocate memory, I assume.

> and fails to make forward progress because the oom killer selects another
> thread sharing the same ->mm to kill that cannot exit until the semaphore
> is dropped.
>
> The oom killer will not kill multiple tasks at the same time; each oom
> killed task must exit before another task may be killed.

This sounds like quite a risky design.  The possibility that we'll
cause other deadlocks/livelocks similar to this one seems pretty high.
It applies to all sleeping locks in the entire kernel, doesn't it?

If so: it's unfortunate that the kernel doesn't distinguish between
D-state-for-locks and D-state-for-disk-io.  Otherwise we could just skip
over D-state-for-locks processes.

Or maybe I'm wrong ;)

> Thus, if one
> thread is holding mm->mmap_sem and cannot allocate memory, all threads
> sharing the same ->mm are blocked from exiting as well.  In the oom kill
> case, that means the thread holding mm->mmap_sem will never free
> additional memory since it cannot get access to memory reserves and the
> thread that depends on it with access to memory reserves cannot exit
> because it cannot acquire the semaphore.  Thus, the page allocator
> livelocks.
>
> When the oom killer is called and current happens to have a pending
> SIGKILL, this patch automatically gives it access to memory reserves and
> returns.  Upon returning to the page allocator, its allocation will
> hopefully succeed so it can quickly exit and free its memory.  If not, the
> page allocator will fail the allocation if it is not __GFP_NOFAIL.

You said "hopefully".  Does it actually work?  Any real-world testing
results?  If so, they'd be a useful addition to the changelog.
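The dependency described in the changelog can be sketched as a toy userspace model in C. Everything in it is hypothetical. `struct toy_task` and `toy_allocate()` only stand in for kernel task state: `memdie` plays the role of TIF_MEMDIE and `fatal_signal` plays the role of a pending SIGKILL. The model also assumes that the thread doing the allocation receives SIGKILL along with the victim that shares its mm, which is the situation the changelog describes.

The model shows why waiting for the single victim never ends and how selecting current breaks the cycle. It is not a real-world test of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a task; not a kernel structure. */
struct toy_task {
	bool holds_mmap_sem;  /* currently holds mm->mmap_sem */
	bool needs_mmap_sem;  /* must take mm->mmap_sem before it can exit */
	bool fatal_signal;    /* models fatal_signal_pending() */
	bool memdie;          /* models TIF_MEMDIE: may use memory reserves */
	bool exited;
};

/*
 * Retry an allocation by `cur` up to max_retries times while memory is
 * exhausted.  `other` shares cur's mm.  Returns true if the allocation
 * eventually succeeds, false if it is still stuck (the livelock).
 */
static bool toy_allocate(struct toy_task *cur, struct toy_task *other,
			 bool patched, int max_retries)
{
	bool memory_freed = false;

	assert(max_retries > 0);
	for (int i = 0; i < max_retries; i++) {
		/* An OOM victim exits and frees memory unless it is blocked
		 * on the mmap_sem that cur holds. */
		if (other->memdie && !other->exited &&
		    !(other->needs_mmap_sem && cur->holds_mmap_sem)) {
			other->exited = true;
			memory_freed = true;
		}

		/* The allocation succeeds once memory is freed or cur may
		 * dip into reserves. */
		if (memory_freed || cur->memdie)
			return true;

		/* Toy out_of_memory(). */
		if (patched && cur->fatal_signal) {
			cur->memdie = true;  /* the patch: select current */
			continue;
		}
		if (other->memdie && !other->exited)
			continue;            /* one victim at a time: keep waiting */

		other->memdie = true;        /* kill the task sharing cur's mm */
		other->fatal_signal = true;
		cur->fatal_signal = true;    /* assumption: SIGKILL reaches cur too */
	}
	return false;
}
```

Consider the unpatched case where `cur` holds the semaphore the victim needs. There, `toy_allocate()` returns false for any `max_retries`. With `patched` set, it returns true on the third pass, because current is given reserves instead of waiting on a victim that cannot exit.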
> Acked-by: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx>
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
> Signed-off-by: David Rientjes <rientjes@xxxxxxxxxx>
> ---
>  mm/oom_kill.c |   10 ++++++++++
>  1 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -650,6 +650,16 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
>  		/* Got some memory back in the last second. */
>  		return;
>  
> +	/*
> +	 * If current has a pending SIGKILL, then automatically select it.  The
> +	 * goal is to allow it to allocate so that it may quickly exit and free
> +	 * its memory.
> +	 */
> +	if (fatal_signal_pending(current)) {
> +		set_thread_flag(TIF_MEMDIE);
> +		return;
> +	}
> +
>  	if (sysctl_panic_on_oom == 2) {
>  		dump_header(NULL, gfp_mask, order, NULL);
>  		panic("out of memory. Compulsory panic_on_oom is selected.\n");
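The hunk places the new check between two existing ones:

- It runs after the early return commented "Got some memory back in the last second."
- It runs before the `sysctl_panic_on_oom == 2` test.

As a result, a task that already has SIGKILL pending is given TIF_MEMDIE and returns rather than reaching the compulsory panic.

Below is a minimal sketch of only that ordering. The names `toy_oom_entry` and `enum toy_oom_action` are hypothetical. The earlier check's condition, which the hunk does not show, is reduced to a boolean, and the rest of out_of_memory() is collapsed into a single outcome.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of the toy model; not kernel identifiers. */
enum toy_oom_action {
	TOY_MEMORY_CAME_BACK,  /* "Got some memory back in the last second." */
	TOY_SELECT_CURRENT,    /* new: set TIF_MEMDIE on current and return */
	TOY_PANIC,             /* sysctl_panic_on_oom == 2 */
	TOY_CONTINUE           /* rest of out_of_memory(), not in the hunk */
};

/* Mirrors only the order of the checks visible in the patch hunk. */
static enum toy_oom_action toy_oom_entry(bool memory_came_back,
					 bool fatal_pending,
					 int panic_on_oom)
{
	if (memory_came_back)
		return TOY_MEMORY_CAME_BACK;
	if (fatal_pending)          /* fatal_signal_pending(current) */
		return TOY_SELECT_CURRENT;
	if (panic_on_oom == 2)      /* sysctl_panic_on_oom == 2 */
		return TOY_PANIC;
	return TOY_CONTINUE;
}
```

For example, `toy_oom_entry(false, true, 2)` yields `TOY_SELECT_CURRENT`. Without the new check, the same inputs would reach `TOY_PANIC`.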