Re: linux-4.4-rc1: TIF_MEMDIE without SIGKILL pending?

Michal Hocko wrote:
> On Sun 22-11-15 21:13:22, Tetsuo Handa wrote:
> > I was updating kmallocwd in preparation for testing "[RFC 0/3] OOM detection
> > rework v2" patchset. I noticed an unexpected result with linux.git as of
> > 3ad5d7e06a96.
> > 
> > The problem is that an OOM victim arrives at do_exit() with TIF_MEMDIE flag
> > set but without pending SIGKILL. Is this correct behavior?
> 
> Have a look at out_of_memory where we do:
>         /*
>          * If current has a pending SIGKILL or is exiting, then automatically
>          * select it.  The goal is to allow it to allocate so that it may
>          * quickly exit and free its memory.
>          *
>          * But don't select if current has already released its mm and cleared
>          * TIF_MEMDIE flag at exit_mm(), otherwise an OOM livelock may occur.
>          */
>         if (current->mm &&
>             (fatal_signal_pending(current) || task_will_free_mem(current))) {
>                 mark_oom_victim(current);
>                 return true;
>         }
> 
> So if the current was exiting already we are not killing it, we just give it
> access to memory reserves to expedite the exit. We do the same thing for the
> memcg case.
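To make the quoted shortcut concrete, here is a minimal userspace sketch (not kernel code) of the decision in out_of_memory(). Every name prefixed with model_ is hypothetical, the flag values are arbitrary, and model_task_will_free_mem() assumes that the check of that era amounts to "PF_EXITING is set and the process is not dumping core". The sketch shows that a task can be given TIF_MEMDIE on this path without any SIGKILL being queued for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this model only; not the kernel's. */
#define MODEL_PF_EXITING 0x1u /* stands in for PF_EXITING */
#define MODEL_TIF_MEMDIE 0x1u /* stands in for TIF_MEMDIE */

struct model_task {
	void *mm;             /* non-NULL until exit_mm() releases it */
	unsigned int flags;   /* task->flags, e.g. MODEL_PF_EXITING */
	unsigned int tif;     /* thread-info flags, e.g. MODEL_TIF_MEMDIE */
	bool sigkill_pending; /* stands in for fatal_signal_pending() */
	bool coredumping;     /* stands in for SIGNAL_GROUP_COREDUMP */
};

/* Assumption: models task_will_free_mem() as "exiting and not dumping core". */
static bool model_task_will_free_mem(const struct model_task *t)
{
	return (t->flags & MODEL_PF_EXITING) && !t->coredumping;
}

/*
 * Model of the quoted shortcut: a task that still has its mm and is either
 * being killed or already exiting gets access to reserves (TIF_MEMDIE).
 * No SIGKILL is queued here, so sigkill_pending is left untouched.
 */
static bool model_oom_shortcut(struct model_task *task)
{
	assert(task != NULL);
	if (task->mm &&
	    (task->sigkill_pending || model_task_will_free_mem(task))) {
		task->tif |= MODEL_TIF_MEMDIE; /* stands in for mark_oom_victim() */
		return true;
	}
	return false;
}
```

For example, a task with PF_EXITING set and a live mm comes back from model_oom_shortcut() with TIF_MEMDIE set while sigkill_pending is still false, which is the "give it reserves, don't kill it" behaviour described above.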

The result is the same even if I change the check as follows:

-	BUG_ON(test_thread_flag(TIF_MEMDIE) && !fatal_signal_pending(current));
+	BUG_ON(test_thread_flag(TIF_MEMDIE) && !fatal_signal_pending(current) && !task_will_free_mem(current));

I think that task_will_free_mem() always returns false there, because this
BUG_ON() is located before the "exit_signals(tsk);  /* sets PF_EXITING */" line.
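The ordering argument can be sketched the same way. In this hypothetical userspace model, model_do_exit() evaluates the extended debug condition at the point where the BUG_ON() sits, before a stand-in for exit_signals() sets PF_EXITING. All model_ names and flag values are illustrative assumptions, and task_will_free_mem() is again assumed to depend on PF_EXITING being set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this model only; not the kernel's. */
#define MODEL_PF_EXITING 0x1u
#define MODEL_TIF_MEMDIE 0x1u

struct model_task {
	unsigned int flags;   /* task->flags */
	unsigned int tif;     /* thread-info flags */
	bool sigkill_pending; /* stands in for fatal_signal_pending() */
	bool coredumping;     /* stands in for SIGNAL_GROUP_COREDUMP */
};

/* Assumption: task_will_free_mem() is false until PF_EXITING is set. */
static bool model_task_will_free_mem(const struct model_task *t)
{
	return (t->flags & MODEL_PF_EXITING) && !t->coredumping;
}

/* Model of the extended debug condition from the patch above. */
static bool model_check_fires(const struct model_task *t)
{
	return (t->tif & MODEL_TIF_MEMDIE) && !t->sigkill_pending &&
	       !model_task_will_free_mem(t);
}

/* Stand-in for exit_signals(tsk), which sets PF_EXITING. */
static void model_exit_signals(struct model_task *t)
{
	t->flags |= MODEL_PF_EXITING;
}

/*
 * Simplified do_exit(): the check is evaluated where the BUG_ON() sits,
 * i.e. before exit_signals(). Returns whether it would have fired.
 */
static bool model_do_exit(struct model_task *t)
{
	assert(t != NULL);
	bool fired = model_check_fires(t);
	model_exit_signals(t);
	return fired;
}
```

A victim with TIF_MEMDIE and no pending SIGKILL trips the check because PF_EXITING is still clear at that location, so the added !task_will_free_mem() clause cannot suppress it; evaluated after exit_signals(), the same condition would be false.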

> 
> Why would that be an issue in the first place?

The real problem I care about is the TIF_MEMDIE livelock.

  MemAlloc: oom-tester4(11040) uninterruptible dying victim
  MemAlloc: oom-tester4(11045) gfp=0x242014a order=0 delay=10000 dying

I'm not talking about the TIF_MEMDIE livelock in this thread. I'm just worried
that the output below (which is caused by an OOM victim arriving at do_exit()
with the TIF_MEMDIE flag set but without a pending SIGKILL) is a foretaste of
an unnoticed problem.

  MemAlloc: oom-tester4(11520) uninterruptible victim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .