Re: [PATCH] oom: always panic on OOM when panic_on_oom is configured

David Rientjes <rientjes@xxxxxxxxxx> · Wed, 10 Jun 2015 17:36:43 -0700 (PDT)

On Wed, 10 Jun 2015, Michal Hocko wrote:

> > Not necessarily.  We pin a lot of memory with get_user_pages() and 
> > short-circuit it by checking for fatal_signal_pending() specifically for 
> > oom conditions.  This was done over six years ago by commit 4779280d1ea4 
> > ("mm: make get_user_pages() interruptible").  When such a process is 
> > faulting in memory, and it is killed by userspace as a result of an oom 
> > condition, it needs to be able to allocate (TIF_MEMDIE set by the oom 
> > killer due to SIGKILL), return to __get_user_pages(), abort, handle the 
> > signal, and exit.
> > 
> > I can't possibly make that any more clear.
> 
> Are you even reading what I've written? I will ask for the last
> time. What exactly prevents another allocation from triggering the oom
> path and panicking the system before the killed task has a chance to
> terminate?
> 

If there are other threads that call into the oom killer and that are 
neither in the exit path nor have a SIGKILL to handle, then the machine 
panics.  
That's the purpose of panic_on_oom: the kernel has no way to free memory 
without killing a process, so the admin has chosen to panic rather than 
wait for memory to become available, which may never happen.

This is how panic_on_oom has always worked.
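
To make that ordering concrete, here is a minimal userspace sketch of the 
decision as described above.  It is not kernel code: struct task_model, 
enum oom_action and oom_decide() are names invented for illustration, and 
the real check in out_of_memory() involves more conditions than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the task state that matters in this argument. */
struct task_model {
	bool fatal_signal_pending;	/* SIGKILL queued but not yet handled */
	bool exiting;			/* already in the exit path */
};

/* Hypothetical outcomes for a thread that has called into the oom killer. */
enum oom_action {
	OOM_GRANT_RESERVES,	/* let the caller use memory reserves and exit */
	OOM_PANIC,		/* panic_on_oom is set and nothing is on its way out */
	OOM_KILL_VICTIM,	/* select a process and kill it */
};

/*
 * A caller that is already dying is checked first, so it gets access to
 * reserves regardless of panic_on_oom.  Only a caller that is neither
 * killed nor exiting reaches the panic_on_oom check.
 */
static enum oom_action oom_decide(const struct task_model *caller,
				  bool panic_on_oom)
{
	assert(caller != NULL);

	if (caller->fatal_signal_pending || caller->exiting)
		return OOM_GRANT_RESERVES;
	if (panic_on_oom)
		return OOM_PANIC;
	return OOM_KILL_VICTIM;
}
```

In this model a killed or exiting caller never reaches the panic_on_oom 
check, while any other caller panics the machine when panic_on_oom is set.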

> > Your patch causes that to instead panic the system if panic_on_oom is set.  
> > It's inappropriate and userspace breakage.  The fact that I don't 
> > personally use panic_on_oom is completely and utterly irrelevant.
> > 
> > There is absolutely nothing wrong with a process that has been killed 
> > either directly by userspace or as part of a group exit, or a process that 
> > is already in the exit path and needs to allocate memory to be able to 
> > free its memory, to get access to memory reserves.  That's not an oom 
> > condition, that's memory reserves.  Panic_on_oom has nothing to do with 
> > this scenario whatsoever.
> 
> It very much has and I have presented arguments about that which you
> didn't bother to comment on. TIF_MEMDIE is not magic that will help a
> task to exit in all cases. It is a heuristic and it can fail.
> panic_on_oom is a hand brake for when things go wrong and you want to
> reduce your unresponsive time (read failover part in the documentation).
> 

Threads that have been oom killed and have TIF_MEMDIE set should exit.  
It's certainly a problem if they do not, since the oom killer relies on 
that and will defer forever until they do.  (We don't actually require 
that the thread fully exit, we just require that its memory is freed.)  If 
you're trying to address the issue that Tetsuo Handa brought up (strange, 
because you seemed to not want Tetsuo to talk), then that needs to be 
handled in a way that makes forward progress.  I suggested three methods 
in this thread that can be pursued to do that, but panicking the system 
is not one of them.

> > Panic_on_oom is not panic_when_reclaim_fails. 
> 
> OOM is when all other reclaim attempts fail. Jeez we are in
> out_of_memory how can this be potentially unclear to you? Yes oom killer
> path might use heuristics to reduce the impact of the OOM condition but
> once we are in this path _we_are_OOM_.
> 

Hmm, not exactly.  You can't make the same argument for GFP_ATOMIC 
allocations, for instance, where we don't have the ability to reclaim.  
They get access to a memory reserve so they may succeed in this context.  
In the case your patch is short-circuiting, a GFP_KERNEL allocation can 
fail to reclaim and then you've decided to panic rather than give an 
exiting thread access to memory reserves.  It's unnecessary.

(I personally don't care what you do or do not label "oom", I only care 
about panic vs. no-panic when the kernel has the ability to allow the 
allocation to succeed and make forward progress.)

Let me be clear: the issue that Tetsuo brings up is very real and serious.  
It exists for system memory as well as memcg.  Trying to address it with 
panic_on_oom is absurd.  It may be difficult to address, and require 
substantial VM work to fix, but panicking is not a solution and would lead 
to arbitrary machines in a very large fleet rebooting.  There's nothing 
the userspace programmer could have done differently to prevent it; this 
is solely a kernel issue.
