Re: [patch] mm, vmscan: abort futile reclaim if we've been oom killed

Johannes Weiner <hannes@xxxxxxxxxxx> · Thu, 21 Nov 2013 11:40:19 -0500

On Wed, Nov 20, 2013 at 07:08:50PM -0800, David Rientjes wrote:
> My patch is not in a fastpath, it has extremely minimal overhead, and it 
> allows an oom killed victim to exit much quicker instead of incurring 
> O(seconds) stalls because of 700 other allocators grabbing the cpu in a 
> futile effort to reclaim memory themselves.
> 
> Andrew, this fixes a real-world issue that exists and I'm asking that it 
> be merged so that oom killed processes can quickly allocate and exit to 
> free its memory.  If a more invasive future patch causes it to no longer 
> be necessary, that's what we call kernel development.  Thanks.

All I'm trying to do is find the broader root cause of the problem
you are experiencing and a solution that will leave us with
maintainable code.  It does not matter how few instructions your fix
adds; it changes the outcome of the algorithm and makes every
developer trying to grasp the complexity of page reclaim think about
yet another special condition.

The more specific the code is, the harder it will be to understand in
the future.  Yes, it's a one-liner, but we've had death by a thousand
cuts before, many times.  A few cycles ago, kswapd was blowing up left
and right simply because it was trying to meet too many specific
objectives: facilitating order-0 allocators, maintaining zone health,
enabling compaction for higher-order allocations, and writing back
dirty pages.  Ultimately, it just got stuck in endless loops because
of conflicting conditionals.  We've had similar problems in the scan
count calculation and elsewhere, where all the checks and special
cases left us with code that was impossible to reason about.  There
really is a history of "low overhead one-liner fixes" eating us alive
in the VM.

The solution was always to take a step back and integrate all
requirements properly.  Not only did this fix the problems, the code
ended up being much more robust and easier to understand and modify as
well.

If shortening the direct reclaim cycle is an adequate solution to your
problem, it would be much preferable, because

  "checking at a reasonable interval if the work I'm doing is still
   necessary"

is a much more approachable, generic, and intuitive concept than

  "the OOM killer has gone off, direct reclaim is futile, I should
   exit quickly to release memory so that no more tasks get caught
   doing direct reclaim",

and the fix would benefit a much wider audience.

Lastly, as far as I know, you are the only reporter who has noticed an
issue with this loooooong-standing behavior, and you don't even run
upstream kernels.  There really is no excuse to put up with a quick &
dirty fix.
