Re: why do we do ALLOC_WMARK_HIGH before going out_of_memory

Andrea Arcangeli <aarcange@xxxxxxxxxx> · Thu, 28 Jan 2016 20:02:04 +0100

Hello Michal,

On Thu, Jan 28, 2016 at 05:38:03PM +0100, Michal Hocko wrote:
> Hi,
> __alloc_pages_may_oom just after it manages to get oom_lock we try
> to allocate once more with ALLOC_WMARK_HIGH target. I was always
> wondering why we are willing to actually kill something even though
> we are above min wmark. This doesn't make much sense to me. I understand
> that this is racy because __alloc_pages_may_oom is called after we have
> failed to fulfill the WMARK_MIN target but this means WMARK_HIGH
> is highly unlikely as well. So either we should use ALLOC_WMARK_MIN
> or get rid of this altogether.
> 
> The code has been added before git era by
> https://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc2/2.6.11-rc2-mm2/broken-out/mm-fix-several-oom-killer-bugs.patch

I assume you refer to this:

+		/*
+		 * Go through the zonelist yet one more time, keep
+		 * very high watermark here, this is only to catch
+		 * a parallel oom killing, we must fail if we're still
+		 * under heavy pressure.
+		 */
+		for (i = 0; (z = zones[i]) != NULL; i++) {
+			if (!zone_watermark_ok(z, order, z->pages_high,
			   			  	 ^^^^^^^^^^^^^

> and it doesn't explain this particular decision. It seems to me that

It is not explained explicitly in the commit header, but see the
comment quoted above, added just before the z->pages_high check; it at
least tries to explain it.

Although the implementation changed and now it's ALLOC_WMARK_HIGH
instead of z->pages_high, the old comment is still in the current
upstream:

	/*
	 * Go through the zonelist yet one more time, keep very high watermark
	 * here, this is only to catch a parallel oom killing, we must fail if
	 * we're still under heavy pressure.
	 */

> whatever was the reason back then it doesn't hold anymore.
> 
> What do you think?

Elaborating on the comment: the reason for the high wmark is to reduce
the likelihood of livelocks and to be sure the OOM killer is invoked if
we're still under pressure and reclaim has just failed. The high wmark
is used to be sure the failure of reclaim isn't going to be ignored.
With the min wmark, as you propose, there's a risk of livelock, or at
least of a delayed OOM killer invocation.

The reason for doing one last wmark check (regardless of the wmark
used) before invoking the OOM killer was just to be sure another OOM
killer invocation hasn't already freed a ton of memory while we were
stuck in reclaim. A lot of free memory generated by the OOM killer
won't make a parallel reclaim more likely to succeed: the kill just
creates free memory, whereas reclaim only succeeds when it finds
"freeable" memory and makes progress in converting it to free memory.
So for the purpose of this last check the high wmark works fine, as
lots of free memory would have been generated in such a case.
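
To make the difference between the two targets concrete, here is a
minimal userspace sketch of that last check. It is not the kernel
code: struct zone_model, watermark_ok() and should_invoke_oom_killer()
are hypothetical names made up for illustration, the check is reduced
to an order-0 "free pages above the mark" comparison, and lowmem
reserves and the zonelist walk are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a zone: only what this sketch needs. */
struct zone_model {
	unsigned long free_pages;
	unsigned long wmark_min;
	unsigned long wmark_high;
};

enum wmark_target { TARGET_MIN, TARGET_HIGH };

/* Simplified order-0 check: ok only if free pages exceed the chosen mark. */
static bool watermark_ok(const struct zone_model *z, enum wmark_target t)
{
	unsigned long mark = (t == TARGET_HIGH) ? z->wmark_high : z->wmark_min;

	assert(z->wmark_min <= z->wmark_high);
	return z->free_pages > mark;
}

/*
 * Last check before killing, taken after reclaim has already failed.
 * Returns true if the OOM killer should be invoked.
 */
static bool should_invoke_oom_killer(const struct zone_model *z,
				     enum wmark_target t)
{
	return !watermark_ok(z, t);
}
```

With made-up numbers, a zone with 130 free pages, min wmark 128 and
high wmark 192 passes the min check, so a min target would skip the
kill even though reclaim just failed; the high target fails and the
kill goes ahead. If a parallel OOM kill has left, say, 5000 pages free,
the high check passes and no second kill happens.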

It's not immediately apparent whether upstream now has new OOM killer
logic that would prevent a second OOM killer invocation when another
OOM kill already happened while we were stuck in reclaim. In the
absence of that, the high wmark check would still be needed.

Thanks,
Andrea
