Hello Michal,

On Thu, Jan 28, 2016 at 05:38:03PM +0100, Michal Hocko wrote:
> Hi,
> __alloc_pages_may_oom just after it manages to get oom_lock we try
> to allocate once more with ALLOC_WMARK_HIGH target. I was always
> wondering why are we will to actually kill something even though
> we are above min wmark. This doesn't make much sense to me. I understand
> that this is racy because __alloc_pages_may_oom is called after we have
> failed to fulfill the WMARK_MIN target but this means WMARK_HIGH
> is highly unlikely as well. So either we should use ALLOC_WMARK_MIN
> or get rid of this altogether.
>
> The code has been added before git era by
> https://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc2/2.6.11-rc2-mm2/broken-out/mm-fix-several-oom-killer-bugs.patch

I assume you refer to this:

+       /*
+        * Go through the zonelist yet one more time, keep
+        * very high watermark here, this is only to catch
+        * a parallel oom killing, we must fail if we're still
+        * under heavy pressure.
+        */
+       for (i = 0; (z = zones[i]) != NULL; i++) {
+               if (!zone_watermark_ok(z, order, z->pages_high,
                                                 ^^^^^^^^^^^^^

> and it doesn't explain this particular decision. It seems to me that

Not explained explicitly in the commit header but see the above
comment added just before the z->pages_high, it at least tries to
explain it. Although the implementation changed and now it's
ALLOC_WMARK_HIGH instead of z->pages_high, the old comment is still in
the current upstream:

        /*
         * Go through the zonelist yet one more time, keep very high watermark
         * here, this is only to catch a parallel oom killing, we must fail if
         * we're still under heavy pressure.
         */

> what ever was the reason back then it doesn't hold anymore.
>
> What do you think?

Elaborating the comment: the reason for the high wmark is to reduce
the likelihood of livelocks and be sure to invoke the OOM killer, if
we're still under pressure and reclaim just failed.
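
To make the quoted comment concrete, here is a toy userspace sketch of
that last check. It is not the kernel code: struct toy_zone, enum
toy_wmark, toy_watermark_ok() and toy_can_skip_oom_kill() are made-up
names for illustration, and the watermark test is deliberately
simplified to a single free-page count per zone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy zone: only what the illustration needs. */
struct toy_zone {
	unsigned long free_pages;
	unsigned long wmark_min;
	unsigned long wmark_high;
};

/* Which watermark the last check before the OOM kill is held to. */
enum toy_wmark { TOY_WMARK_MIN, TOY_WMARK_HIGH };

/*
 * Simplified watermark test (an assumption, not the kernel's logic):
 * ok if at least 'mark' pages stay free after taking 2^order pages.
 */
static bool toy_watermark_ok(const struct toy_zone *z, unsigned int order,
			     unsigned long mark)
{
	unsigned long need;

	assert(order < 8 * sizeof(unsigned long));
	need = 1UL << order;

	return z->free_pages >= need && z->free_pages - need >= mark;
}

/*
 * Go through the zones one more time. Return true if some zone passes
 * the chosen watermark, i.e. the allocation can be retried and no
 * task has to be killed; false means we are still under pressure.
 */
static bool toy_can_skip_oom_kill(const struct toy_zone *zones, size_t nr,
				  unsigned int order, enum toy_wmark which)
{
	for (size_t i = 0; i < nr; i++) {
		unsigned long mark = (which == TOY_WMARK_HIGH) ?
			zones[i].wmark_high : zones[i].wmark_min;

		if (toy_watermark_ok(&zones[i], order, mark))
			return true;
	}
	return false;
}
```

With a single zone at free_pages = 150, wmark_min = 100 and
wmark_high = 300, an order-0 check against TOY_WMARK_MIN skips the
kill, while the same check against TOY_WMARK_HIGH does not. Only a
large jump in free memory, such as a parallel OOM kill would produce,
gets past the high mark.
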
The high wmark is used to be sure the failure of reclaim isn't going
to be ignored. If we used the min wmark as you propose, there would be
a risk of livelock, or at least of a delayed OOM killer invocation.

The reason for doing one last wmark check (regardless of the wmark
used) before invoking the OOM killer was just to be sure another OOM
killer invocation hasn't already freed a ton of memory while we were
stuck in reclaim. A lot of free memory generated by the OOM killer
won't make a parallel reclaim more likely to succeed: it just creates
free memory, but reclaim only succeeds when it finds "freeable" memory
and makes progress in converting it to free memory. So for the purpose
of this last check, the high wmark would work fine, as lots of free
memory would have been generated in such a case.

It's not immediately apparent whether there is new OOM killer logic
upstream that would prevent the risk of a second OOM killer invocation
even though another OOM kill already happened while we were stuck in
reclaim. In the absence of that, the high wmark check would still be
needed.

Thanks,
Andrea