On Thu 04-02-16 14:39:05, Michal Hocko wrote: > On Thu 04-02-16 22:10:54, Tetsuo Handa wrote: > > Michal Hocko wrote: > > > I am not sure we can fix these pathological loads where we hit the > > > higher order depletion and there is a chance that one of the thousands > > > tasks terminates in an unpredictable way which happens to race with the > > > OOM killer. > > > > When I hit this problem on Dec 24th, I didn't run thousands of tasks. > > I think there were less than one hundred tasks in the system and only > > a few tasks were running. Not a pathological load at all. > > But as the OOM report clearly stated there were no > order-1 pages > available in that particular case. And that happened after the direct > reclaim and compaction were already invoked. > > As I've mentioned in the referenced email, we can try to do multiple > retries e.g. do not give up on the higher order requests until we hit > the maximum number of retries but I consider it quite ugly to be honest. > I think that a proper communication with compaction is a more > appropriate way to go long term. E.g. I find it interesting that > try_to_compact_pages doesn't even care about PAGE_ALLOC_COSTLY_ORDER > and treat is as any other high order request. > > Something like the following: With the patch description. Please note I haven't tested this yet so this is more a RFC than something I am really convinced about. I can live with it because the number of retries is nicely bounded but it sounds too hackish because it makes the decision rather blindly. I will talk to Vlastimil and Mel whether they see some way how to communicate the compaction state in a reasonable way. But I guess this is something that can come up later. What do you think? --- >From d09de26cee148b4d8c486943b4e8f3bd7ad6f4be Mon Sep 17 00:00:00 2001 From: Michal Hocko <mhocko@xxxxxxxx> Date: Thu, 4 Feb 2016 14:56:59 +0100 Subject: [PATCH] mm, oom: protect !costly allocations some more should_reclaim_retry will give up retries for higher order allocations if none of the eligible zones has any requested or higher order pages available even if we pass the watermak check for order-0. This is done because there is no guarantee that the reclaimable and currently free pages will form the required order. This can, however, lead to situations were the high-order request (e.g. order-2 required for the stack allocation during fork) will trigger OOM too early - e.g. after the first reclaim/compaction round. Such a system would have to be highly fragmented and the OOM killer is just a matter of time but let's stick to our MAX_RECLAIM_RETRIES for the high order and not costly requests to make sure we do not fail prematurely. This also means that we do not reset no_progress_loops at the __alloc_pages_slowpath for high order allocations to guarantee a bounded number of retries. Longterm it would be much better to communicate with the compaction and retry only if the compaction considers it meaningfull. Signed-off-by: Michal Hocko <mhocko@xxxxxxxx> --- mm/page_alloc.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 269a04f20927..f05aca36469b 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3106,6 +3106,18 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order, } } + /* + * OK, so the watermak check has failed. Make sure we do all the + * retries for !costly high order requests and hope that multiple + * runs of compaction will generate some high order ones for us. + * + * XXX: ideally we should teach the compaction to try _really_ hard + * if we are in the retry path - something like priority 0 for the + * reclaim + */ + if (order && order <= PAGE_ALLOC_COSTLY_ORDER) + return true; + return false; } @@ -3281,11 +3293,11 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, goto noretry; /* - * Costly allocations might have made a progress but this doesn't mean - * their order will become available due to high fragmentation so do - * not reset the no progress counter for them + * High order allocations might have made a progress but this doesn't + * mean their order will become available due to high fragmentation so + * do not reset the no progress counter for them */ - if (did_some_progress && order <= PAGE_ALLOC_COSTLY_ORDER) + if (did_some_progress && !order) no_progress_loops = 0; else no_progress_loops++; -- 2.7.0 -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>