Re: [PATCH v2] mm, page_alloc: Remove debug_guardpage_minorder() test in warn_alloc().

David Rientjes <rientjes@xxxxxxxxxx> · Tue, 18 Apr 2017 16:10:35 -0700 (PDT)

On Tue, 18 Apr 2017, Tetsuo Handa wrote:

> Commit c0a32fc5a2e470d0 ("mm: more intensive memory corruption debugging")
> changed to check debug_guardpage_minorder() > 0 when reporting allocation
> failures. The reasoning was
> 
>   When we use guard page to debug memory corruption, it shrinks available
>   pages to 1/2, 1/4, 1/8 and so on, depending on parameter value.
>   In such case memory allocation failures can be common and printing
>   errors can flood dmesg. If somebody debug corruption, allocation
>   failures are not the things he/she is interested about.
> 
> but is misguided.
> 

As discussed privately, I think the reasoning is worthwhile.  
debug_guardpage_minorder is effectively pulling DIMMs from your system 
from the perspective of the buddy allocator, which triggers these 
warnings.  Nobody will be deploying with this config on production 
systems; they are interested in debugging or triaging issues.  As a 
result, I agree that low-memory or fragmentation issues caused by the 
overhead required for this debugging are not interesting to report.  
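
For reference, the test under discussion can be modelled roughly as
below.  This is a userspace sketch, not the kernel source: the flag
value and the names model_warn_suppressed() and MODEL_GFP_NOWARN are
placeholders, guardpage_minorder stands in for the return value of
debug_guardpage_minorder(), and ratelimit_ok stands in for the result of
the ratelimit check.

```c
#include <stdbool.h>

/* Placeholder bit for illustration; not the kernel's real __GFP_NOWARN. */
#define MODEL_GFP_NOWARN 0x1u

/*
 * Model of the early-return test in warn_alloc(): returns true when the
 * allocation failure report would be skipped.
 *
 * guardpage_minorder: stand-in for debug_guardpage_minorder()
 * ratelimit_ok:       stand-in for the ratelimit check passing
 */
static bool model_warn_suppressed(unsigned int gfp_mask, bool ratelimit_ok,
				  unsigned int guardpage_minorder)
{
	return (gfp_mask & MODEL_GFP_NOWARN) || !ratelimit_ok ||
	       guardpage_minorder > 0;
}
```

The patch would drop the last term, so a nonzero
debug_guardpage_minorder= would no longer silence the report by itself.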

> Allocation requests with __GFP_NOWARN flag by definition do not cause
> flooding of allocation failure messages. Allocation requests with
> __GFP_NORETRY flag likely also have __GFP_NOWARN flag. Costly allocation
> requests likely also have __GFP_NOWARN flag.
> 
> Allocation requests without __GFP_DIRECT_RECLAIM flag likely also have
> __GFP_NOWARN flag or __GFP_HIGH flag. Non-costly allocation requests with
> __GFP_DIRECT_RECLAIM flag basically retry forever due to the "too small to
> fail" memory-allocation rule.
> 
> Therefore, as a whole, shrinking available pages by
> debug_guardpage_minorder= kernel boot parameter might cause flooding of
> OOM killer messages but unlikely causes flooding of allocation failure
> messages. Let's remove debug_guardpage_minorder() > 0 check which would
> likely be pointless.
> 

Hmm, not necessarily.  The oom killer can be used in situations where the 
context allows it, but there is still a good chance that we get page 
allocation failure warnings in softirq context, including for 
high-order allocations from the networking layer.  I think the reasoning 
presented by Stanislaw in commit c0a32fc5a2e4 is correct and we ought to 
avoid allocation failure warnings spamming the log when looking for real 
debugging information.
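
The distinction between the two positions can be sketched as a small
userspace model.  The flag values and the names sketch_failure_may_warn(),
SKETCH_DIRECT_RECLAIM and SKETCH_NOWARN are hypothetical, and
SKETCH_COSTLY_ORDER is assumed to mirror PAGE_ALLOC_COSTLY_ORDER.  It
deliberately ignores __GFP_NORETRY, rate limiting and every other detail
of the real slow path.

```c
#include <stdbool.h>

/* Placeholder bits for illustration; not the kernel's real gfp values. */
#define SKETCH_DIRECT_RECLAIM 0x1u
#define SKETCH_NOWARN         0x2u

/* Assumed to mirror PAGE_ALLOC_COSTLY_ORDER. */
#define SKETCH_COSTLY_ORDER 3u

/*
 * Returns true when a request of this kind can fail and reach the
 * allocation failure report, under the simplifications stated above.
 */
static bool sketch_failure_may_warn(unsigned int gfp_mask, unsigned int order)
{
	/* The caller asked for silence. */
	if (gfp_mask & SKETCH_NOWARN)
		return false;

	/*
	 * Atomic context (e.g. softirq): no direct reclaim and no oom
	 * killer, so the request fails instead of retrying.
	 */
	if (!(gfp_mask & SKETCH_DIRECT_RECLAIM))
		return true;

	/* Non-costly blocking requests keep retrying ("too small to fail"). */
	return order > SKETCH_COSTLY_ORDER;
}
```

In this model an order-0 request that can reclaim never warns, while a
request without direct reclaim and without the no-warn flag can, which is
the softirq case mentioned above.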
