Re: [PATCH] mm,page_alloc: Split stall warning and failure warning.

Michal Hocko <mhocko@xxxxxxxxxx> · Mon, 10 Apr 2017 14:39:34 +0200

On Mon 10-04-17 20:58:13, Tetsuo Handa wrote:
> Patch "mm: page_alloc: __GFP_NOWARN shouldn't suppress stall warnings"
> changed the code to drop __GFP_NOWARN when calling warn_alloc() for stall
> warnings. Although I twice suggested dropping __GFP_NOWARN when warn_alloc()
> for stall warnings was proposed, Michal Hocko does not want to print stall
> warnings when __GFP_NOWARN is given [1][2].
> 
>  "I am not going to allow defining a weird __GFP_NOWARN semantic which
>   allows warnings but only sometimes. At least not without having a proper
>   way to silence both failures _and_ stalls or just stalls. I do not
>   really think this is worth the additional gfp flag."
> 
> I don't know whether he is aware of "mm: page_alloc: __GFP_NOWARN
> shouldn't suppress stall warnings" patch, but I assume that
> no response means he finally accepted this change.

I am certainly not happy about it but I just do not have time to
endlessly discuss this absolutely minor thing. I have raised my worries
already.

> Therefore,
> this patch splits warn_alloc() into a function for reporting allocation
> stalls and a function for reporting allocation failures, for the reasons below.
> 
>   (1) Dropping __GFP_NOWARN when calling warn_alloc() causes
>       "mode:%#x(%pGg)" to report incorrect flags. It can confuse
>       developers when scanning the source code for corresponding
>       location.

You have the backtrace which makes it clear _what_ is the allocation
context.

>   (2) Skipping the report when debug_guardpage_minorder() > 0 also
>       suppresses stall warnings. Stall warnings should not be disabled
>       by debug_guardpage_minorder() > 0, just as they should not be
>       disabled by __GFP_NOWARN.

Could you remind me why this matters at all? Who is the user and why does
it matter?

>   (3) Sharing warn_alloc() for reporting stalls (which is guaranteed
>       to be schedulable context) and for reporting failures (which is
>       not guaranteed to be schedulable context) is inconvenient when
>       adding a mutex for serializing printk() messages and/or filtering
>       events which should be handled for further analysis based on
>       function name.
> 
>       # stap -F -g -e 'probe kernel.function("warn_alloc").return {
>                        if (determine_whether_reason_is_allocation_stall)
>                            panic("MemAlloc stall detected."); }'
> 
>       # stap -F -g -e 'probe kernel.function("warn_alloc_stall").return {
>                        panic("MemAlloc stall detected."); }'

This is not a sufficient reason to add more code.

>
>       Although adding an allocation watchdog [3] would do this more
>       powerfully, the allocation watchdog discussion is still stalled.
>       Thus, for now I propose triggering from warn_alloc_stall().
> 
> [1] http://lkml.kernel.org/r/20160929091040.GE408@xxxxxxxxxxxxxx
> [2] http://lkml.kernel.org/r/20170114090613.GD9962@xxxxxxxxxxxxxx
> [3] http://lkml.kernel.org/r/1489578541-81526-1-git-send-email-penguin-kernel@xxxxxxxxxxxxxxxxxxx
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxxxx>

NAK. This just adds pointless code and it doesn't solve any real
issue.
-- 
Michal Hocko
SUSE Labs
