Re: swapper/0: page allocation failure: order:0, mode:0x1204010(GFP_NOWAIT|__GFP_COMP|__GFP_RECLAIMABLE|__GFP_NOTRACK), nodemask=(null)

On Thu 02-11-17 16:01:20, Michal Hocko wrote:
> On Thu 02-11-17 22:15:06, Tetsuo Handa wrote:
> > I was waiting for Michal's comment, but it seems that he is too busy now.
> > Thus, I post non-authoritative comment here. (I'm not a tracepoints user.)
> 
> yes, that is the case. Thanks for looking into this Tetsuo.
> 
> > Two stalls were found in dmesg, but only the PID = 2798 part was recorded in the trace logs.
> > 
> >   [ 6109.502115] chrome: page allocation stalls for 10321ms, order:0, mode:0x14000d2(GFP_TEMPORARY|__GFP_HIGHMEM), nodemask=(null)
> >   [ 6109.502179] chrome cpuset=/ mems_allowed=0
> >   [ 6109.502570] CPU: 0 PID: 2798 Comm: chrome Not tainted 4.13.9-300.fc27.x86_64+debug #1
> > 
> 
> I have only glanced through the trace data.
> 
> > So, trying to analyze this one. 
> > 
> > Since a gap of 10 seconds was found between mm_shrink_slab_start and
> > mm_shrink_slab_end, this alone can explain the stall warning messages.
> > 
> >   # tracer: nop
> >   #
> >   #                              _-----=> irqs-off
> >   #                             / _----=> need-resched
> >   #                            | / _---=> hardirq/softirq
> >   #                            || / _--=> preempt-depth
> >   #                            ||| /     delay
> >   #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> >   #              | |       |   ||||       |         |
> >             chrome-2798  [000] .N.1  6099.188540: mm_shrink_slab_start: super_cache_scan+0x0/0x1b0 ffff8eefa4651830: nid: 0 objects to shrink 5895 gfp_flags GFP_TEMPORARY|__GFP_HIGHMEM pgs_scanned 90 lru_pgs 6992959 cache items 5049 delta 0 total_scan 2524
> >             chrome-2798  [000] ...1  6109.494205: mm_shrink_slab_end: super_cache_scan+0x0/0x1b0 ffff8eefa4651830: nid: 0 unused scan count 5895 new scan count 941785 total_scan 476 last shrinker return val 1959
> 
> Yeah, the direct reclaim has started
> chrome-2798  [000] ...1  6099.187991: mm_vmscan_direct_reclaim_begin: order=0 may_writepage=1 gfp_flags=GFP_TEMPORARY|__GFP_HIGHMEM classzone_idx=2
> and finished
> chrome-2798  [000] ...1  6109.509445: mm_vmscan_direct_reclaim_end: nr_reclaimed=51
> 
> the only notable hole in the logging was the one pointed out by Tetsuo.
> There is a lot of activity on that CPU during that time with respect to reclaim
> $ grep -v '\-2798' trace.txt | grep '\[000\]' | awk '{val=$4+0; if (val > 6099 && val < 6109) print}' | wc -l
> 744
> 
> And there were more processes involved
> $ grep -v '\-2798' trace.txt | grep '\[000\]' | awk '{val=$4+0; if (val > 6099 && val < 6109) print $1}' | sort | uniq -c
>      74 <...>-10654
>      43 <...>-13862
>      82 <...>-17624
>       2 <...>-27318
>       1 <...>-3518
>      37 <...>-5331
>     180 <...>-6602
>      38 chrome-3482
>      40 Chrome_IOThread-2773
>       3 DedicatedWorker-19604
>       1 gmain-10668
>     139 qemu-system-x86-13763
>     104 TaskSchedulerBa-6011
> 
> So I agree that your system looks overloaded.
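The gap Tetsuo spotted can also be found mechanically. A minimal sketch, assuming the trace.txt format shown above (field 1 is TASK-PID, field 4 the timestamp), that pairs shrink_slab start/end events per task and flags long invocations:

```shell
# Pair mm_shrink_slab_start/_end per TASK-PID ($1) and report invocations
# longer than 5 seconds; $4 is the "6099.188540:" timestamp and "+ 0"
# coerces it to a number, dropping the trailing colon.
awk '
/mm_shrink_slab_start/ { start[$1] = $4 + 0 }
/mm_shrink_slab_end/ && ($1 in start) {
    d = ($4 + 0) - start[$1]
    if (d > 5) printf "%s stalled %.1fs in shrink_slab\n", $1, d
    delete start[$1]
}' trace.txt
```

Note this keys only on the task, so interleaved invocations of different shrinkers by the same task would need the shrinker address added to the key.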

That being said, it makes sense to double check whether we
cond_resched sufficiently, because being stuck for 10s while only a
handful of processes are reclaiming still sounds like too much. Maybe
there is more than just the reclaim going on. This would require a
deeper inspection of the trace data and maybe even gathering other
tracepoints. I am sorry but I cannot promise I will find time to look
at this anytime soon.
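As a starting point for gathering more tracepoints, one possible sketch (assuming tracefs is mounted at the usual debugfs location; paths can differ per distro, and this needs root) is to enable the whole vmscan event group and stream the output to a file:

```shell
# Sketch: enable the vmscan (and, speculatively, compaction) tracepoint
# groups and capture the events to a file for offline analysis.
cd /sys/kernel/debug/tracing
echo 1 > events/vmscan/enable         # mm_vmscan_*, mm_shrink_slab_*
echo 1 > events/compaction/enable     # in case compaction is in the mix
cat trace_pipe > /tmp/trace.txt &
```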
-- 
Michal Hocko
SUSE Labs



