On Thu 02-11-17 22:15:06, Tetsuo Handa wrote: > I was waiting for Michal's comment, but it seems that he is too busy now. > Thus, I post non-authoritative comment here. (I'm not a tracepoints user.) yes, that is the case. Thanks for looking into this Tetsuo. > Two stalls were found in dmesg but only PID = 2798 part was recorded in the trace logs. > > [ 6109.502115] chrome: page allocation stalls for 10321ms, order:0, mode:0x14000d2(GFP_TEMPORARY|__GFP_HIGHMEM), nodemask=(null) > [ 6109.502179] chrome cpuset=/ mems_allowed=0 > [ 6109.502570] CPU: 0 PID: 2798 Comm: chrome Not tainted 4.13.9-300.fc27.x86_64+debug #1 > I have only glanced through the trace data. > So, trying to analyze this one. > > Since 10 seconds of blank was found between mm_shrink_slab_start and > mm_shrink_slab_end, this alone can cause stall warning messages. > > # tracer: nop > # > # _-----=> irqs-off > # / _----=> need-resched > # | / _---=> hardirq/softirq > # || / _--=> preempt-depth > # ||| / delay > # TASK-PID CPU# |||| TIMESTAMP FUNCTION > # | | | |||| | | > chrome-2798 [000] .N.1 6099.188540: mm_shrink_slab_start: super_cache_scan+0x0/0x1b0 ffff8eefa4651830: nid: 0 objects to shrink 5895 gfp_flags GFP_TEMPORARY|__GFP_HIGHMEM pgs_scanned 90 lru_pgs 6992959 cache items 5049 delta 0 total_scan 2524 > chrome-2798 [000] ...1 6109.494205: mm_shrink_slab_end: super_cache_scan+0x0/0x1b0 ffff8eefa4651830: nid: 0 unused scan count 5895 new scan count 941785 total_scan 476 last shrinker return val 1959 Yeah, the direct reclaim has started chrome-2798 [000] ...1 6099.187991: mm_vmscan_direct_reclaim_begin: order=0 may_writepage=1 gfp_flags=GFP_TEMPORARY|__GFP_HIGHMEM classzone_idx=2 and finished chrome-2798 [000] ...1 6109.509445: mm_vmscan_direct_reclaim_end: nr_reclaimed=51 the only notable hole in logging was the one pointed by Tetsuo. There is a lot of activity on that CPU during that time wrt. reclaim $ grep -v '\-2798' trace.txt | grep '\[000\]' | awk '{val=$4+0; if (val > 6099 && val < 6109) print}' | wc -l 744 And there were more processes involved $ grep -v '\-2798' trace.txt | grep '\[000\]' | awk '{val=$4+0; if (val > 6099 && val < 6109) print $1}' | sort | uniq -c 74 <...>-10654 43 <...>-13862 82 <...>-17624 2 <...>-27318 1 <...>-3518 37 <...>-5331 180 <...>-6602 38 chrome-3482 40 Chrome_IOThread-2773 3 DedicatedWorker-19604 1 gmain-10668 139 qemu-system-x86-13763 104 TaskSchedulerBa-6011 So I agree that it looks like your system seems to be overloaded. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>