Re: DOM Worker: page allocation stalls (4.9.13)

Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> · Fri, 17 Mar 2017 22:24:40 +0900

On 2017/03/17 17:46, Michal Hocko wrote:
> On Thu 16-03-17 03:04:09, Philip J. Freeman wrote:
>> My laptop became almost totally unresponsive today. I was able to
>> switch VTs but not log in and had to power cycle to regain control. I
>> don't understand what this means. Any ideas?
>>
>> Mar 14 14:31:20 x61s-44a5 kernel: [168382.032039] DOM Worker: page allocation stalls for 10646ms, order:0, mode:0x24280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO)
> [...]
>> Mar 14 14:31:22 x61s-44a5 kernel: [168382.032181] Mem-Info:
>> Mar 14 14:31:22 x61s-44a5 kernel: [168382.032192] active_anon:308454 inactive_anon:154809 isolated_anon:224
>> Mar 14 14:31:22 x61s-44a5 kernel: [168382.032192]  active_file:869 inactive_file:978 isolated_file:0
>> Mar 14 14:31:22 x61s-44a5 kernel: [168382.032192]  unevictable:0 dirty:0 writeback:0 unstable:0
>> Mar 14 14:31:22 x61s-44a5 kernel: [168382.032192]  slab_reclaimable:6099 slab_unreclaimable:8555
>> Mar 14 14:31:22 x61s-44a5 kernel: [168382.032192]  mapped:1999 shmem:156254 pagetables:2929 bounce:0
>> Mar 14 14:31:22 x61s-44a5 kernel: [168382.032192]  free:13192 free_pcp:0 free_cma:0
> 
> OK, so the allocation couldn't make forward progress for more than
> 10s. You do not seem to have many file pages left on the LRU lists,
> so you only have anonymous memory as reclaimable. Slab doesn't
> have many pages either. Everything together makes it 1886MB out of 2GB.
> ~50MB is free so this means ~70MB is in unaccounted memory (50MB is
> reserved) which looks reasonable and I wouldn't suspect any kernel
> memory leak.

I don't suspect any kernel memory leak here.
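
For reference, the 1886MB figure can be reproduced from the Mem-Info counters
quoted above. This is only a rough sketch: it assumes 4 KiB pages, and it
leaves out shmem and mapped on the assumption that those pages are already
counted on the anon/file LRU lists.

```python
# Page counts copied from the Mem-Info block quoted in this thread.
PAGE_SIZE_KB = 4  # assumption: 4 KiB pages

counters = {
    "active_anon": 308454,
    "inactive_anon": 154809,
    "isolated_anon": 224,
    "active_file": 869,
    "inactive_file": 978,
    "slab_reclaimable": 6099,
    "slab_unreclaimable": 8555,
    "pagetables": 2929,
}
free_pages = 13192


def pages_to_mb(pages):
    """Convert a page count to MiB using the assumed page size."""
    return pages * PAGE_SIZE_KB / 1024


accounted_pages = sum(counters.values())
accounted_mb = pages_to_mb(accounted_pages)
free_mb = pages_to_mb(free_pages)

print(f"accounted: {accounted_pages} pages = {accounted_mb:.0f} MB")
print(f"free:      {free_pages} pages = {free_mb:.1f} MB")
```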

> And again the anonymous memory pressure grows. So I would suspect some
> userspace application went off the hook and started consuming a lot of
> anonymous memory which gets you to a thrashing stage when basically
> nothing can move on much without swap out. The page cache is at its
> minimum and I suspect that most binaries would have to be read from disk
> and you reached the point of thrashing. I am afraid we are not really
> great at handling these situations from the kernel. Killing the
> memory hog would probably be the most sane thing to do.
> 

I don't know what the "DOM Worker" process is. But since there is a
"firefox-esr" process, I guess "DOM Worker" is a process related to the HTML5
Web Workers API. Since web browser processes can consume a lot of memory
depending on the content loaded (or due to memory leaks in plugins), it is
possible that you are overstressing the system.

"DMA32 free:" is below "DMA32 min:" which I think means that the OOM killer
would have been triggered immediately if there were no swap.
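
If you want to watch for this condition on a running system, something like
the following sketch could be used. It assumes the /proc/zoneinfo layout where
each zone starts with a "Node N, zone NAME" line followed by "pages free" and
"min" lines (both in pages); zones_below_min() is a name I made up for
illustration.

```python
import re


def zones_below_min(zoneinfo_text):
    """Return (zone, free_pages, min_pages) for zones with free < min.

    Assumes the /proc/zoneinfo layout described above; other layouts
    simply yield no results.
    """
    results = []
    zone = None
    free = None
    for line in zoneinfo_text.splitlines():
        m = re.match(r"Node\s+\d+,\s+zone\s+(\S+)", line)
        if m:
            # A new zone section starts; forget the previous free count.
            zone, free = m.group(1), None
            continue
        m = re.match(r"\s*pages free\s+(\d+)", line)
        if m:
            free = int(m.group(1))
            continue
        m = re.match(r"\s*min\s+(\d+)", line)
        if m and zone is not None and free is not None:
            min_pages = int(m.group(1))
            if free < min_pages:
                results.append((zone, free, min_pages))
            free = None  # only compare once per zone
    return results
```

Typical use would be zones_below_min(open("/proc/zoneinfo").read()); an empty
list means no zone is currently below its min watermark.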

I guess there were other processes which stalled for less than 10 seconds.
There may also be processes stalled on swap I/O, but we cannot know about them
because the warn_alloc() threshold is not configurable and __GFP_NOWARN
allocations are not reported by warn_alloc(). Too bad.
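
To see which tasks did get reported, the stall warnings can be pulled out of
the log with a small script. The following is a minimal sketch that assumes the
message format seen in the report quoted above; parse_stall() is a hypothetical
helper name.

```python
import re

# Matches e.g. "DOM Worker: page allocation stalls for 10646ms, order:0,
# mode:0x24280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO)".
STALL_RE = re.compile(
    r"(?P<comm>[^\]:]+): page allocation stalls for (?P<ms>\d+)ms, "
    r"order:(?P<order>\d+), "
    r"mode:(?P<mode>0x[0-9a-fA-F]+)\((?P<flags>[^)]*)\)"
)


def parse_stall(line):
    """Return a dict describing a stall warning, or None if no match."""
    m = STALL_RE.search(line)
    if m is None:
        return None
    return {
        "comm": m.group("comm").strip(),
        "stall_ms": int(m.group("ms")),
        "order": int(m.group("order")),
        "mode": int(m.group("mode"), 16),
        "flags": m.group("flags").split("|"),
    }


# The line from the original report.
line = (
    "Mar 14 14:31:20 x61s-44a5 kernel: [168382.032039] DOM Worker: "
    "page allocation stalls for 10646ms, order:0, "
    "mode:0x24280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO)"
)
info = parse_stall(line)
print(info)
```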

If you can rebuild your kernel, calling dump_tasks() in mm/oom_kill.c when
you hit warn_alloc() warnings might help. If you cannot, SysRq-f will be
handy to check.
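
If the keyboard is not usable, SysRq-f can also be requested by writing "f" to
/proc/sysrq-trigger as root. A minimal sketch follows; request_oom_report() is
a hypothetical name. Please note that SysRq-f invokes the OOM killer, so a
process will normally be killed, and the per-task listing is only printed
when vm.oom_dump_tasks is enabled.

```python
def request_oom_report(trigger_path="/proc/sysrq-trigger"):
    """Write 'f' to the SysRq trigger file.

    On a real system this needs root and makes the kernel run the OOM
    killer, so only call it when killing a memory hog is acceptable.
    The path is a parameter so the function can be tried on a scratch file.
    """
    with open(trigger_path, "w") as trigger:
        trigger.write("f")
```

The result shows up in the kernel log (dmesg), not on the caller's terminal.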

If you still suspect that this is a kernel problem, you can try an older
kernel on which you think everything was working well, for comparison. Note,
though, that warn_alloc() was added in Linux 4.9, so older kernels will not
print this stall warning.
