Re: Still OOM problems with 4.9er/4.10er kernels

lkml@xxxxxxxxxxx · Thu, 16 Mar 2017 01:47:33 -0700

On Thu, Mar 16, 2017 at 09:27:14AM +0100, Michal Hocko wrote:
> On Thu 16-03-17 07:38:08, Gerhard Wiesinger wrote:
> [...]
> > The following commit is included in that version:
> > commit 710531320af876192d76b2c1f68190a1df941b02
> > Author: Michal Hocko <mhocko@xxxxxxxx>
> > Date:   Wed Feb 22 15:45:58 2017 -0800
> > 
> >     mm, vmscan: cleanup lru size claculations
> > 
> >     commit fd538803731e50367b7c59ce4ad3454426a3d671 upstream.
> 
> This patch shouldn't make any difference. It is a cleanup patch.
> I guess you meant 71ab6cfe88dc ("mm, vmscan: consider eligible zones in
> get_scan_count") but even that one shouldn't make any difference for 64b
> systems.
> 
> > But still OOMs:
> > [157048.030760] clamscan: page allocation stalls for 19405ms, order:0, mode:0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null)
> 
> This is not OOM it is an allocation stall. The allocation request cannot
> simply make forward progress for more than 10s. This alone is bad but
> considering this is GFP_HIGHUSER_MOVABLE which has the full reclaim
> capabilities I would suspect your workload overcommits the available
> memory too much. You only have ~380MB of RAM with ~160MB sitting in the
> anonymous memory, almost nothing in the page cache so I am not wondering
> that you see a constant swap activity. There seems to be only 40M in the
> slab so we are still missing ~180MB which is neither on the LRU lists
> nor allocated by slab. This means that some kernel subsystem allocates
> from the page allocator directly.
> 
> That being said, I believe that what you are seeing is not a bug in the
> MM subsystem but rather some susbsytem using more memory than it used to
> before so your workload doesn't fit into the amount of memory you have
> anymore.
> 

While on the topic of understanding allocation stalls, Philip Freeman recently
mailed linux-kernel with a similar report, and in his case there are plenty of
page cache pages.  It was also a GFP_HIGHUSER_MOVABLE 0-order allocation.

I'm no MM expert, but it appears a bit broken for such a low-order allocation
to stall on the order of 10 seconds when there's plenty of reclaimable pages,
in addition to mostly unused and abundant swap space on SSD.

Regards,
Vito Caputo

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>