Re: [patch] mm, oom: stop reclaiming if GFP_ATOMIC will start failing soon

Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> · Mon, 27 Apr 2020 13:30:51 -0700

On Sun, 26 Apr 2020 20:12:58 -0700 (PDT) David Rientjes <rientjes@xxxxxxxxxx> wrote:

> > > blockable allocations and then queue a worker to asynchronously oom kill
> > > if it finds watermarks to be sufficiently low as well.
> > > 
> > 
> > Well, what's really going on here?
> > 
> > Is networking potentially consuming an unbounded amount of memory?  If
> > so, then killing a process will just cause networking to consume more
> > memory then hit against the same thing.  So presumably the answer is
> > "no, the watermarks are inappropriately set for this workload".
> > 
> > So would it not be sensible to dynamically adjust the watermarks in
> > response to this condition?  Maintain a larger pool of memory for these
> > allocations?  Or possibly push back on networking and tell it to reduce
> > its queue sizes?  So that stuff doesn't keep on getting oom-killed?
> > 
> 
> No - that would actually make the problem worse.
> 
> Today, per-zone min watermarks dictate when user allocations will loop or 
> oom kill.  should_reclaim_retry() currently loops if reclaim has succeeded 
> in the past few tries and we should be able to allocate if we are able to 
> reclaim the amount of memory that we think we can.
> 
> The issue is that this supposes that looping to reclaim more will result 
> in more free memory.  That doesn't always happen if there are concurrent 
> memory allocators.
> 
> GFP_ATOMIC allocators can access below these per-zone watermarks.  So the 
> issue is that per-zone free pages stay between ALLOC_HIGH watermarks
> (the watermark that GFP_ATOMIC allocators can allocate to) and min 
> watermarks.  We never reclaim enough memory to get back to min watermarks 
> because reclaim cannot keep up with the amount of GFP_ATOMIC allocations.

But there should be an upper bound upon the total amount of in-flight
GFP_ATOMIC memory at any point in time?  These aren't like pagecache
which will take more if we give it more.  Setting the various
thresholds appropriately should ensure that blockable allocations don't
get their memory stolen by GFP_ATOMIC allocations?
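
To make the argument concrete, here is a toy userspace model of a single
zone, not kernel code.  The struct and function names are made up for
illustration, and the "atomic allocators may dip to half of min" rule is
an assumption about how ALLOC_HIGH lowers the watermark.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of one zone; not a kernel structure. */
struct zone_model {
	long free_pages;	/* pages currently free in the zone */
	long min_wmark;		/* min watermark seen by blockable allocations */
};

/* Assumption: ALLOC_HIGH callers may allocate down to half of min. */
static long atomic_floor(const struct zone_model *z)
{
	return z->min_wmark - z->min_wmark / 2;
}

/* Blockable allocations need free pages strictly above min. */
static bool blockable_can_alloc(const struct zone_model *z)
{
	return z->free_pages > z->min_wmark;
}

/* Atomic allocations need free pages strictly above the lowered floor. */
static bool atomic_can_alloc(const struct zone_model *z)
{
	return z->free_pages > atomic_floor(z);
}

/*
 * One time step: reclaim frees `reclaimed` pages, then atomic allocators
 * request `atomic_demand` pages and get whatever sits above their floor.
 * Returns the number of pages the atomic allocators actually took.
 */
static long tick(struct zone_model *z, long reclaimed, long atomic_demand)
{
	long avail, taken;

	assert(reclaimed >= 0 && atomic_demand >= 0);
	z->free_pages += reclaimed;
	avail = z->free_pages - atomic_floor(z);
	if (avail < 0)
		avail = 0;
	taken = atomic_demand < avail ? atomic_demand : avail;
	z->free_pages -= taken;
	return taken;
}
```

If atomic demand per tick is at least what reclaim frees, free_pages
sticks at the atomic floor and blockable_can_alloc() never becomes true.
Once the demand stops (that is, the in-flight total has hit its bound),
the same reclaim rate brings the zone back above min.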

I took a look at doing a quick-fix for the
direct-reclaimers-get-their-stuff-stolen issue about a million years
ago.  I don't recall where it ended up.  It's pretty trivial for the
direct reclaimer to free pages into current->reclaimed_pages and to
take a look in there on the allocation path, etc.  But it's only
practical for order-0 pages.
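
Roughly the shape of that idea, as a userspace sketch rather than the
old patch (which I no longer have).  The types and helpers below are
hypothetical stand-ins; only the reclaimed_pages name comes from the
description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page. */
struct page_model {
	struct page_model *next;
	unsigned int order;
};

/* Hypothetical stand-in for the task, i.e. `current`. */
struct task_model {
	struct page_model *reclaimed_pages;	/* private stash, order-0 only */
};

/*
 * Direct reclaim path: keep a freed order-0 page on the task's private
 * list.  Returns 1 if stashed, 0 if the caller should free the page to
 * the shared pool as usual.
 */
static int stash_reclaimed(struct task_model *t, struct page_model *p)
{
	assert(t != NULL && p != NULL);
	if (p->order != 0)
		return 0;
	p->next = t->reclaimed_pages;
	t->reclaimed_pages = p;
	return 1;
}

/*
 * Allocation path: look in the private stash before the shared pool.
 * Returns NULL when the request is not order-0 or the stash is empty.
 */
static struct page_model *take_reclaimed(struct task_model *t,
					 unsigned int order)
{
	struct page_model *p;

	assert(t != NULL);
	if (order != 0 || t->reclaimed_pages == NULL)
		return NULL;
	p = t->reclaimed_pages;
	t->reclaimed_pages = p->next;
	p->next = NULL;
	return p;
}
```

The order-0 restriction shows up in both helpers: a private list of
single pages cannot satisfy a higher-order request, since nothing here
merges neighbouring pages into a contiguous block.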