Re: [PATCH 1/5] mm,page_alloc: Update comment for last second allocation attempt.

Michal Hocko <mhocko@xxxxxxxx> · Thu, 9 Nov 2017 12:30:40 +0100

On Thu 09-11-17 19:45:04, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Wed 08-11-17 20:01:44, Tetsuo Handa wrote:
> > > __alloc_pages_may_oom() is doing last second allocation attempt using
> > > ALLOC_WMARK_HIGH before calling out_of_memory(). This had two reasons.
> > > 
> > > The first reason is explained in the comment that it aims to catch
> > > potential parallel OOM killing. But there is no longer parallel OOM
> > > killing (in the sense that out_of_memory() is called "concurrently")
> > > because we serialize out_of_memory() calls using oom_lock.
> > > 
> > > The second reason is explained by Andrea Arcangeli (who added that code)
> > > that it aims to reduce the likelihood of OOM livelocks and be sure to
> > > invoke the OOM killer. There was a risk of livelock or anyway of delayed
> > > OOM killer invocation if ALLOC_WMARK_MIN is used, for relying on last
> > > few pages which are constantly allocated and freed in the meantime will
> > > not improve the situation.
> 
> Above part is OK, isn't it?
> 
> > 
> > > But there is no longer possibility of OOM
> > > livelocks or failing to invoke the OOM killer because we need to mask
> > > __GFP_DIRECT_RECLAIM for last second allocation attempt because oom_lock
> > > prevents __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocations which last
> > > second allocation attempt indirectly involve from failing.
> > 
> > This is an unfounded, misleading and actually even wrong statement that
> > has nothing to do with what Andrea had in mind. __GFP_DIRECT_RECLAIM
> > doesn't have anything to do with the livelock as I've already mentioned
> > several times already.
> 
> I know that this part is not what Andrea had in mind when he added this comment.
> What I'm saying is that "precondition has changed after Andrea added this comment"
> and "these reasons which Andrea had in mind when he added this comment no longer
> holds". I'm posting "for the record" purpose in order to describe reasons for
> current code.
> 
> When we introduced oom_lock (or formerly the per-zone oom lock) for serializing invocation
> of the OOM killer, we introduced two bugs at the same time. One bug is that since doing
> __GFP_DIRECT_RECLAIM with oom_lock held can make __GFP_DIRECT_RECLAIM && !__GFP_NORETRY
> allocations (which __GFP_DIRECT_RECLAIM indirectly involved) lockup, we need to avoid
> __GFP_DIRECT_RECLAIM allocations with oom_lock held. This is why commit e746bf730a76fe53
> ("mm,page_alloc: don't call __node_reclaim() with oom_lock held.") was made. This in turn
> forbids using __GFP_DIRECT_RECLAIM for last second allocation attempt which was not
> forbidden when Andrea added this comment.

But this has anything to do with the original motivation for the high
watermark allocation.

> ( The other bug is that we assumed that somebody is making progress for us when
> mutex_trylock(&oom_lock) in __alloc_pages_may_oom() failed, for we did not take
> scheduling priority into account when we introduced oom_lock. But the other bug
> is not what I'm writing in this patch. You can forget about the other bug
> regarding this patch. )
> 
> > 
> > > Since the OOM killer does not always kill a process consuming significant
> > > amount of memory (the OOM killer kills a process with highest OOM score
> > > (or instead one of its children if any)), there will be cases where
> > > ALLOC_WMARK_HIGH fails and ALLOC_WMARK_MIN succeeds.
> > 
> > This is possible but not really interesting case as already explained.
> > 
> > > Since the gap between ALLOC_WMARK_HIGH and ALLOC_WMARK_MIN can be changed
> > > by /proc/sys/vm/min_free_kbytes parameter, using ALLOC_WMARK_MIN for last
> > > second allocation attempt might be better for minimizing number of OOM
> > > victims. But that change should be done in a separate patch. This patch
> > > just clarifies that ALLOC_WMARK_HIGH is an arbitrary choice.
> > 
> > Again unfounded claim.
> 
> Since use of __GFP_DIRECT_RECLAIM for last second allocation attempt is now
> forbidden due to oom_lock already held, possibility of failing last allocation
> attempt has increased compared to when Andrea added this comment. Andrea said
> 
>   The high wmark is used to be sure the failure of reclaim isn't going to be
>   ignored. If using the min wmark like you propose there's risk of livelock or
>   anyway of delayed OOM killer invocation.

Wrong. It just takes an unrelated single page alloc/free loop to prevent
from the oom killer invocation.

[...]
> So, I believe that the changelog is not wrong, and I don't want to preserve
> 
>   keep very high watermark here, this is only to catch a parallel oom killing,
>   we must fail if we're still under heavy pressure
> 
> part which lost strong background.

I do not see how. You simply do not address the original concern Andrea
had and keep repeating unrelated stuff.

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>