On Thu 09-11-17 19:45:04, Tetsuo Handa wrote: > Michal Hocko wrote: > > On Wed 08-11-17 20:01:44, Tetsuo Handa wrote: > > > __alloc_pages_may_oom() is doing last second allocation attempt using > > > ALLOC_WMARK_HIGH before calling out_of_memory(). This had two reasons. > > > > > > The first reason is explained in the comment that it aims to catch > > > potential parallel OOM killing. But there is no longer parallel OOM > > > killing (in the sense that out_of_memory() is called "concurrently") > > > because we serialize out_of_memory() calls using oom_lock. > > > > > > The second reason is explained by Andrea Arcangeli (who added that code) > > > that it aims to reduce the likelihood of OOM livelocks and be sure to > > > invoke the OOM killer. There was a risk of livelock or anyway of delayed > > > OOM killer invocation if ALLOC_WMARK_MIN is used, for relying on last > > > few pages which are constantly allocated and freed in the meantime will > > > not improve the situation. > > Above part is OK, isn't it? > > > > > > But there is no longer possibility of OOM > > > livelocks or failing to invoke the OOM killer because we need to mask > > > __GFP_DIRECT_RECLAIM for last second allocation attempt because oom_lock > > > prevents __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocations which last > > > second allocation attempt indirectly involve from failing. > > > > This is an unfounded, misleading and actually even wrong statement that > > has nothing to do with what Andrea had in mind. __GFP_DIRECT_RECLAIM > > doesn't have anything to do with the livelock as I've already mentioned > > several times already. > > I know that this part is not what Andrea had in mind when he added this comment. > What I'm saying is that "precondition has changed after Andrea added this comment" > and "these reasons which Andrea had in mind when he added this comment no longer > holds". I'm posting "for the record" purpose in order to describe reasons for > current code. > > When we introduced oom_lock (or formerly the per-zone oom lock) for serializing invocation > of the OOM killer, we introduced two bugs at the same time. One bug is that since doing > __GFP_DIRECT_RECLAIM with oom_lock held can make __GFP_DIRECT_RECLAIM && !__GFP_NORETRY > allocations (which __GFP_DIRECT_RECLAIM indirectly involved) lockup, we need to avoid > __GFP_DIRECT_RECLAIM allocations with oom_lock held. This is why commit e746bf730a76fe53 > ("mm,page_alloc: don't call __node_reclaim() with oom_lock held.") was made. This in turn > forbids using __GFP_DIRECT_RECLAIM for last second allocation attempt which was not > forbidden when Andrea added this comment. But this has anything to do with the original motivation for the high watermark allocation. > ( The other bug is that we assumed that somebody is making progress for us when > mutex_trylock(&oom_lock) in __alloc_pages_may_oom() failed, for we did not take > scheduling priority into account when we introduced oom_lock. But the other bug > is not what I'm writing in this patch. You can forget about the other bug > regarding this patch. ) > > > > > > Since the OOM killer does not always kill a process consuming significant > > > amount of memory (the OOM killer kills a process with highest OOM score > > > (or instead one of its children if any)), there will be cases where > > > ALLOC_WMARK_HIGH fails and ALLOC_WMARK_MIN succeeds. > > > > This is possible but not really interesting case as already explained. > > > > > Since the gap between ALLOC_WMARK_HIGH and ALLOC_WMARK_MIN can be changed > > > by /proc/sys/vm/min_free_kbytes parameter, using ALLOC_WMARK_MIN for last > > > second allocation attempt might be better for minimizing number of OOM > > > victims. But that change should be done in a separate patch. This patch > > > just clarifies that ALLOC_WMARK_HIGH is an arbitrary choice. > > > > Again unfounded claim. > > Since use of __GFP_DIRECT_RECLAIM for last second allocation attempt is now > forbidden due to oom_lock already held, possibility of failing last allocation > attempt has increased compared to when Andrea added this comment. Andrea said > > The high wmark is used to be sure the failure of reclaim isn't going to be > ignored. If using the min wmark like you propose there's risk of livelock or > anyway of delayed OOM killer invocation. Wrong. It just takes an unrelated single page alloc/free loop to prevent from the oom killer invocation. [...] > So, I believe that the changelog is not wrong, and I don't want to preserve > > keep very high watermark here, this is only to catch a parallel oom killing, > we must fail if we're still under heavy pressure > > part which lost strong background. I do not see how. You simply do not address the original concern Andrea had and keep repeating unrelated stuff. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>