Michal Hocko wrote:
> On Mon 26-02-18 19:58:19, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Sat 24-02-18 17:00:51, Tetsuo Handa wrote:
> > > > >From d922dd170c2bed01a775e8cca0871098aecc253d Mon Sep 17 00:00:00 2001
> > > > From: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> > > > Date: Sat, 24 Feb 2018 16:49:21 +0900
> > > > Subject: [PATCH v2] mm,page_alloc: wait for oom_lock than back off
> > > >
> > > > This patch fixes a bug which is essentially the same as the bug fixed
> > > > by commit 400e22499dd92613 ("mm: don't warn about allocations which
> > > > stall for too long").
> > > >
> > > > Currently __alloc_pages_may_oom() uses mutex_trylock(&oom_lock) based
> > > > on the assumption that the owner of oom_lock is making progress for
> > > > us. But it is possible to trigger an OOM lockup when many threads
> > > > concurrently call __alloc_pages_slowpath(), because all CPU resources
> > > > are wasted on pointless direct reclaim efforts. That is,
> > > > schedule_timeout_uninterruptible(1) in __alloc_pages_may_oom() does
> > > > not always give enough CPU time to the owner of oom_lock.
> > > >
> > > > It is possible that the owner of oom_lock is preempted by other
> > > > threads. Preemption makes the OOM situation much worse. But the page
> > > > allocator is not responsible for CPU resources wasted on something
> > > > other than memory allocation requests. Wasting CPU resources on
> > > > memory allocation requests without allowing the owner of oom_lock to
> > > > make forward progress is a page allocator's bug.
> > > >
> > > > Therefore, this patch changes to wait for oom_lock in order to
> > > > guarantee that threads waiting for the owner of oom_lock to make
> > > > forward progress do not consume CPU resources for pointless direct
> > > > reclaim efforts.
> > > >
> > > > We know that printk() from an OOM situation where a lot of threads
> > > > are almost busy-looping is a nightmare. As a side effect of this
> > > > patch, printk() with oom_lock held can start utilizing the CPU
> > > > resources saved by this patch (and reduce preemption during printk(),
> > > > making printk() complete faster).
> > > >
> > > > By replacing !mutex_trylock(&oom_lock) with
> > > > mutex_lock_killable(&oom_lock), it is possible that many threads
> > > > prevent the OOM reaper from making forward progress. Thus, this patch
> > > > removes mutex_lock(&oom_lock) from the OOM reaper.
> > > >
> > > > Also, since nobody uses oom_lock serialization when setting
> > > > MMF_OOM_SKIP and we don't try a last second allocation attempt after
> > > > confirming that there is no !MMF_OOM_SKIP OOM victim, the possibility
> > > > of needlessly selecting more OOM victims will be increased if we
> > > > continue using ALLOC_WMARK_HIGH. Thus, this patch changes to use
> > > > ALLOC_WMARK_MIN.
> > > >
> > > > Also, since we don't want to sleep with oom_lock held (so that
> > > > threads waiting at mutex_lock_killable(&oom_lock) can try a last
> > > > second allocation attempt, because the OOM reaper starts reclaiming
> > > > memory without waiting for oom_lock, and the next OOM victim can be
> > > > selected if necessary), this patch moves the short sleep from inside
> > > > oom_lock to outside of it.
> > >
> > > This patch does three different things mangled into one patch. All that
> > > with a patch description which talks a lot but doesn't really explain
> > > those changes.
> > >
> > > Moreover, you are effectively tuning for an artificial, overloaded page
> > > allocator test case and adding a central lock where many tasks would
> > > block. I have already tried to explain that this is not a universal win
> > > and you should have a real life example where this is really helpful.
> > >
> > > While I do agree that removing the oom_lock from __oom_reap_task_mm is
> > > a sensible thing, changing the last allocation attempt to
> > > ALLOC_WMARK_MIN is not all that straightforward and it would require a
> > > much more detailed explanation.
> > >
> > > So the patch in its current form is not mergeable IMHO.
> >
> > Your comment is impossible to satisfy.
> > Please show me your version, for you are keeping me deadlocked.
> >
> > I'm angry about the MM people's attitude: they are not friendly to users
> > who are bothered by lockup / slowdown problems under memory pressure.
> > They just say "your system is overloaded" and don't provide enough
> > support for checking whether such users are hitting a real bug rather
> > than just an overload.
>
> You should listen much more and also try to understand the concerns that
> we have. You are trying to touch a very subtle piece of code. That code
> is not perfect at all, but it tends to work reasonably well under most
> workloads out there. Now you try to handle corner cases you are seeing,
> and that is a good thing in general. But the fix shouldn't introduce new
> risks, and adding a single synchronization point into the oom path is
> simply not without its own risks.

Then, please show me a real life example (comparing trylock() versus
lock_killable()) where lock_killable() hurts. If you can prove it, I'm
willing to make this optional (i.e. "use trylock() at the risk of a lockup,
or use lock_killable() at the risk of latency"). Without testing this patch
in the real world, we won't be able to prove it either way, though.
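
For anyone reading along without the patch in front of them, the disputed
change boils down to roughly the following. This is only a hand-written
sketch of __alloc_pages_may_oom() in mm/page_alloc.c, with __GFP_HARDWALL,
the __GFP_NOFAIL handling and the surrounding context trimmed; it is not
the actual patch hunks:

/*
 * Sketch only: simplified __alloc_pages_may_oom() showing the direction of
 * the proposed change, not the real diff.
 */
static struct page *
__alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
                      const struct alloc_context *ac,
                      unsigned long *did_some_progress)
{
        struct oom_control oc = {
                .zonelist = ac->zonelist,
                .nodemask = ac->nodemask,
                .gfp_mask = gfp_mask,
                .order = order,
        };
        struct page *page = NULL;

        *did_some_progress = 0;

        /*
         * Mainline today: back off if somebody else holds oom_lock and
         * retry after a short sleep, assuming the holder makes progress
         * for us:
         *
         *      if (!mutex_trylock(&oom_lock)) {
         *              *did_some_progress = 1;
         *              schedule_timeout_uninterruptible(1);
         *              return NULL;
         *      }
         *
         * Proposed direction: sleep on the lock so that waiters stop
         * burning CPU in pointless direct reclaim, bailing out only on a
         * fatal signal.
         */
        if (mutex_lock_killable(&oom_lock))
                return NULL;

        /*
         * Last second allocation attempt. The patch argues for
         * ALLOC_WMARK_MIN here instead of the current ALLOC_WMARK_HIGH so
         * that additional OOM victims are not selected needlessly.
         */
        page = get_page_from_freelist(gfp_mask, order,
                                      ALLOC_WMARK_MIN | ALLOC_CPUSET, ac);
        if (!page && out_of_memory(&oc))
                *did_some_progress = 1;

        mutex_unlock(&oom_lock);
        /*
         * The short sleep that used to be done with oom_lock held would
         * now happen here / in the caller, outside of oom_lock.
         */
        return page;
}

Whether it is acceptable to let every allocating thread sleep on oom_lock
like this, outside of artificial overload tests, is exactly the point of
contention above.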