Michal, if you are busy, can I test my version until you get time? Tetsuo Handa wrote: > Michal Hocko wrote: > > On Tue 23-01-18 21:07:03, Tetsuo Handa wrote: > > > Michal Hocko wrote: > > > > To be completely host, I am not in love with this > > > > schedule_timeout_uninterruptible(1). It is an ugly hack. It used to be > > > > much more important in the past when the oom victim test was too > > > > fragile. I strongly suspect that it is not needed this days so rather > > > > than moving the sleep around I would try to remove it altogether. > > > > > > But this schedule_timeout_uninterruptible(1) serves as a guaranteed > > > sleep for PF_WQ_WORKER threads > > > ( http://lkml.kernel.org/r/20170830064019.mfihbeu3mm5ygcrb@xxxxxxxxxxxxxx ). > > > > > > > If we are under memory pressure, __zone_watermark_ok() can return false. > > > > If __zone_watermark_ok() == false, when is schedule_timeout_*() called explicitly? > > > > > > If all zones fail with the watermark check then we should hit the oom > > > path and sleep there. We do not do so for all cases though. > > > > > > Thus, you cannot simply remove it. > > > > Then I would rather make should_reclaim_retry more robust. > > I'm OK with that if we can remove schedule_timeout_*() with oom_lock held. > > > > > > > Also, your changelog silently skips over some important details. The > > > > system must be really overloaded when a short sleep can take minutes. > > > > > > Yes, the system was really overloaded, for I was testing below reproducer > > > on a x86_32 kernel. > > [...] > > > > I would trongly suspect that such an overloaded system doesn't need > > > > a short sleep to hold the oom lock for too long. All you need is to be > > > > preempted. So this patch doesn't really solve any _real_ problem. > > > > > > Preemption makes the OOM situation much worse. The only way to avoid all OOM > > > lockups caused by lack of CPU resource is to replace mutex_trylock(&oom_lock) > > > in __alloc_pages_may_oom() with mutex_lock(&oom_lock) (or similar) in order to > > > guarantee that all threads waiting for the OOM killer/reaper to make forward > > > progress shall give enough CPU resource. > > > > And how exactly does that help when the page allocator gets preempted by > > somebody not doing any allocation? > > The page allocator is not responsible for wasting CPU resource for something > other than memory allocation request. Wasting CPU resource due to unable to > allow the OOM killer/reaper to make forward progress is the page allocator's > bug. > > There are currently ways to artificially choke the OOM killer/reaper (e.g. let > a SCHED_IDLE priority thread which is allowed to run on only one specific CPU > invoke the OOM killer). To mitigate it, offloading the OOM killer to a dedicated > kernel thread (like the OOM reaper) which has reasonable scheduling priority and > is allowed to run on any idle CPU will help. But such enhancement is out of scope > for this patch. This patch is intended for avoid sleeping for minutes at > schedule_timeout_killable(1) with oom_lock held which can be reproduced without > using SCHED_IDLE priority and/or runnable CPU restrictions. > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>