On Sat, 19 Sep 2015, Tetsuo Handa wrote:

> I think that use of ALLOC_NO_WATERMARKS via TIF_MEMDIE is the underlying
> cause. ALLOC_NO_WATERMARKS via TIF_MEMDIE is intended for terminating the
> OOM victim task as soon as possible, but it turned out that it will not
> work if there is invisible lock dependency. Therefore, why not to give up
> "there should be only up to 1 TIF_MEMDIE task" rule?
>

I don't see the connection between TIF_MEMDIE and ALLOC_NO_WATERMARKS
being problematic. It is simply the mechanism by which we give oom killed
processes access to memory reserves if they need it. I believe you are
referring only to the oom killer stalling when it finds an oom victim.

> What this patch (and many others posted in various forms many times over
> past years) does is to give up "there should be only up to 1 TIF_MEMDIE
> task" rule. I think that we need to tolerate more than 1 TIF_MEMDIE tasks
> and somehow manage in a way memory reserves will not deplete.
>

Your proposal, which I mostly agree with, tries to kill additional
processes so that they allocate and drop the lock that the original victim
depends on. My approach, from
http://marc.info/?l=linux-kernel&m=144010444913702, is the same, but
without the killing.

It's unnecessary to kill every process on the system that is depending on
the same lock, and we can't know which processes are stalling on that lock
and which are not.

I think it's much easier to simply identify such a situation where a
process has not exited in a timely manner and then provide processes
access to memory reserves without being killed. We hope that the victim
will have queued its mutex_lock() and allocators that are holding the lock
will drop it after successfully utilizing memory reserves.

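As a rough illustration of the timeout idea above, here is a minimal
user-space sketch in C. It is not kernel code and it is not the patch
linked above: struct oom_state, oom_victim_timed_out(),
may_use_reserves(), and the tick-based timeout are all hypothetical names
and assumptions chosen only to show the decision being described.

```c
#include <stdbool.h>

/*
 * Hypothetical user-space model; none of these names are kernel APIs.
 * Time is counted in abstract ticks.
 */
struct oom_state {
	bool victim_selected;        /* an oom victim has been chosen */
	bool victim_exited;          /* the victim has exited */
	unsigned long victim_since;  /* tick at which the victim was chosen */
	unsigned long timeout;       /* assumed grace period, in ticks */
};

/* True once a selected victim has failed to exit within the grace period. */
bool oom_victim_timed_out(const struct oom_state *s, unsigned long now)
{
	if (!s->victim_selected || s->victim_exited)
		return false;
	return now - s->victim_since >= s->timeout;
}

/*
 * Decide whether an allocator may dip into memory reserves.
 * Nothing is killed here: the caller only gains reserve access
 * while the existing victim is stalled past the timeout.
 */
bool may_use_reserves(const struct oom_state *s, unsigned long now)
{
	return oom_victim_timed_out(s, now);
}
```

In this model an allocator is refused reserves while the victim is still
within its grace period, and is allowed them only after the victim has
been stalled for at least the timeout. No additional process is marked
for killing at any point.
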
We can mitigate immediate depletion of memory reserves by requiring all
allocators to reclaim (or compact) and call the oom killer, which
identifies the timeout, before granting access to memory reserves for a
single allocation, followed by schedule_timeout_killable(1) and returning.

I don't know of any alternative solutions where we can guarantee that
memory reserves cannot be depleted unless memory reserves are 100% of
memory.