On Thu, 14 Jan 2016, Johannes Weiner wrote:

> > This is where me and you disagree; the goal should not be to continue to
> > oom kill more and more processes since there is no guarantee that further
> > kills will result in forward progress. These additional kills can result
> > in the same livelock that is already problematic, and killing additional
> > processes has made the situation worse since memory reserves are more
> > depleted.
> >
> > I believe what is better is to exhaust reclaim, check if the page
> > allocator is constantly looping due to waiting for the same victim to
> > exit, and then allowing that allocation with memory reserves, see the
> > attached patch which I have proposed before.
>
> If giving the reserves to another OOM victim is bad, how is giving
> them to the *allocating* task supposed to be better?

Unfortunately, due to rss and oom priority, it is possible to repeatedly
select processes that are all waiting for the same mutex. This can
happen when loading shards, for example: all processes have the same oom
priority and are livelocked on i_mutex, which is the most common
occurrence in our environments. The livelock came about because we
selected a process that could not make forward progress; there is no
guarantee that we will not continue to select such processes.

Giving allocating tasks access to memory reserves eventually allows all
allocators to allocate successfully, which gives the holder of i_mutex
the ability to eventually drop it. In my patch this happens in a very
rate-limited manner, governed by how you define when the page allocator
has looped enough while waiting for the same process to exit.

In the past, we have even increased the scheduling priority of oom
killed processes so that they have a greater likelihood of picking up
i_mutex and exiting.

> We need to make the OOM killer conclude in a fixed amount of time, no
> matter what happens. If the system is irrecoverably deadlocked on
> memory it needs to panic (and reboot) so we can get on with it. And
> it's silly to panic while there are still killable tasks available.
>

What is the solution when there are no additional processes that may be
killed? It is better to give access to memory reserves so that a single
stalling allocation can succeed and the livelock can be resolved, rather
than panicking.
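
To make the rate limiting concrete, here is a minimal userspace sketch of
the idea, not the attached patch and not kernel code. It assumes the
allocator can tell which victim a failed retry was waiting on. The names
STALL_RETRY_LIMIT, struct alloc_ctx and should_use_reserves() are
hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical threshold: consecutive retries stalled on the same victim
 * that are tolerated before reserves are opened up. Not a kernel constant.
 */
#define STALL_RETRY_LIMIT 3

/* Per-allocation bookkeeping (illustrative only). */
struct alloc_ctx {
	int victim;		/* victim the last retry waited on, -1 if none */
	unsigned int stalls;	/* consecutive retries stalled on that victim */
};

void alloc_ctx_init(struct alloc_ctx *ctx)
{
	assert(ctx != NULL);
	ctx->victim = -1;
	ctx->stalls = 0;
}

/*
 * Call once per failed retry, after reclaim has been exhausted.
 * current_victim is the id of the oom victim being waited on, or -1 if
 * there is none. Returns true when this allocation should be allowed to
 * dip into memory reserves.
 */
bool should_use_reserves(struct alloc_ctx *ctx, int current_victim)
{
	assert(ctx != NULL);

	if (current_victim < 0 || current_victim != ctx->victim) {
		/* No victim, or a different one: restart the count. */
		ctx->victim = current_victim;
		ctx->stalls = (current_victim < 0) ? 0 : 1;
	} else {
		/* Still waiting on the same victim. */
		ctx->stalls++;
	}

	if (ctx->stalls >= STALL_RETRY_LIMIT) {
		/*
		 * Grant a single allocation from reserves, then reset so the
		 * next grant needs another full round of stalled retries.
		 * This reset is what keeps reserve usage rate limited.
		 */
		ctx->stalls = 0;
		return true;
	}
	return false;
}
```

With a limit of 3, two stalled retries on the same victim are refused, the
third is granted from reserves, and the counter starts over. A change of
victim also restarts the count, so reserves are only touched when the
allocator keeps looping on one process that cannot exit.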