On Tue, 17 Feb 2015, Tetsuo Handa wrote:

> Yes, basic idea would be same with
> http://marc.info/?l=linux-mm&m=142002495532320&w=2 .
>
> But Michal and David do not like the timeout approach.
> http://marc.info/?l=linux-mm&m=141684783713564&w=2
> http://marc.info/?l=linux-mm&m=141686814824684&w=2
>
> Unless they change their opinion in response to the discovery explained at
> http://lwn.net/Articles/627419/ , timeout patches will not be accepted.
>

Unfortunately, timeout based solutions aren't guaranteed to provide
anything more helpful.  The problem you're referring to is when the oom
kill victim is waiting on a mutex and cannot make forward progress even
though it has access to memory reserves.  Threads that are holding the
mutex and allocate in a blockable context will cause the oom killer to
defer forever because it sees the presence of a victim waiting to exit.

	TaskA				TaskB
	=====				=====
	mutex_lock(i_mutex)
	allocate memory
	oom kill TaskB
					mutex_lock(i_mutex)

In this scenario, nothing on the system will be able to allocate memory
without some type of memory reserve, since at least one thread is holding
the mutex that the victim needs and is looping forever in the allocator.
The only way out is for something else on the system to free memory, which
allows TaskA to allocate and drop the mutex.

In a timeout based solution, this would be detected and another thread
would be chosen for oom kill.  However, there's currently no way for the
oom killer to make sure that the process it selects isn't waiting for that
same mutex.  If it selects one that is, then that process has been killed
needlessly since it cannot make forward progress itself without grabbing
the mutex.

Certainly, it would be better to eventually kill something else in the
hope that it does not need the mutex and will free some memory.  That
would allow TaskA, the thread that had been deferring forever in the oom
killer while waiting for the original victim, TaskB, to exit, to allocate
and drop the mutex.  If that's the solution, then TaskB had been killed
unnecessarily itself.
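The scenario above, and the two possible outcomes of a timeout, can be
sketched as a small userspace toy model in C.  This is only an
illustration under simplifying assumptions: every name in it (toy_task,
toy_step, toy_original_victim_exits, the TOY_* values) is made up here and
is not a kernel interface, each allocation needs exactly one page, there
are no memory reserves, and the page counts are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these names are illustrative, not kernel interfaces. */
struct toy_task {
	bool holds_mutex;	/* owns the contended mutex */
	bool waits_on_mutex;	/* blocked in mutex_lock() */
	bool wants_page;	/* looping in the allocator for one page */
	bool killed;		/* selected by the oom killer */
	bool exited;		/* gone; its memory has been freed */
	int rss_pages;		/* pages released when it exits (arbitrary) */
};

/* What a hypothetical timeout does: pick no, or one, additional victim. */
enum toy_second_victim { TOY_NONE, TOY_NEEDS_MUTEX, TOY_INDEPENDENT };

static bool toy_mutex_held(const struct toy_task *t, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!t[i].exited && t[i].holds_mutex)
			return true;
	return false;
}

/* Run one round over all tasks; return true if anything made progress. */
static bool toy_step(struct toy_task *t, size_t n, int *free_pages)
{
	bool progress = false;
	size_t i;

	assert(free_pages != NULL);
	for (i = 0; i < n; i++) {
		if (t[i].exited)
			continue;
		if (t[i].wants_page) {
			/* No reserves in this model: a free page is required. */
			if (*free_pages == 0)
				continue;
			(*free_pages)--;
			t[i].wants_page = false;
		} else if (t[i].waits_on_mutex) {
			if (toy_mutex_held(t, n))
				continue;
			/* Simplification: lock, do the work, unlock at once. */
			t[i].waits_on_mutex = false;
		} else if (t[i].holds_mutex) {
			t[i].holds_mutex = false;
		} else if (t[i].killed) {
			t[i].exited = true;
			*free_pages += t[i].rss_pages;
		} else {
			continue;
		}
		progress = true;
	}
	return progress;
}

/* Return true if the original victim, TaskB, manages to exit. */
static bool toy_original_victim_exits(enum toy_second_victim second)
{
	struct toy_task t[3] = {
		{ .holds_mutex = true, .wants_page = true },		     /* TaskA */
		{ .waits_on_mutex = true, .killed = true, .rss_pages = 10 }, /* TaskB */
		{ .exited = true },	/* unused slot for a second victim */
	};
	int free_pages = 0;

	if (second != TOY_NONE) {
		t[2].exited = false;
		t[2].killed = true;
		t[2].rss_pages = 5;
		t[2].waits_on_mutex = (second == TOY_NEEDS_MUTEX);
	}
	/* Iterate until the system is quiescent (or stuck). */
	while (toy_step(t, 3, &free_pages))
		;
	return t[1].exited;
}
```

In this model, toy_original_victim_exits(TOY_NONE) and
toy_original_victim_exits(TOY_NEEDS_MUTEX) both leave TaskB stuck, while
TOY_INDEPENDENT lets the second victim free memory so that TaskA can
allocate, drop the mutex, and let TaskB exit.  The model says nothing
about how an oom killer would tell the last two cases apart, which is
exactly the open problem.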

Perhaps we should consider an alternative: allow threads such as TaskA
that have been deferring for a long time to simply allocate with
ALLOC_NO_WATERMARKS themselves, in the hope that the allocation succeeding
will eventually allow them to drop the mutex.  Two problems: (1) there's
no guarantee that this single allocation is all TaskA needs before it will
drop the lock, and (2) another thread could immediately grab the same
mutex and allocate, in which case the same series of events repeats.
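
The two problems with the reserve-based alternative can be illustrated
with some simple bookkeeping.  This is a hedged sketch, not a proposal for
real code: toy_reserve_outcome and toy_drain_reserves are hypothetical
names, the reserve is modelled as a plain page counter, and each thread
that takes the mutex ahead of the oom victim is assumed to need a fixed
number of pages before it unlocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; nothing here is a kernel interface. */
struct toy_reserve_outcome {
	bool victim_got_mutex;	/* every holder ahead of the victim unlocked */
	int reserve_left;	/* reserve pages remaining afterwards */
	size_t holders_done;	/* holders that allocated and unlocked */
};

/*
 * pages_needed[i] is the number of pages that the i-th thread to take the
 * mutex must allocate from reserves before it can drop the lock.
 */
static struct toy_reserve_outcome
toy_drain_reserves(int reserve_pages, const int *pages_needed, size_t nholders)
{
	struct toy_reserve_outcome out = { false, reserve_pages, 0 };
	size_t i;

	assert(reserve_pages >= 0);
	for (i = 0; i < nholders; i++) {
		if (pages_needed[i] > out.reserve_left) {
			/*
			 * Problem (1): what the reserves could supply was not
			 * all this holder needed.  It consumes the rest and
			 * stalls with the mutex still held.
			 */
			out.reserve_left = 0;
			return out;
		}
		/* Problem (2): each new holder repeats the same drain. */
		out.reserve_left -= pages_needed[i];
		out.holders_done++;
	}
	out.victim_got_mutex = true;
	return out;
}
```

With a reserve of 4 pages, a single holder needing 1 page lets the victim
through with 3 pages left; a single holder needing 8 pages stalls with the
reserve exhausted; and five successive holders needing 1 page each exhaust
the reserve before the fifth can unlock.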