Hi,
I have posted this as an RFC [1] to see whether the approach I've taken is acceptable. There didn't seem to be any fundamental opposition, so I have dropped the RFC tag. I would like to target this for 4.10 and am sending it early because I will be offline for an extended period at the end of October. The series is on top of the current mmotm tree (2016-09-27-16-08). It has passed my basic testing and nothing blew up, but additional testing never hurts, and I would be really grateful for a deep review.

The primary point of this series is to finally get rid of TIF_MEMDIE. This has been on my TODO list for quite some time because the flag has proven to cause many problems. First of all, the flag was terribly overloaded. It used to act as an oom lock to prevent multiple oom victim selections, it also grants access to memory reserves, and finally it is used to count oom victims for the oom_killer_disable() logic.

It really didn't help that the flag is per task_struct (aka thread) while the OOM is an mm_struct scope operation. This means that all threads in the same thread group, or in general all processes sharing the mm, would need the flag for the code to be able to rely on it. This was not easy to achieve because, for one, granting access to memory reserves to all threads could deplete them quite easily. Setting the flag on all threads is also challenging because mark_oom_victim can race with copy_process, so we could easily miss a thread.

That being said, it would be better to get rid of the flag rather than work around the existing issues and add more complicated code to paper over the fundamental mismatch. Recent changes in the oom proper finally allow for that, I believe. Now that all oom victims are reapable, we no longer depend on ALLOC_NO_WATERMARKS because the memory held by the victim is reclaimed asynchronously. A partial access to memory reserves should be sufficient to guarantee that the oom victim is not starved by other memory consumers.
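A minimal userspace sketch of what "partial access to memory reserves" could mean in a watermark check, under a deliberately simplified model. The helper watermark_ok(), the flag ALLOC_OOM_PARTIAL and the choice to let a victim dip into half of the min watermark are all hypothetical and are not taken from the patches; ALLOC_NO_WATERMARKS only mimics the page allocator flag of that name, which bypasses the watermark check entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical allocation flags, for this model only. */
enum {
	ALLOC_DEFAULT       = 0,
	ALLOC_OOM_PARTIAL   = 1 << 0, /* hypothetical: oom victim, partial reserves */
	ALLOC_NO_WATERMARKS = 1 << 1, /* ignore the min watermark completely */
};

/*
 * Return true if `request` pages may be taken from a zone that has
 * `free_pages` free pages and a minimum watermark of `min_wmark`.
 */
static bool watermark_ok(long free_pages, long min_wmark, long request,
			 unsigned int alloc_flags)
{
	long min = min_wmark;

	assert(request > 0);

	/* Full reserve access: fail only when the pages are simply not there. */
	if (alloc_flags & ALLOC_NO_WATERMARKS)
		return free_pages >= request;

	/* Assumption: an oom victim may consume half of the reserve below min. */
	if (alloc_flags & ALLOC_OOM_PARTIAL)
		min -= min / 2;

	/* Everybody else has to leave at least `min` pages free. */
	return free_pages - request >= min;
}
```

With a min watermark of 64 pages and 50 pages free, a regular single-page request fails while a request flagged ALLOC_OOM_PARTIAL succeeds; with only 20 pages free the victim is refused as well, so in this model the victim can never drain the last part of the reserve, unlike with ALLOC_NO_WATERMARKS.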
Partial access to memory reserves also means that we do not have to pretend to be conservative and give access to memory reserves to only one thread of the process at a time. This is patch 1.

Patch 2 is a simple cleanup which converts TIF_MEMDIE users to tsk_is_oom_victim, which is process rather than thread centric. None of those callers really needs to be thread aware AFAICS.

The tricky part then is exit_oom_victim vs. oom_killer_disable, because TIF_MEMDIE acted as a token there, which gave us a way to count threads of the process. It didn't work 100% reliably and had its own issues, but we have to replace it with something which doesn't rely on counting threads and instead finds a moment when all threads have reached a steady state in do_exit. This is what patch 3 does, and I would really appreciate it if Oleg could double check my thinking there. I am also CCing Al on that one because I am moving exit_io_context up in do_exit, right before exit_notify.

The last patch just removes TIF_MEMDIE from the arch code because it is no longer needed anywhere.

I really appreciate any feedback.

Changes since RFC
- add motivation to the cover as suggested by Johannes
- rebased on top of the current mmotm

[1] http://lkml.kernel.org/r/1472723464-22866-1-git-send-email-mhocko@xxxxxxxxxx

Michal Hocko (4):
      mm, oom: do not rely on TIF_MEMDIE for memory reserves access
      mm: replace TIF_MEMDIE checks by tsk_is_oom_victim
      mm, oom: do not rely on TIF_MEMDIE for exit_oom_victim
      arch: get rid of TIF_MEMDIE

The diffstat looks quite promising to me.
 arch/alpha/include/asm/thread_info.h      |  1 -
 arch/arc/include/asm/thread_info.h        |  2 --
 arch/arm/include/asm/thread_info.h        |  1 -
 arch/arm64/include/asm/thread_info.h      |  1 -
 arch/avr32/include/asm/thread_info.h      |  2 --
 arch/blackfin/include/asm/thread_info.h   |  1 -
 arch/c6x/include/asm/thread_info.h        |  1 -
 arch/cris/include/asm/thread_info.h       |  1 -
 arch/frv/include/asm/thread_info.h        |  1 -
 arch/h8300/include/asm/thread_info.h      |  1 -
 arch/hexagon/include/asm/thread_info.h    |  1 -
 arch/ia64/include/asm/thread_info.h       |  1 -
 arch/m32r/include/asm/thread_info.h       |  1 -
 arch/m68k/include/asm/thread_info.h       |  1 -
 arch/metag/include/asm/thread_info.h      |  1 -
 arch/microblaze/include/asm/thread_info.h |  1 -
 arch/mips/include/asm/thread_info.h       |  1 -
 arch/mn10300/include/asm/thread_info.h    |  1 -
 arch/nios2/include/asm/thread_info.h      |  1 -
 arch/openrisc/include/asm/thread_info.h   |  1 -
 arch/parisc/include/asm/thread_info.h     |  1 -
 arch/powerpc/include/asm/thread_info.h    |  1 -
 arch/s390/include/asm/thread_info.h       |  1 -
 arch/score/include/asm/thread_info.h      |  1 -
 arch/sh/include/asm/thread_info.h         |  1 -
 arch/sparc/include/asm/thread_info_32.h   |  1 -
 arch/sparc/include/asm/thread_info_64.h   |  1 -
 arch/tile/include/asm/thread_info.h       |  2 --
 arch/um/include/asm/thread_info.h         |  2 --
 arch/unicore32/include/asm/thread_info.h  |  1 -
 arch/x86/include/asm/thread_info.h        |  1 -
 arch/xtensa/include/asm/thread_info.h     |  1 -
 include/linux/sched.h                     |  2 +-
 kernel/cpuset.c                           |  9 ++---
 kernel/exit.c                             | 38 +++++++++++++++------
 kernel/freezer.c                          |  3 +-
 mm/internal.h                             | 11 ++++++
 mm/memcontrol.c                           |  2 +-
 mm/oom_kill.c                             | 40 +++++++++++++---------
 mm/page_alloc.c                           | 57 +++++++++++++++++++++++++------
 40 files changed, 117 insertions(+), 81 deletions(-)
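The process-centric direction of patches 2 and 3 (tsk_is_oom_victim, and exit_oom_victim vs. oom_killer_disable) can be illustrated with a small userspace model. Everything in it is hypothetical and far simpler than the kernel: struct model_process stands in for state shared by all threads of a process, the victim mark lives there instead of in a per-thread flag, and nr_oom_victims is counted once per process and dropped only when its last thread finishes exiting. It does not model locking, mm sharing between processes, or where exactly in do_exit the real series drops the count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-process state shared by every thread of the process. */
struct model_process {
	bool oom_victim;	/* process-wide mark instead of a per-thread flag */
	int live_threads;	/* threads that have not finished exiting yet */
};

struct model_thread {
	struct model_process *proc;
};

/* Number of victim processes a disable request would have to wait for. */
static int nr_oom_victims;

/* Process-centric check: every thread of the process gets the same answer. */
static bool model_is_oom_victim(const struct model_thread *t)
{
	return t->proc->oom_victim;
}

/* Mark the whole process; repeated calls do not inflate the count. */
static void model_mark_victim(struct model_process *p)
{
	if (p->oom_victim || p->live_threads == 0)
		return;
	p->oom_victim = true;
	nr_oom_victims++;
}

/*
 * A thread created after the process was marked still sees the mark,
 * because it reads the shared state rather than a flag that had to be
 * copied to it (the copy_process race does not exist in this model).
 */
static void model_thread_create(struct model_thread *t, struct model_process *p)
{
	t->proc = p;
	p->live_threads++;
}

/* Only the last exiting thread of a victim process drops the count. */
static void model_thread_exit(struct model_thread *t)
{
	struct model_process *p = t->proc;

	assert(p->live_threads > 0);
	if (--p->live_threads == 0 && p->oom_victim) {
		nr_oom_victims--;
		assert(nr_oom_victims >= 0);
	}
}

/* The disable path may proceed once no victim process is still exiting. */
static bool model_can_disable_oom_killer(void)
{
	return nr_oom_victims == 0;
}
```

Because the count is per process, it never depends on having reached every thread with a flag; the open question the model glosses over is precisely the one patch 3 deals with, namely which point in do_exit counts as "all threads have reached a steady state".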