Kyle Walker wrote:
> I agree, in lieu of treating TASK_UNINTERRUPTIBLE tasks as unkillable,
> and omitting them from the oom selection process, continuing the
> carnage is likely to result in more unpredictable results. At this
> time, I believe Oleg's solution of zapping the process memory use
> while it sleeps with the fatal signal enroute is ideal.

I cannot help thinking about the worst case.

(1) If memory zapping code successfully reclaimed some memory from the mm struct used by the OOM victim, what guarantees that the reclaimed memory is used by OOM victims (and by processes which are blocking OOM victims)?

David's "global access to memory reserves" allows a local unprivileged user to deplete memory reserves; it could allow that user to deplete the reclaimed memory as well.

I think that my "Favor kthread and dying threads over normal threads" ( http://lkml.kernel.org/r/1442939668-4421-1-git-send-email-penguin-kernel@xxxxxxxxxxxxxxxxxxx ) would allow the reclaimed memory to be used by OOM victims and kernel threads if the reclaimed memory is added to the free list bit by bit, so that the watermark remains low enough to prevent normal threads from allocating it. But my patch still fails if normal threads are blocking the OOM victims, or if unrelated kernel threads consume the reclaimed memory.

(2) If memory zapping code could not reclaim as much memory from the mm struct as the OOM victim needs, what mechanism can solve the OOM stalls?

Some administrators set /proc/pid/oom_score_adj to -1000 for most enterprise processes (e.g. java), and as a consequence only trivial processes (e.g. grep / sed) are candidates for OOM victims. Moreover, a local unprivileged user can easily fool the OOM killer using decoy tasks (which consume little memory and have /proc/pid/oom_score_adj set to 999).

(3) If memory zapping code reclaimed no memory due to ->mmap_sem contention, what mechanism can solve the OOM stalls?
Although we don't allocate much memory with ->mmap_sem held for writing, the task which is holding ->mmap_sem for writing can be chosen as one of the OOM victims. If such a task receives SIGKILL but TIF_MEMDIE is not set on it, an OOM livelock can form unless all memory allocations made with ->mmap_sem held for writing are __GFP_FS allocations and that task can reach out_of_memory() (i.e. it is not blocked by unexpected factors such as waiting for filesystem writeback).

After all, I think we have to consider what to do if the memory zapping code fails.