The patch titled Subject: mm, oom: pagefault_out_of_memory: don't force global OOM for dying tasks has been added to the -mm tree. Its filename is mm-oom-pagefault_out_of_memory-dont-force-global-oom-for-dying-tasks.patch This patch should soon appear at https://ozlabs.org/~akpm/mmots/broken-out/mm-oom-pagefault_out_of_memory-dont-force-global-oom-for-dying-tasks.patch and later at https://ozlabs.org/~akpm/mmotm/broken-out/mm-oom-pagefault_out_of_memory-dont-force-global-oom-for-dying-tasks.patch Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next and is updated there every 3-4 working days ------------------------------------------------------ From: Vasily Averin <vvs@xxxxxxxxxxxxx> Subject: mm, oom: pagefault_out_of_memory: don't force global OOM for dying tasks Patch series "memcg: prohibit unconditional exceeding the limit of dying tasks", v3. Memory cgroup charging allows killed or exiting tasks to exceed the hard limit. It can be misused and allowed to trigger global OOM from inside a memcg-limited container. On the other hand if memcg fails allocation, called from inside #PF handler it triggers global OOM from inside pagefault_out_of_memory(). To prevent these problems this patchset: a) removes execution of out_of_memory() from pagefault_out_of_memory(), becasue nobody can explain why it is necessary. b) allow memcg to fail allocation of dying/killed tasks. This patch (of 3): Any allocation failure during the #PF path will return with VM_FAULT_OOM which in turn results in pagefault_out_of_memory which in turn executes out_out_memory() and can kill a random task. An allocation might fail when the current task is the oom victim and there are no memory reserves left. The OOM killer is already handled at the page allocator level for the global OOM and at the charging level for the memcg one. Both have much more information about the scope of allocation/charge request. This means that either the OOM killer has been invoked properly and didn't lead to the allocation success or it has been skipped because it couldn't have been invoked. In both cases triggering it from here is pointless and even harmful. It makes much more sense to let the killed task die rather than to wake up an eternally hungry oom-killer and send him to choose a fatter victim for breakfast. Link: https://lkml.kernel.org/r/0828a149-786e-7c06-b70a-52d086818ea3@xxxxxxxxxxxxx Signed-off-by: Vasily Averin <vvs@xxxxxxxxxxxxx> Suggested-by: Michal Hocko <mhocko@xxxxxxxx> Acked-by: Michal Hocko <mhocko@xxxxxxxx> Cc: Johannes Weiner <hannes@xxxxxxxxxxx> Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> Cc: Roman Gushchin <guro@xxxxxx> Cc: Shakeel Butt <shakeelb@xxxxxxxxxx> Cc: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> Cc: Uladzislau Rezki <urezki@xxxxxxxxx> Cc: Vladimir Davydov <vdavydov.dev@xxxxxxxxx> Cc: Vlastimil Babka <vbabka@xxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> --- mm/oom_kill.c | 3 +++ 1 file changed, 3 insertions(+) --- a/mm/oom_kill.c~mm-oom-pagefault_out_of_memory-dont-force-global-oom-for-dying-tasks +++ a/mm/oom_kill.c @@ -1137,6 +1137,9 @@ void pagefault_out_of_memory(void) if (mem_cgroup_oom_synchronize(true)) return; + if (fatal_signal_pending(current)) + return; + if (!mutex_trylock(&oom_lock)) return; out_of_memory(&oc); _ Patches currently in -mm which might be from vvs@xxxxxxxxxxxxx are mm-oom-pagefault_out_of_memory-dont-force-global-oom-for-dying-tasks.patch memcg-prohibit-unconditional-exceeding-the-limit-of-dying-tasks.patch mm-vmalloc-repair-warn_allocs-in-__vmalloc_area_node.patch vmalloc-back-off-when-the-current-task-is-oom-killed.patch