Changes since v2:
 - Use single character (e.g. 'R' for MMF_OOM_SKIP) as suggested by
   Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
 - Add new header to oom_dump_tasks documentation

At the present time, when showing potential OOM victims, we do not
exclude tasks which already have MMF_OOM_SKIP set; it is possible that
the last OOM killable victim was already OOM killed, yet the OOM
reaper failed to reclaim memory and set MMF_OOM_SKIP.
This can be confusing, or perhaps even misleading, to the reader of the
OOM report. Now, we already unconditionally display a task's
oom_score_adj value, which can be set to OOM_SCORE_ADJ_MIN to mark an
"unkillable" task, i.e. one that is not eligible.

This patch provides a clear indication with regard to the OOM
eligibility of each displayed task: "S" if oom_score_adj is
OOM_SCORE_ADJ_MIN, "R" if MMF_OOM_SKIP is set, "V" if the task is in
vfork, and an empty string otherwise.

Signed-off-by: Aaron Tomlin <atomlin@xxxxxxxxxx>
---
 Documentation/admin-guide/sysctl/vm.rst |  5 ++--
 mm/oom_kill.c                           | 31 +++++++++++++++++++++----
 2 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 586cd4b86428..123be642bc7e 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -658,8 +658,9 @@ oom_dump_tasks
 Enables a system-wide task dump (excluding kernel threads) to be produced
 when the kernel performs an OOM-killing and includes such information as
 pid, uid, tgid, vm size, rss, pgtables_bytes, swapents, oom_score_adj
-score, and name. This is helpful to determine why the OOM killer was
-invoked, to identify the rogue task that caused it, and to determine why
+score, oom eligibility status and name. This is helpful to determine why
+the OOM killer was invoked, to identify the rogue task that caused it, and
+to determine why
 the OOM killer chose the task it did to kill.
 
 If this is set to zero, this information is suppressed. On very
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index eefd3f5fde46..094b7b61d66f 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -160,6 +160,27 @@ static inline bool is_sysrq_oom(struct oom_control *oc)
 	return oc->order == -1;
 }
 
+/**
+ * is_task_oom_eligible - determine if and why a task cannot be OOM killed
+ * @p: task to check
+ *
+ * Needs to be called with task_lock().
+ */
+static const char *is_task_oom_eligible(struct task_struct *p)
+{
+	long adj;
+
+	adj = (long)p->signal->oom_score_adj;
+	if (adj == OOM_SCORE_ADJ_MIN)
+		return "S";
+	else if (test_bit(MMF_OOM_SKIP, &p->mm->flags))
+		return "R";
+	else if (in_vfork(p))
+		return "V";
+	else
+		return "";
+}
+
 /* return true if the task is not adequate as candidate victim task. */
 static bool oom_unkillable_task(struct task_struct *p)
 {
@@ -401,12 +422,13 @@ static int dump_task(struct task_struct *p, void *arg)
 		return 0;
 	}
 
-	pr_info("[%7d] %5d %5d %8lu %8lu %8ld %8lu %5hd %s\n",
+	pr_info("[%7d] %5d %5d %8lu %8lu %8ld %8lu %5hd %13s %s\n",
 		task->pid, from_kuid(&init_user_ns, task_uid(task)),
 		task->tgid, task->mm->total_vm, get_mm_rss(task->mm),
 		mm_pgtables_bytes(task->mm),
 		get_mm_counter(task->mm, MM_SWAPENTS),
-		task->signal->oom_score_adj, task->comm);
+		task->signal->oom_score_adj, is_task_oom_eligible(task),
+		task->comm);
 	task_unlock(task);
 
 	return 0;
@@ -420,12 +442,13 @@ static int dump_task(struct task_struct *p, void *arg)
  * memcg, not in the same cpuset, or bound to a disjoint set of mempolicy nodes
  * are not shown.
  * State information includes task's pid, uid, tgid, vm size, rss,
- * pgtables_bytes, swapents, oom_score_adj value, and name.
+ * pgtables_bytes, swapents, oom_score_adj value, oom eligibility status
+ * and name.
  */
 static void dump_tasks(struct oom_control *oc)
 {
 	pr_info("Tasks state (memory values in pages):\n");
-	pr_info("[ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name\n");
+	pr_info("[ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj oom eligible? name\n");
 
 	if (is_memcg_oom(oc))
 		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
-- 
2.26.3
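
For anyone who wants to see how the new column renders without booting a test kernel, here is a rough userspace sketch of the check order and the "%5hd %13s %s" tail of the format string. It is not part of the patch. struct sketch_task, sketch_oom_eligible(), sketch_format_row() and SKETCH_OOM_SCORE_ADJ_MIN are hypothetical names made up for this illustration; the two booleans stand in for test_bit(MMF_OOM_SKIP, ...) and in_vfork(), and only the columns touched by the patch are formatted.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the kernel's OOM_SCORE_ADJ_MIN (-1000). */
#define SKETCH_OOM_SCORE_ADJ_MIN (-1000)

/* Hypothetical, simplified view of the task state the patch inspects. */
struct sketch_task {
	int pid;
	short oom_score_adj;
	bool mmf_oom_skip;	/* stands in for test_bit(MMF_OOM_SKIP, &mm->flags) */
	bool in_vfork;		/* stands in for in_vfork(p) */
	const char *comm;
};

/* Same precedence as the patch: "S" first, then "R", then "V", else "". */
static const char *sketch_oom_eligible(const struct sketch_task *t)
{
	if (t->oom_score_adj == SKETCH_OOM_SCORE_ADJ_MIN)
		return "S";
	if (t->mmf_oom_skip)
		return "R";
	if (t->in_vfork)
		return "V";
	return "";
}

/*
 * Format only the pid, oom_score_adj, eligibility and name columns,
 * reusing the field widths from the patched pr_info() format string.
 * Returns the number of characters snprintf() would have written.
 */
static int sketch_format_row(char *buf, size_t len, const struct sketch_task *t)
{
	return snprintf(buf, len, "[%7d] %5hd %13s %s",
			t->pid, t->oom_score_adj,
			sketch_oom_eligible(t), t->comm);
}
```

Because the checks are chained, a task that has both OOM_SCORE_ADJ_MIN and the skip flag set is reported as "S" only. The indicator is right-aligned in a 13-character field, so an eligible task shows a blank column rather than shifting the name to the left.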