On Wed 21-08-19 15:22:07, Edward Chron wrote: > On Wed, Aug 21, 2019 at 12:19 AM David Rientjes <rientjes@xxxxxxxxxx> wrote: > > > > On Wed, 21 Aug 2019, Michal Hocko wrote: > > > > > > vm.oom_dump_tasks is pretty useful, however, so it's curious why you > > > > haven't left it enabled :/ > > > > > > Because it generates a lot of output potentially. Think of a workload > > > with too many tasks which is not uncommon. > > > > Probably better to always print all the info for the victim so we don't > > need to duplicate everything between dump_tasks() and dump_oom_summary(). > > > > Edward, how about this? > > > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > > --- a/mm/oom_kill.c > > +++ b/mm/oom_kill.c > > @@ -420,11 +420,17 @@ static int dump_task(struct task_struct *p, void *arg) > > * State information includes task's pid, uid, tgid, vm size, rss, > > * pgtables_bytes, swapents, oom_score_adj value, and name. > > */ > > -static void dump_tasks(struct oom_control *oc) > > +static void dump_tasks(struct oom_control *oc, struct task_struct *victim) > > { > > pr_info("Tasks state (memory values in pages):\n"); > > pr_info("[ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name\n"); > > > > + /* If vm.oom_dump_tasks is disabled, only show the victim */ > > + if (!sysctl_oom_dump_tasks) { > > + dump_task(victim, oc); > > + return; > > + } > > + > > if (is_memcg_oom(oc)) > > mem_cgroup_scan_tasks(oc->memcg, dump_task, oc); > > else { > > @@ -465,8 +471,8 @@ static void dump_header(struct oom_control *oc, struct task_struct *p) > > if (is_dump_unreclaim_slabs()) > > dump_unreclaimable_slab(); > > } > > - if (sysctl_oom_dump_tasks) > > - dump_tasks(oc); > > + if (p || sysctl_oom_dump_tasks) > > + dump_tasks(oc, p); > > if (p) > > dump_oom_summary(oc, p); > > } > > I would be willing to accept this, though as Michal mentions in his > post, it would be very helpful to have the oom_score_adj on the Killed > process message. > > One reason for that is that the Killed process message is the one > message that is printed with error priority (pr_err) > and so that message can be filtered out and sent to notify support > that an OOM event occurred. > Putting any information that can be shared in that message is useful > from my experience as it the initial point of triage for an OOM event. > Even if the full log with per user process is available it the > starting point for triage for an OOM event. > > So from my perspective I would be happy having both, with David's > proposal providing a bit of extra information as shown here: > > Jul 21 20:07:48 linuxserver kernel: [ pid ] uid tgid total_vm > rss pgtables_bytes swapents oom_score_adj name > Jul 21 20:07:48 linuxserver kernel: [ 547] 0 547 31664 > 615 299008 0 0 > systemd-journal > > The OOM Killed process message will print as: > > Jul 21 20:07:48 linuxserver kernel: Out of memory: Killed process 2826 > (oomprocs) total-vm:1056800kB, anon-rss:1052784kB, file-rss:4kB, > shmem-rss:0kB oom_score_adj:1000 > > But if only one one output change is allowed I'd favor the Killed > process message since that can be singled due to it's print priority > and forwarded. > > By the way, right now there is redundancy in that the Killed process > message is printing vm, rss even if vm.oom_dump_tasks is enabled. > I don't see why that is a big deal. There will always be redundancy there because dump_tasks part is there mostly to check the oom victim decision for potential wrong/unexpected selection. While "killed..." message is there to inform who has been killed. Most people really do care about that part only. > It is very useful to have all the information that is there. > Wouldn't mind also having pgtables too but we would be able to get > that from the output of dump_task if that is enabled. I am not against adding pgrable information there. That memory is going to be released when the task dies. > If it is acceptable to also add the dump_task for the killed process > for !sysctl_oom_dump_tasks I can repost the patch including that as > well. Well, I would rather focus on adding the missing pieces to the killed task message instead. -- Michal Hocko SUSE Labs