Do you plan to repost these two? They are quite deep in the email thread
so they can easily fall through cracks.

On Wed 08-08-18 18:17:37, Michal Hocko wrote:
> On Wed 08-08-18 10:45:15, Johannes Weiner wrote:
[...]
> > >From bba01122f739b05a689dbf1eeeb4f0e07affd4e7 Mon Sep 17 00:00:00 2001
> > From: Johannes Weiner <hannes@xxxxxxxxxxx>
> > Date: Wed, 8 Aug 2018 09:59:40 -0400
> > Subject: [PATCH] mm: memcontrol: print proper OOM header when no eligible
> >  victim left
> > 
> > When the memcg OOM killer runs out of killable tasks, it currently
> > prints a WARN with no further OOM context. This has caused some user
> > confusion.
> > 
> > Warnings indicate a kernel problem. In a reported case, however, the
> > situation was triggered by a non-sensical memcg configuration (hard
> > limit set to 0). But without any VM context this wasn't obvious from
> > the report, and it took some back and forth on the mailing list to
> > identify what is actually a trivial issue.
> > 
> > Handle this OOM condition like we handle it in the global OOM killer:
> > dump the full OOM context and tell the user we ran out of tasks.
> > 
> > This way the user can identify misconfigurations easily by themselves
> > and rectify the problem - without having to go through the hassle of
> > running into an obscure but unsettling warning, finding the
> > appropriate kernel mailing list and waiting for a kernel developer to
> > remote-analyze that the memcg configuration caused this.
> > 
> > If users cannot make sense of why the OOM killer was triggered or why
> > it failed, they will still report it to the mailing list, we know that
> > from experience. So in case there is an actual kernel bug causing
> > this, kernel developers will very likely hear about it.
> > 
> > Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> 
> Yes this works as well. We would get a dump even for the race we have
> seen but I do not think this is something to lose sleep over. And if it
> triggers too often to be disturbing we can add
> tsk_is_oom_victim(current) check there.
> 
> Acked-by: Michal Hocko <mhocko@xxxxxxxx>
> 
> > ---
> >  mm/memcontrol.c |  2 --
> >  mm/oom_kill.c   | 13 ++++++++++---
> >  2 files changed, 10 insertions(+), 5 deletions(-)
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 4e3c1315b1de..29d9d1a69b36 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1701,8 +1701,6 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int
> >  	if (mem_cgroup_out_of_memory(memcg, mask, order))
> >  		return OOM_SUCCESS;
> >  
> > -	WARN(1,"Memory cgroup charge failed because of no reclaimable memory! "
> > -		"This looks like a misconfiguration or a kernel bug.");
> >  	return OOM_FAILED;
> >  }
> >  
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index 0e10b864e074..07ae222d7830 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -1103,10 +1103,17 @@ bool out_of_memory(struct oom_control *oc)
> >  	}
> >  
> >  	select_bad_process(oc);
> > -	/* Found nothing?!?! Either we hang forever, or we panic. */
> > -	if (!oc->chosen && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) {
> > +	/* Found nothing?!?! */
> > +	if (!oc->chosen) {
> >  		dump_header(oc, NULL);
> > -		panic("Out of memory and no killable processes...\n");
> > +		pr_warn("Out of memory and no killable processes...\n");
> > +		/*
> > +		 * If we got here due to an actual allocation at the
> > +		 * system level, we cannot survive this and will enter
> > +		 * an endless loop in the allocator. Bail out now.
> > +		 */
> > +		if (!is_sysrq_oom(oc) && !is_memcg_oom(oc))
> > +			panic("System is deadlocked on memory\n");
> >  	}
> >  	if (oc->chosen && oc->chosen != (void *)-1UL)
> >  		oom_kill_process(oc, !is_memcg_oom(oc) ? "Out of memory" :
> > -- 
> > 2.18.0
> > 
> 
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs
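[Editor's note: for reference only, a minimal sketch of how the
tsk_is_oom_victim(current) check Michal mentions could be folded into the
new hunk in out_of_memory(). This is not part of the patch above; it assumes
the goal is merely to silence the dump when the racing task is itself
already an OOM victim, while keeping the system-level bail-out intact.]

	select_bad_process(oc);
	/* Found nothing?!?! */
	if (!oc->chosen) {
		/*
		 * Hypothetical follow-up: a task that is already an OOM
		 * victim is about to exit and release memory anyway, so
		 * skip the noisy dump for that race.
		 */
		if (!tsk_is_oom_victim(current)) {
			dump_header(oc, NULL);
			pr_warn("Out of memory and no killable processes...\n");
		}
		/*
		 * A system-level allocation still cannot make progress,
		 * so the panic below stays unconditional for that case.
		 */
		if (!is_sysrq_oom(oc) && !is_memcg_oom(oc))
			panic("System is deadlocked on memory\n");
	}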