On Wed 25-10-17 12:44:02, Johannes Weiner wrote: > On Wed, Oct 25, 2017 at 04:12:21PM +0200, Michal Hocko wrote: [...] I yet have to digest the first path of the email but the remaining just sounds we are not on the same page. > > So how about we start with a BIG FAT WARNING for the failure case? > > Something resembling warn_alloc for the failure case. > > > > --- > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index 5d9323028870..3ba62c73eee5 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -1547,9 +1547,14 @@ static bool mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int order) > > * victim and then we have rely on mem_cgroup_oom_synchronize otherwise > > * we would fall back to the global oom killer in pagefault_out_of_memory > > */ > > - if (!memcg->oom_kill_disable && > > - mem_cgroup_out_of_memory(memcg, mask, order)) > > - return true; > > + if (!memcg->oom_kill_disable) { > > + if (mem_cgroup_out_of_memory(memcg, mask, order)) > > + return true; > > + > > + WARN(!current->memcg_may_oom, > > + "Memory cgroup charge failed because of no reclaimable memory! " > > + "This looks like a misconfiguration or a kernel bug."); > > + } > > That's crazy! > > We shouldn't create interfaces that make it possible to accidentally > livelock the kernel. Then warn about it and let it crash. That is a > DOS-level lack of OS abstraction. > > In such a situation, we should ignore oom_score_adj or ignore the hard > limit. Even panic() would be better from a machine management point of > view than leaving random tasks inside infinite loops. > > Why is OOM-disabling a thing? Why isn't this simply a "kill everything > else before you kill me"? It's crashing the kernel in trying to > protect a userspace application. How is that not insane? I really do not follow. What kind of livelock or crash are you talking about. All this code does is that the charge request (which is not explicitly GFP_NOFAIL) fails with ENOMEM if the oom killer is not able to make a forward progress. That sounds like a safer option than failing with ENOMEM unconditionally which is what we do currently. So the only change I am really proposing is to keep retrying as long as the oom killer makes a forward progress and ENOMEM otherwise. I am also not trying to protect an userspace application. Quite contrary, I would like the application gets ENOMEM when it should run away from the constraint it runs within. I am protecting everything outside of the hard limited memcg actually. So what is that I am missing? -- Michal Hocko SUSE Labs