On Thu 22-03-18 13:29:37, David Rientjes wrote: > On Thu, 22 Mar 2018, Michal Hocko wrote: [...] > > They simply cannot because kmalloc performs the change under the cover. > > So you would have to use kmalloc(gfp|__GFP_NORETRY) to be absolutely > > sure to not trigger _any_ oom killer. This is just wrong thing to do. > > > > Examples of where this isn't already done? It certainly wasn't a problem > before __GFP_NORETRY was dropped in commit 2516035499b9 but you suspect > it's a problem now. It is not a problem _right now_ as I've already pointed out few times. We do not trigger the OOM killer for anything but #PF path. But this is an implementation detail which can change in future and there is actually some demand for the change. Once we start triggering the oom killer for all charges then we do not really want to have the disparity. > > > You're diverging from it because the memcg charge path has never had this > > > heuristic. > > > > Which is arguably a bug which just didn't matter because we do not > > have costly order oom eligible charges in general and THP was subtly > > different and turned out to be error prone. > > > > It was inadvertently dropped from commit 2516035499b9. There were no > high-order charge oom kill problems before this commit. People know how > to use __GFP_NORETRY or leave it off, which you don't trust them to do > because you're hardcoding a heuristic in the charge path. No. Just read what I wrote. I am worried that the current disparity between the page allocator and the memcg charging will _force_ them to do hacks and sometimes (e.g. kmalloc) they will not have any option but using __GFP_NORETRY even when that is not really needed and it has a different semantic than they would like. Behavior on with and without memcgs should be as similar as possible otherwise you will see different sets of bugs when running under the memcg and without. I really fail to see what is so hard about this to understand. [...] > > > Your change is broken and I wouldn't push it to Linus for rc7 if my life > > > depended on it. What is the response when someone complains that they > > > start getting a ton of MEMCG_OOM notifications for every thp fallback? > > > They will, because yours is a broken implementation. > > > > I fail to see what is broken. Could you be more specific? > > > > I said MEMCG_OOM notifications on thp fallback. You modified > mem_cgroup_oom(). What is called before mem_cgroup_oom()? > mem_cgroup_event(mem_over_limit, MEMCG_OOM). That increments the > MEMCG_OOM event and anybody waiting on the events file gets notified it > changed. They read a MEMCG_OOM event. It's thp fallback, it's not memcg > oom. MEMCG_OOM doesn't count the number of oom killer invocations. That has never been the case. > Really, I can't continue to write 100 emails in this thread. Then try to read and even try to understand concerns expressed in those emails. Repeating the same set of arguments and ignoring the rest is not really helpful. -- Michal Hocko SUSE Labs