On Thu, 16 Jul 2020, Michal Hocko wrote:

> > > But regardless of whether we present previous data to the user in the
> > > kernel log or not, we've determined that oom killing a process is a
> > > serious matter and will go to any lengths possible to avoid having to
> > > do it. For us, that means waiting until the "point of no return" to
> > > either go ahead with oom killing a process or aborting and retrying
> > > the charge.
> > >
> > > I don't think moving the mem_cgroup_margin() check to out_of_memory()
> > > right before printing the oom info and killing the process is a very
> > > invasive patch. Any strong preference against doing it that way? I think
> > > moving the check as late as possible to save a process from being killed
> > > when racing with an exiter or killed process (including perhaps current)
> > > has a pretty clear motivation.
> > >
> >
> > How about ignoring MMF_OOM_SKIP for once? I think this has almost the
> > same effect as moving the mem_cgroup_margin() check to out_of_memory()
> > right before printing the oom info and killing the process.
>
> How would that help with races when a task is exiting while the oom
> killer selects a victim? We are not talking about races with the
> oom_reaper IIUC. Btw. if races with the oom_reaper are a concern then I
> would much rather delay the wake up than complicate the existing
> protocol even further.

Right, this isn't a concern about racing with oom reaping or about finding processes that have already been selected as the oom victim. This is about (potentially significant) amounts of memory that have been uncharged from the memcg hierarchy in the window between the failure of reclaim to uncharge memory and the actual killing of a user process.