On Fri, Jul 17, 2020 at 3:53 AM David Rientjes <rientjes@xxxxxxxxxx> wrote: > > On Thu, 16 Jul 2020, Yafang Shao wrote: > > > > It's likely a misunderstanding: I wasn't necessarily concerned about > > > showing 60MB in a memcg limited to 100MB, that part we can deal with, the > > > concern was after dumping all of that great information that instead of > > > getting a "Killed process..." we get a "Oh, there's memory now, just > > > kidding about everything we just dumped" ;) > > > > > > > Actually the kernel is doing it now, see bellow, > > > > dump_header() <<<< dump lots of information > > __oom_kill_process > > p = find_lock_task_mm(victim); > > if (!p) > > return; <<<< without killing any process. > > > > Ah, this is catching an instance where the chosen process has already done > exit_mm(), good catch -- I can find examples of this by scraping kernel > logs from our fleet. > > So it appears there is precedence for dumping all the oom info but not > actually performing any action for it and I made the earlier point that > diagnostic information in the kernel log here is still useful. I think it > is still preferable that the kernel at least tell us why it didn't do > anything, but as you mention that already happens today. > > Would you like to send a patch that checks for mem_cgroup_margin() here as > well? A second patch could make the possible inaction more visibile, > something like "Process ${pid} (${comm}) is already exiting" for the above > check or "Memcg ${memcg} is no longer out of memory". > > Another thing that these messages indicate, beyond telling us why the oom > killer didn't actually SIGKILL anything, is that we can expect some skew > in the memory stats that shows an availability of memory. > Agreed, these messages would be helpful. I will send a patch for it. > > > > > We could likely enlighten userspace about that so that we don't consider > > > that to be an actual oom kill. But I also very much agree that after > > > dump_header() would be appropriate as well since the goal is to prevent > > > unnecessary oom killing. > > > > > > Would you mind sending a patch to check mem_cgroup_margin() on > > > is_memcg_oom() prior to sending the SIGKILL to the victim and printing the > > > "Killed process..." line? We'd need a line that says "xKB of memory now > > > available -- suppressing oom kill" or something along those lines so > > > userspace understands what happened. But the memory info that it emits > > > both for the state of the memcg and system RAM may also be helpful to > > > understand why we got to the oom kill in the first place, which is also a > > > good thing. > > > > > > I'd happy ack that patch since it would be a comprehensive solution that > > > avoids oom kill of user processes at all costs, which is a goal I think we > > > can all rally behind. > > > > I'd prefer to put dump_header() behind do_send_sig_info(), for example, > > > > __oom_kill_process() > > do_send_sig_info() > > dump_header() <<<< may better put it behind wake_oom_reaper(), but > > it may loses some information to dump... > > pr_err("%s: Killed process %d (%s)....") > > > > I agree with Michal here that dump_header() after the actual kill would no > longer represent the state of the system (or cpuset or memcg, depending on > context) at the time of the oom kill so it's best to dump relevant > information before the actual kill. -- Thanks Yafang