On Wed, Jul 15, 2020 at 11:18 AM David Rientjes <rientjes@xxxxxxxxxx> wrote: > > On Wed, 15 Jul 2020, Yafang Shao wrote: > > > > It has successfully resolved it for several years in our kernel, we tried > > > an approach similiar to yours but saw many instances where memcg oom kills > > > continued to proceed even though the memcg information dumped to the > > > kernel log showed memory available. > > > > > > If this was a page or two that became available due to memory freeing, > > > it's not a significant difference. Instead, if this races with an oom > > > notification and a process exiting or being SIGKILL'd, it becomes much > > > harder to explain to a user why their process was oom killed when there > > > are tens of megabytes of memory available as shown by the kernel log (the > > > freeing/exiting happened during a particularly long iteration of processes > > > attached to the memcg, for example). > > > > > > That's what motivated a change to moving this to out_of_memory() directly, > > > we found that it prevented even more unnecessary oom kills, which is a > > > very good thing. It may only be easily observable and make a significant > > > difference at very large scale, however. > > > > > > > Thanks for the clarification. > > > > If it is the race which causes this issue and we want to reduce the > > race window, I don't know whether it is proper to check the memcg > > margin in out_of_memory() or do it before calling do_send_sig_info(). > > Because per my understanding, dump_header() always takes much more > > time than select_bad_process() especially if there're slow consoles. > > So the race might easily happen when doing dump_header() or dumping > > other information, but if we check the memcg margin after dumping this > > oom info, it would be strange to dump so much oom logs without killing > > a process. > > > > Absolutely correct :) In my proposed patch, we declare dump_header() as > the "point of no return" since we don't want to dump oom kill information > to the kernel log when nothing is actually killed. We could abort at the > very last minute, as you mention, but I think that may have an adverse > impact on anything that cares about that log message. How about storing the memcg information in oom_control when the memcg oom is triggered, and then show this information in dump_header() ? IOW, the OOM info really shows the memcg status when oom occurs, rather than the memcg status when this info is printed. -- Thanks Yafang