On 2020/04/17 20:57, Yafang Shao wrote:
>>>>> I was just worried that the user may complain about it if too many
>>>>> oom_kill_process callbacks are suppressed.
>>>>
>>>> This can be a real concern indeed.
>>
>> I'm proposing automated ratelimiting of dump_tasks() at
>> http://lkml.kernel.org/r/1563360901-8277-1-git-send-email-penguin-kernel@xxxxxxxxxxxxxxxxxxx .
>> I believe that automated ratelimiting of dump_tasks() remains necessary
>> even after printk() became asynchronous.
>>
>
> Thanks for your information.
> I haven't read your proposal carefully, but at first glance I
> think it would be a useful improvement.

Thank you. That patch alone only avoids the RCU stall. But
https://lkml.kernel.org/r/7de2310d-afbd-e616-e83a-d75103b986c6@xxxxxxxxxxxxxxxxxxx and
https://lkml.kernel.org/r/57be50b2-a97a-e559-e4bd-10d923895f83@xxxxxxxxxxxxxxxxxxx ,
referenced from that thread, allow deferring the printing of OOM victim candidates.

And

>>> Yes, printk being too sync is the real issue. If the printk can be
>>> async, then we don't need to worry about it at all.
>>
>> I strongly disagree. dump_tasks() will needlessly fill printk() log buffer
>> (and potentially lose other kernel messages due to buffer full / disk full).
>>
>
> Yup, printk() log buffer will be an issue if the console is too slow.
> After the printk() is implemented as async, I think it is worth doing
> some optimization.

my suggestion is to offload printing of OOM victim candidates to a workqueue
context. Then, even after printk() becomes asynchronous, that workqueue waits
for printing to the consoles to complete for each OOM victim candidate. This way,
dump_tasks() from a later OOM-killer invocation is suppressed only while the
dump from an earlier OOM-killer invocation has not yet completed, so that
duplicated OOM victims are not reported many times over (which also saves
printk() log buffer / disk space).

I need real world reports (like your report)...
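
To illustrate the suppression rule described above, here is a minimal userspace sketch. It is not taken from any of the referenced patches, and every name in it (struct oom_dump_state, oom_request_dump(), oom_dump_done()) is hypothetical. It only models the bookkeeping: a dump request is accepted when no earlier dump is still being printed and is suppressed otherwise. The real workqueue, locking and console handling are left out on purpose.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the proposal; none of these names
 * exist in the kernel.
 */
struct oom_dump_state {
	bool dump_pending;       /* an earlier dump has not finished printing */
	unsigned int queued;     /* dumps handed over to the (modelled) worker */
	unsigned int suppressed; /* dumps skipped because one was still pending */
};

/*
 * Modelled OOM-killer side: ask for a dump of OOM victim candidates.
 * Returns true if the dump was queued, false if it was suppressed.
 */
bool oom_request_dump(struct oom_dump_state *s)
{
	if (s->dump_pending) {
		s->suppressed++;
		return false;
	}
	s->dump_pending = true;
	s->queued++;
	return true;
}

/*
 * Modelled worker side: called once printing to the consoles has
 * completed, so that the next OOM-killer invocation may dump again.
 */
void oom_dump_done(struct oom_dump_state *s)
{
	assert(s->dump_pending);
	s->dump_pending = false;
}
```

With a zero-initialised state, the first oom_request_dump() is queued, further requests are suppressed until oom_dump_done() runs, and the request after that is queued again. In a kernel implementation the pending flag would have to be updated atomically or under a lock; the sketch assumes a single thread.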