Re: [RFC PATCH] mm, oom: oom ratelimit auto tuning

Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> · Fri, 17 Apr 2020 22:03:56 +0900

On 2020/04/17 20:57, Yafang Shao wrote:
>>>>> I am just worried that the user may complain if too many
>>>>> oom_kill_process callbacks are suppressed.
>>>>
>>>> This can be a real concern indeed.
>>
>> I'm proposing automated ratelimiting of dump_tasks() at
>> http://lkml.kernel.org/r/1563360901-8277-1-git-send-email-penguin-kernel@xxxxxxxxxxxxxxxxxxx .
>> I believe that automated ratelimiting of dump_tasks() remains necessary
>> even after printk() became asynchronous.
>>
> 
> Thanks for your information.
> I haven't read your proposal carefully, but at first glance I
> think it would be a useful improvement.

Thank you. That patch alone only avoids RCU stalls. But
https://lkml.kernel.org/r/7de2310d-afbd-e616-e83a-d75103b986c6@xxxxxxxxxxxxxxxxxxx and
https://lkml.kernel.org/r/57be50b2-a97a-e559-e4bd-10d923895f83@xxxxxxxxxxxxxxxxxxx ,
referenced from that thread, allow deferring the printing of OOM victim candidates. And

>>> Yes, printk being too sync is the real issue. If the printk can be
>>> async, then we don't need to worry about it at all.
>>
>> I strongly disagree. dump_tasks() will needlessly fill printk() log buffer
>> (and potentially lose other kernel messages due to buffer full / disk full).
>>
> 
> Yup, printk() log buffer will be an issue if the console is too slow.
> After the printk() is implemented as async, I think it is worth doing
> some optimization.

my suggestion is to offload printing of OOM victim candidates to a workqueue context.
Then, even after printk() becomes asynchronous, that workqueue waits for printing to
the consoles to complete for each OOM victim candidate. This way, dump_tasks() from a
later OOM-killer invocation is suppressed only while the dump from an earlier
invocation has not yet completed, so that the same OOM victim candidates are not
reported many times over (which also saves printk() log buffer / disk space).
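
To make the suppression rule concrete, here is a minimal userspace model of it. This
is only a sketch, not the patch itself: the names oom_dump_gate, oom_dump_try_begin()
and oom_dump_end() are hypothetical, and C11 atomics stand in for whatever primitive
the kernel side would use. The assumption is that the OOM killer calls the "try begin"
side and the workqueue handler calls the "end" side once the consoles have caught up.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical gate shared by all OOM-killer invocations (not a kernel API). */
struct oom_dump_gate {
	atomic_bool in_flight;    /* an earlier dump has not finished printing */
	atomic_ulong queued;      /* dumps handed over to the worker */
	atomic_ulong suppressed;  /* dumps skipped because one was in flight */
};

/*
 * Called once per OOM-killer invocation.
 * Returns true if the caller should queue a dump of victim candidates,
 * false if an earlier dump is still being printed and this one is skipped.
 */
static bool oom_dump_try_begin(struct oom_dump_gate *g)
{
	/* atomic_exchange() returns the previous value of in_flight. */
	if (atomic_exchange(&g->in_flight, true)) {
		atomic_fetch_add(&g->suppressed, 1);
		return false;
	}
	atomic_fetch_add(&g->queued, 1);
	return true;
}

/*
 * Called by the worker after every candidate of the queued dump has been
 * printed to the consoles. Later invocations may dump again from here on.
 */
static void oom_dump_end(struct oom_dump_gate *g)
{
	bool was_in_flight = atomic_exchange(&g->in_flight, false);

	/* Ending a dump that was never begun would be a caller bug. */
	assert(was_in_flight);
	(void)was_in_flight;
}
```

With this gate, three back-to-back invocations produce one dump and two suppressions,
and the next invocation after oom_dump_end() dumps again. There is no timer or fixed
interval involved: the effective rate follows how fast the consoles drain.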

I need real world reports (like your report)...