Re: [PATCH] mm, oom: enable rate-limiting controls for oom dumps

On 2020/10/13 18:02, Petr Mladek wrote:
> On Tue 2020-10-13 09:40:27, Tetsuo Handa wrote:
>> On 2020/10/13 0:41, Michal Hocko wrote:
>>>> What about introducing some feedback from the printk code?
>>>>
>>>>      static u64 printk_last_report_seq;
>>>>
>>>>      if (consoles_seen(printk_last_report_seq)) {
>>>>              dump_header();
>>>>              printk_last_report_seq = printk_get_last_seq();
>>>>      }
>>>>
>>>> In other words, it would skip the massive report when the consoles
>>>> were not able to see the previous one.
>>>
>>> I am pretty sure this has been discussed in the past but maybe we really
>>> want to make ratelimit to work reasonably also for larger sections
>>> instead. Current implementation only really works if the rate limited
>>> operation is negligible wrt the interval. Can we have a ratelimit
>>> alternative with a scope effect (effectively lock-like semantics)?
>>> 	if (rate_limit_begin(&oom_rs)) {
>>> 		dump_header();
>>> 		rate_limit_end(&oom_rs);
>>> 	}
>>>
>>> rate_limit_begin would act like a trylock with an additional constraint on
>>> the period/cadence based on the values marked by rate_limit_end.
>>>
>>
>> Here are some of the past discussions.
>>
>>   https://lkml.kernel.org/r/7de2310d-afbd-e616-e83a-d75103b986c6@xxxxxxxxxxxxxxxxxxx
>>   https://lkml.kernel.org/r/20190830103504.GA28313@xxxxxxxxxxxxxx
>>   https://lkml.kernel.org/r/57be50b2-a97a-e559-e4bd-10d923895f83@xxxxxxxxxxxxxxxxxxx
>>
>> Michal Hocko complained about different OOM domains, and now just ignores it...
> 
> How is this related to this discussion, please? AFAIK, we are
> discussing how to tune the values of the existing ratelimiting.

dump_tasks() is one of the functions called from dump_header().

Since Michal wants to recognize OOM domains when ratelimiting dump_tasks(),
the ratelimit for dump_header() is also expected to recognize OOM domains.

> 
>> Proper ratelimiting for OOM messages had better not count on asynchronous printk().
> 
> I am a bit confused. AFAIK, you wanted to print OOM messages
> in an asynchronous way in the past. The lockless printk ringbuffer is on
> its way into 5.10. Handling consoles in kthreads will be the next
> step of the printk rework.

What I'm proposing is to print OOM messages synchronously from a different
thread, because a single dump_tasks() call can generate thousands of lines,
which may significantly delay the arrival of non-OOM messages at the consoles
(or even cause them to be dropped when logbuf is full). I don't want to
enqueue too many OOM messages to logbuf, even after printk() becomes
completely asynchronous.

> 
> OK, the current state is that printk() is semi-synchronous. It does
> console_trylock(). The console is handled immediately when it
> succeeds. Otherwise it expects that the current console_lock owner
> would do the job.
> 
> Tuning ratelimits is not trivial for a particular system. It would
> be better to have some autotuning. If the printk is synchronous,
> we could measure how long the printing took. If it is asynchronous,
> we could check whether the last report has been already flushed or
> not. We could then decide whether to print the new report.

Checking whether the last report has already been flushed needs to
recognize OOM domains.

> 
> What is the desired behavior, please?
> 
> Could you please provide some examples of how you would tune ratelimit
> when printing all messages to the console takes X ms and OOM
> happens every Y ms?

My proposal is to decide whether to print the new report based on
whether all OOM candidates for that OOM domain have been flushed to
consoles. There is no X or Y to tune.
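
To illustrate that decision, here is a minimal userspace sketch, not kernel
code. It assumes each OOM domain remembers the sequence number of the last
record of its previous report, and that the consoles expose the first sequence
number they have not printed yet. All names (struct oom_domain,
oom_try_report(), console_flush(), log_next_seq, console_seq) are hypothetical
and only model the idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-domain state; not an existing kernel structure. */
struct oom_domain {
	const char *name;
	bool report_pending;      /* a report has been queued before */
	uint64_t last_report_seq; /* seq of the last record of that report */
};

/* Toy model of the log buffer and the consoles (assumed, not printk). */
static uint64_t log_next_seq; /* next sequence number to assign */
static uint64_t console_seq;  /* first record the consoles have not printed */

/* Store one record; only the sequence counter matters for this model. */
static void log_store(const char *msg)
{
	(void)msg;
	log_next_seq++;
}

/* Model the consoles printing up to nr more records. */
static void console_flush(uint64_t nr)
{
	console_seq += nr;
	if (console_seq > log_next_seq)
		console_seq = log_next_seq;
	assert(console_seq <= log_next_seq);
}

/* True if the previous report of this domain reached the consoles. */
static bool oom_report_flushed(const struct oom_domain *d)
{
	return !d->report_pending || console_seq > d->last_report_seq;
}

/*
 * Queue a new report (one header line plus one line per candidate task)
 * only if the previous report of the same domain has been flushed.
 * Returns true if the report was queued, false if it was skipped.
 */
static bool oom_try_report(struct oom_domain *d, int nr_tasks)
{
	if (!oom_report_flushed(d))
		return false;

	log_store("oom header");
	for (int i = 0; i < nr_tasks; i++)
		log_store("oom candidate task");

	d->last_report_seq = log_next_seq - 1;
	d->report_pending = true;
	return true;
}
```

In this model a second report for the same domain is skipped until
console_flush() has advanced past that domain's previous report, while a
different domain can still report in the meantime. No period or burst value
is involved.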




