Re: [RFC PATCH] mm, oom: oom ratelimit auto tuning

On Tue 14-04-20 20:32:54, Yafang Shao wrote:
> On Tue, Apr 14, 2020 at 3:39 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
[...]
> > Besides that I strongly suspect that you would be much better off
> > by disabling /proc/sys/vm/oom_dump_tasks which would reduce the amount
> > of output a lot. Or do you really require this information when
> > debugging oom reports?
> >
> 
> Yes, disabling /proc/sys/vm/oom_dump_tasks can save lots of time.
> But I'm not sure whether we can disable it entirely, because disabling
> it would also prevent the task list from being written to
> /var/log/messages.

Yes, the list of eligible tasks would indeed be missing. The real question
is whether you are actually going to miss that information. From years of
looking into OOM reports, I can tell you that the list might be useful,
but in the vast majority of cases I simply do not need it, because the
memory state and the chosen victims are much more important. The list of
tasks is usually interesting only when you want to double-check whether
the victim selection was reasonable, or in cases where the list itself
can tell whether something went wild in userspace.

> > > The OOM ratelimit starts with a slow rate, and it will increase slowly
> > > if the console is fast and decrease rapidly if the console is slow.
> > > oom_rs.burst will be in [1, 10] and oom_rs.interval will always be
> > > greater than 5 * HZ.
> >
> > I am not against increasing the ratelimit timeout. But this patch seems
> > to be trying to be too clever. Why can't we simply increase the
> > parameters of the ratelimit?
> 
> I was just worried that users may complain if too many
> oom_kill_process callbacks are suppressed.

This can be a real concern indeed.

> But considering that OOMs that burst at the same time always have
> the same cause,

This is not really the case. Please note that many OOM killers might run
in parallel in memory cgroup setups.

> so I think one snapshot of the OOM may be enough.
> Simply setting oom_rs to {20 * HZ, 1} can resolve this issue.

Does it really, though? The ratelimit doesn't stop the slow output. It
simply cannot, because the work is already done.

That being said, making the ratelimiting more aggressive sounds more
like a workaround than an actual fix, so I would go that route only if
there is no other option. I believe the real problem here is that printk
is too synchronous. This is a general problem, and one the printk
maintainers are already working on.

For now I would recommend working around this problem by reducing the
log level or disabling dump_tasks.

-- 
Michal Hocko
SUSE Labs



