Re: [nacked] mm-oom-avoid-printk-iteration-under-rcu.patch removed from -mm tree

Michal Hocko <mhocko@xxxxxxxx> · Thu, 23 Apr 2020 09:34:38 +0200

On Thu 23-04-20 14:35:22, Yafang Shao wrote:
> On Thu, Apr 23, 2020 at 1:35 PM Tetsuo Handa
[...]
> > dump_tasks() remains definitely a printk() abuser which is capable of pushing
> > many thousands of printk() messages in one second if async printk were available.
> > Async printk CANNOT deal with the problem that too much backlog causes important
> > messages to be delayed for too long. Please read my explanation carefully.
> >
> 
> Agreed. Too many OOM reports will still be an issue even if printk() is async.

I believe nobody disputes this part. We are talking about two different
things here, and I believe mixing them up contributes considerably to
the confusion:
1) dump_tasks being a large noise generator to the kernel log buffer
2) a heavy printk load from the oom context

There is no good answer for 1). We simply print a lot of data that
scales with the number of eligible tasks, and that might be thousands.
We have done quite a lot of work to make the data collecting part of the
process as optimal as possible, but having this feature enabled by
default is simply baggage we have to carry with us. printk currently
doesn't cope with such a load very well. There might be some future
changes, but the bottom line is that no matter how printk gets
optimized, there is still the payload to be printed, regardless of
whether that happens transparently (async printk) or explicitly in a
detached context.

2) is about the synchronous nature of printk _right_now_, which causes
delays in the allocator context while the system is OOM. Locks are
potentially held both by the OOM context and further up the call chain
into the allocator. The longer the OOM context takes, the longer the
agony lasts. This is where async printing might help, because it would
push the heavy lifting out to a different context.
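To make "pushing the heavy lifting out to a different context" concrete,
here is a userspace analogy in C. It is not kernel code and not how
printk is implemented; all names (log_queue, log_enqueue, log_worker,
log_stop) are hypothetical. The time-critical path only copies a line
into a bounded queue, and a separate thread performs the slow output.
Note that it only moves the cost: the payload still has to be written,
and once the backlog fills the queue the fast path has to either block
or drop, which is exactly the problem described in 1).

```c
/* Userspace sketch, not kernel code: defer slow output to a worker. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define QCAP 64
#define LINE_MAX_LEN 128

struct log_queue {
    char lines[QCAP][LINE_MAX_LEN];
    size_t head, count;
    bool done;
    unsigned long dropped;  /* lines lost because the queue was full */
    unsigned long written;  /* lines actually output by the worker */
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
};

static void log_queue_init(struct log_queue *q)
{
    memset(q, 0, sizeof(*q));
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
}

/* Fast path: never waits for output; drops the line if the queue is full. */
static bool log_enqueue(struct log_queue *q, const char *msg)
{
    bool ok = false;

    pthread_mutex_lock(&q->lock);
    if (q->count < QCAP) {
        size_t tail = (q->head + q->count) % QCAP;
        snprintf(q->lines[tail], LINE_MAX_LEN, "%s", msg);
        q->count++;
        ok = true;
        pthread_cond_signal(&q->nonempty);
    } else {
        q->dropped++;
    }
    assert(q->count <= QCAP);
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Ask the worker to exit once it has drained the queue. */
static void log_stop(struct log_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->done = true;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
}

/* Slow path: runs in its own thread and does the actual output. */
static void *log_worker(void *arg)
{
    struct log_queue *q = arg;
    char line[LINE_MAX_LEN];

    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (q->count == 0 && !q->done)
            pthread_cond_wait(&q->nonempty, &q->lock);
        if (q->count == 0) {  /* done and fully drained */
            pthread_mutex_unlock(&q->lock);
            break;
        }
        memcpy(line, q->lines[q->head], LINE_MAX_LEN);
        q->head = (q->head + 1) % QCAP;
        q->count--;
        pthread_mutex_unlock(&q->lock);

        /* The expensive part runs without holding the queue lock. */
        puts(line);
        q->written++;  /* only this thread writes it; read after join */
    }
    return NULL;
}
```

The producer returns as soon as the line is queued, so any locks it
holds are released quickly; the price is a second context that must
eventually emit every byte, plus a policy for what happens when the
queue is full.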

There is clear agreement on this part. The whole discussion in this
thread is about how to achieve it. There are two ways: develop code to
do that for this very specific case (i.e. push the output out to a
worker), or rely on printk doing it for us and for potentially many
other places in a similar situation. I am definitely for the latter
option because a) it adds less code we have to maintain and b) it is a
more generic solution.

For current or older kernels, floods of OOM killer events don't seem to
be a regular production system state (I would even dare to claim that
something is terribly wrong if they are), so no quick & dirty hacks are
warranted. There are two ways to work around the problem: either tune
the log level or simply disable dump_tasks (vm.oom_dump_tasks=0). It is
a useful tool in some cases but not really necessary in the vast
majority of them.
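Both workarounds are plain sysctl writes. A minimal sketch in C,
assuming root privileges and assuming the per-task dump_tasks lines are
logged at KERN_INFO; the helper names (write_sysctl,
disable_oom_dump_tasks, set_console_loglevel) are hypothetical, while
the paths are the standard vm.oom_dump_tasks and kernel.printk sysctls:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Write a value to a sysctl file; returns 0 on success, -1 on failure. */
static int write_sysctl(const char *path, const char *value)
{
    assert(path != NULL && value != NULL);

    FILE *f = fopen(path, "w");
    if (!f) {
        fprintf(stderr, "open %s: %s\n", path, strerror(errno));
        return -1;
    }
    int rc = (fputs(value, f) == EOF) ? -1 : 0;
    /* stdio buffers the write, so a rejected value may only show up here */
    if (fclose(f) == EOF)
        rc = -1;
    return rc;
}

/* Same effect as: sysctl -w vm.oom_dump_tasks=0 */
static int disable_oom_dump_tasks(void)
{
    return write_sysctl("/proc/sys/vm/oom_dump_tasks", "0\n");
}

/*
 * kernel.printk holds four integers; writing a single value updates only
 * the first one, console_loglevel. Only messages with a level numerically
 * lower than console_loglevel reach the console; all of them are still
 * stored in the ring buffer.
 */
static int set_console_loglevel(int level)
{
    char buf[16];

    snprintf(buf, sizeof(buf), "%d\n", level);
    return write_sysctl("/proc/sys/kernel/printk", buf);
}
```

For example, set_console_loglevel(5) keeps KERN_NOTICE and KERN_INFO
messages off a slow console while still showing warnings and errors.
The task list is still recorded in the ring buffer in that case, so
only disable_oom_dump_tasks() removes the payload itself.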

> I think async printk() won't care whether the data is important or
> not, so the user of printk() (even if it is async) should manage this
> data carefully, especially if it may consume all or most of the printk
> buffer.

Not sure what you mean here. We do have options to tune the ring
buffer (both its size and the log levels) and dump_tasks specifically.

-- 
Michal Hocko
SUSE Labs