Re: [nacked] mm-oom-avoid-printk-iteration-under-rcu.patch removed from -mm tree

Yafang Shao <laoar.shao@xxxxxxxxx> · Thu, 23 Apr 2020 18:22:22 +0800

On Thu, Apr 23, 2020 at 3:34 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Thu 23-04-20 14:35:22, Yafang Shao wrote:
> > On Thu, Apr 23, 2020 at 1:35 PM Tetsuo Handa
> [...]
> > > dump_tasks() remains definitely a printk() abuser which is capable of pushing
> > > many thousands of printk() messages in one second if async printk were available.
> > > Async printk CANNOT deal with the problem that too much backlog causes important
> > > messages to be delayed for too long. Please read my explanation carefully.
> > >
> >
> > Agreed. Too many OOM reports are still an issue even if printk() is async.
>
> I believe nobody is disputing this part. We are talking about two things
> here, and I believe that contributes considerably to the confusion:
> 1) dump_tasks being a large noise generator to the kernel log buffer
> 2) a heavy printk load from the oom context
>
> There is no good answer for 1). We simply print a lot of data that
> scales with the number of eligible tasks, and that might be thousands.
> We have done quite a lot of work to make the data collecting part of the
> process as optimal as possible, but having this feature enabled by
> default is simply baggage we have to carry with us. printk doesn't
> cope with such a load very well currently. There might be some future
> changes, but the bottom line is that no matter how printk gets optimized
> there is still the payload to be printed, whether this happens
> transparently async or explicitly in a detached context.
>
> 2) is about the sync nature of printk _right_now_, which causes
> delays in the allocator context while the system is OOM. Locks are
> potentially held both by the OOM context and in the call chain leading
> to the allocator. The longer the OOM context takes, the longer the
> agony lasts. This is where async printing might help, because it
> would push the heavy lifting out to a different context.
>
> There is clear agreement on this part. The whole discussion in this
> thread is about how to achieve that. There are two ways: develop code
> to do that for this very specific case (aka push out to a worker), or
> rely on printk doing that for us and potentially for many other places
> in a similar situation. I am definitely for the latter option because
> a) it adds less code we have to maintain and b) it is a more generic
> solution.
>
> For current or older kernels there are two ways to work around the
> problem, and floods of OOM killer events don't seem to be a regular
> production system state (I would even dare to claim that something is
> terribly wrong if they are), so no quick&dirty hacks are due. Either
> tune the log level or simply disable dump_tasks. It is a useful tool in
> some cases but not really necessary in the vast majority of cases.
>
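To put rough numbers on point 1) above: the payload grows linearly with the number of eligible tasks, while a serial console drains it at a fixed rate. The back-of-the-envelope sketch below is only an estimate under stated assumptions: the task count and the 100-byte line length are made up; 128 KiB corresponds to the Kconfig default CONFIG_LOG_BUF_SHIFT=17 (the log_buf_len= boot parameter can change it); and a 115200 baud 8N1 line moves about 11520 bytes/s.

```python
# Back-of-the-envelope estimate for a single dump_tasks() burst.
# Every input below is an illustrative assumption, not a measurement.

def dump_bytes(tasks: int, bytes_per_line: int) -> int:
    """Bytes one OOM report's task list adds to the log buffer."""
    return tasks * bytes_per_line


def serial_bytes_per_sec(baud: int) -> float:
    """Throughput of an 8N1 serial console: 10 bits on the wire per byte."""
    return baud / 10


TASKS = 5000              # assumed number of eligible tasks
BYTES_PER_LINE = 100      # assumed length of one task line
LOG_BUF = 1 << 17         # 128 KiB, i.e. CONFIG_LOG_BUF_SHIFT=17
BAUD = 115200             # common serial console speed

total = dump_bytes(TASKS, BYTES_PER_LINE)
drain = total / serial_bytes_per_sec(BAUD)

print(f"payload {total} B, buffer {LOG_BUF} B, fits: {total <= LOG_BUF}")
print(f"serial console drain time: {drain:.1f} s")
```

With these assumptions one report of 500000 bytes overflows the buffer almost four times over and keeps the console busy for roughly 43 seconds, however the printing is scheduled.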

Thanks for your explanation.
We have an agreement here.
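Point 2) is about where the slow part runs. Whether the hand-off lives in a dedicated worker or inside printk itself, the pattern is similar: the OOM context only queues records and another context writes them out. A minimal Python sketch of that hand-off follows, with a thread standing in for the detached context. oom_report, console_worker and the queue are hypothetical names, not kernel APIs, and the toy ignores the hard parts real printk must handle, such as flushing everything when the system panics.

```python
import queue
import threading

# Hypothetical names throughout; this only models "caller enqueues, another
# context does the slow output".
log_queue: "queue.Queue[str | None]" = queue.Queue()
console_output: list[str] = []


def console_worker() -> None:
    """Drain queued records in a separate thread (stand-in for slow console I/O)."""
    while True:
        line = log_queue.get()
        if line is None:               # sentinel: no more records
            break
        console_output.append(line)    # the expensive write would happen here


def oom_report(task_lines: list[str]) -> None:
    """The OOM-context side only enqueues, so it never waits for the console."""
    for line in task_lines:
        log_queue.put(line)


worker = threading.Thread(target=console_worker)
worker.start()
oom_report([f"task line {i}" for i in range(1000)])  # returns once enqueued
log_queue.put(None)                                  # tell the worker to finish
worker.join()
print(len(console_output))  # → 1000
```

Note that the payload is unchanged: the same 1000 lines still get written, only no longer on the caller's time.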

> > I think async printk() won't care whether the data is important or
> > not, so the users of printk() (even if it is async) should manage
> > their data carefully, especially if that data may consume all or most
> > of the printk buffer.
>
> Not sure what you mean here. We do have an option to tune the ring
> buffer (both size and log levels) and dump_tasks specifically.
>

There are drawbacks to both of these options.

printk() is multiple-producer and multiple-consumer.
OOM is one of the producers.
logfile (/var/log/messages) and console are two of the consumers.
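The multiple-producer/multiple-consumer point can be modelled with a toy fixed-size buffer in which every consumer keeps its own read position. This is a simplified Python sketch, not the kernel's printk ring buffer; the class name LogRing, the per-call budget and the sizes are made up for illustration. It shows how a fast reader such as the logfile keeps up with a burst, while a slow reader such as the console loses the oldest records once the burst exceeds the buffer.

```python
from collections import deque


class LogRing:
    """Toy fixed-size log buffer with independent readers (hypothetical model).

    Each record gets an increasing sequence number and each reader remembers
    the next sequence it wants. Once the buffer is full, the oldest records are
    overwritten, so a reader that lags too far behind loses them.
    """

    def __init__(self, capacity: int):
        self.records = deque(maxlen=capacity)  # (seq, text); oldest evicted first
        self.next_seq = 0

    def write(self, text: str) -> None:
        self.records.append((self.next_seq, text))
        self.next_seq += 1

    def read(self, cursor: int, budget: int):
        """Return (new_cursor, lines, dropped) for a reader positioned at cursor."""
        oldest = self.records[0][0] if self.records else self.next_seq
        dropped = max(0, oldest - cursor)      # records overwritten before we read them
        cursor = max(cursor, oldest)
        lines = [t for s, t in self.records if s >= cursor][:budget]
        return cursor + len(lines), lines, dropped


ring = LogRing(capacity=8)
logfile_cursor, logfile_lost = 0, 0
for i in range(20):                      # a burst of 20 dump_tasks lines
    ring.write(f"task line {i}")
    # The fast consumer (logfile) drains after every record.
    logfile_cursor, _, lost = ring.read(logfile_cursor, budget=100)
    logfile_lost += lost

# The slow consumer (console) only gets to run after the burst.
console_cursor, shown, console_lost = ring.read(0, budget=100)
print(logfile_lost, console_lost, len(shown))  # → 0 12 8
```

Growing capacity hides the loss for this burst but not for a larger one, which is the scaling concern with simply tuning the buffer size.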
Now let's look at the drawbacks of each option.

- tune the ring buffer (both size and log levels)
All the producers are affected.
For example, if you tune the log levels, then every producer whose
messages share dump_stack()'s loglevel can no longer show them on the
console.
Tuning the size may not scale, because we don't know how slow the
console is, and making the buffer too big wastes memory.

- tune dump_tasks specifically (vm.oom_dump_tasks)
All the consumers are affected.
The logfile is fast enough, so we want the dump_tasks output to be
written to the logfile.
The console is so slow that we don't want to print it there.
A possible fix is to extend vm.oom_dump_tasks:
    vm.oom_dump_tasks: 1 - dump into all consumers
                       2 - don't dump into the console
                       0 - don't dump into any of the consumers
But someone may still need the dump_tasks output on the console.
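A small Python model of how the two knobs differ. It treats dump_tasks output as KERN_INFO (6) and uses the kernel's rule that the console prints only records whose level is numerically below console_loglevel (7 is the usual default). The value 2 for vm.oom_dump_tasks is the proposal above, not an existing setting, and the function names are hypothetical.

```python
# Console loglevel filtering versus a hypothetical per-consumer switch.
# KERN_* values follow the kernel's numbering (0 = KERN_EMERG ... 7 = KERN_DEBUG).
KERN_WARNING, KERN_INFO = 4, 6


def to_console(level: int, is_dump_tasks: bool, console_loglevel: int,
               oom_dump_tasks: int) -> bool:
    """Would a record reach the console? oom_dump_tasks == 2 is the proposed mode."""
    if is_dump_tasks and oom_dump_tasks in (0, 2):
        return False
    # The console shows records whose level is numerically below console_loglevel.
    return level < console_loglevel


def to_logfile(is_dump_tasks: bool, oom_dump_tasks: int) -> bool:
    """The logfile sees every buffered record; only 0 stops dump_tasks entirely."""
    return not (is_dump_tasks and oom_dump_tasks == 0)


records = [
    ("dump_tasks line", KERN_INFO, True),
    ("other driver info", KERN_INFO, False),
    ("other warning", KERN_WARNING, False),
]


def console_view(console_loglevel: int, oom_dump_tasks: int) -> list[str]:
    return [name for name, lvl, dt in records
            if to_console(lvl, dt, console_loglevel, oom_dump_tasks)]


# Lowering console_loglevel to 6 hides KERN_INFO from *every* producer.
print(console_view(6, 1))  # → ['other warning']
# The proposed mode keeps other KERN_INFO output and drops only dump_tasks.
print(console_view(7, 2))  # → ['other driver info', 'other warning']
```

Under mode 2 the logfile still receives the task list (to_logfile(True, 2) is True), which is the split between consumers the proposal is after.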

As I'm not familiar with async printk(), my understanding may not be correct.

-- 
Thanks
Yafang