Re: [PATCH] mm, oom: avoid printk() iteration under RCU

Michal Hocko <mhocko@xxxxxxxx> · Mon, 23 Sep 2019 10:23:25 +0200

On Sun 22-09-19 20:30:51, Tetsuo Handa wrote:
> On 2019/09/22 15:20, Michal Hocko wrote:
> > On Sun 22-09-19 08:47:31, Tetsuo Handa wrote:
> >> On 2019/09/22 5:30, Michal Hocko wrote:
> >>> On Fri 20-09-19 17:10:42, Andrew Morton wrote:
> >>>> On Sat, 20 Jul 2019 20:29:23 +0900 Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
> >>>>
> >>>>>>
> >>>>>>> ) under RCU and this patch is one of them (except that we can't remove
> >>>>>>> printk() for dump_tasks() case).
> >>>>>>
> >>>>>> No, this one adds a complexity for something that is not clearly a huge
> >>>>>> win or the win is not explained properly.
> >>>>>>
> >>>>>
> >>>>> The win is already explained properly by the past commits. Avoiding RCU stalls
> >>>>> (even without slow consoles) is a clear win. The duration of RCU stall avoided
> >>>>> by this patch is roughly the same as with commit b2b469939e934587.
> >>>>>
> >>>>> We haven't succeeded in making printk() asynchronous (and potentially we won't
> >>>>> succeed in making printk() asynchronous because we need synchronous printk()
> >>>>> when something critical is happening outside of out_of_memory()). Thus,
> >>>>> moving printk() outside of the RCU section is a clear win we can make for now.
> >>>>
> >>>> It's actually not a complex patch and moving all that printing outside
> >>>> the rcu section makes sense.  So I'll sit on the patch for a few more
> >>>> days but am inclined to send it upstream.
> >>>
> >>> Look, I am quite tired of arguing about this and other changes following
> >>> a similar pattern. In short, problematic code is shuffled around and
> >>> pretends to solve some problem. In this particular case it is an RCU stall,
> >>> which in itself is not a fatal condition. Sure it sucks, and the primary
> >>> reason is that printk can take way too long. This is something that is
> >>> currently a WIP to be addressed. What is more important, though, is that there
> >>> is no sign of any _real world_ workload that would require a quick workaround
> >>> to justify a hacky stop gap solution.
> >>>
> >>> So again, why do we want to add more code for something which is not
> >>> clearly a real life problem and that will add a maintenance burden
> >>> for the future?
> >>>
> >>
> >> Enqueueing a zillion printk() lines from dump_tasks() will overflow the printk
> >> buffer (i.e. lead to lost messages) if OOM killer messages are printed
> >> asynchronously. I don't think that making printk() asynchronous will solve
> >> this problem. I repeat; there is no better solution than "printk()
> >> users are careful not to exhaust the printk buffer". This patch is the
> >> first step towards avoiding thoughtless printk().
> > 
> > Irrelevant because this patch doesn't reduce the amount of output.
> 
> This patch is just a temporary change before applying
> https://lkml.kernel.org/r/7de2310d-afbd-e616-e83a-d75103b986c6@xxxxxxxxxxxxxxxxxxx and
> https://lkml.kernel.org/r/57be50b2-a97a-e559-e4bd-10d923895f83@xxxxxxxxxxxxxxxxxxx .
>
> Show your solution by patch instead of ignoring or nacking.

I simply suggest the most trivial patch, one which doesn't change a single
line of code.

This and the two discussions you referenced simply confirm that a)
you didn't bother to think your change through for other potential
corner cases and b) even more code has to be added for it to behave
semi-sanely.

> >> Delay from dump_tasks() not only affects a thread holding oom_lock but also
> >> other threads which are directly doing concurrent allocation requests or
> >> indirectly waiting for the thread holding oom_lock. Your "it is a RCU stall
> >> which in itself is not a fatal condition" is underestimating the _real world_
> >> problems (e.g. "delay can trigger watchdog timeout and cause the system to
> >> reboot even if the administrator does not want the system to reboot").
> > 
> > Please back your claims by real world examples.
> > 
> 
> People have to use /proc/sys/vm/oom_dump_tasks == 0 (and give up obtaining some
> clue) because they worry about stalls caused by /proc/sys/vm/oom_dump_tasks != 0, while
> they have to use /proc/sys/vm/panic_on_oom == 0 because they don't want the down
> time caused by rebooting. And such a situation cannot be solved unless we solve the stalls
> caused by /proc/sys/vm/oom_dump_tasks != 0. I'm working at a support center and
> I have to be able to figure out the system's state, but I have neither an environment
> to run real world workloads nor control of customers' environments to enforce
> /proc/sys/vm/oom_dump_tasks != 0.
> 
> In short, your "real world" requirement is a catch-22 problem.

I am pretty sure this would be less of a catch-22 problem if you had
more actual arguments at hand rather than constant hand waving. I have
told you many times and I will repeat it one more time, hopefully for
the last time: even if there are issues in the code, we always have to
weigh costs vs. benefits. If no real workloads are hitting these problems
while the fix in question is non-trivial, adds a maintenance burden or,
even worse, undermines the functionality (and dump_tasks printed at an
arbitrary time after the actual oom while you keep references to
task_structs really could be perceived that way), then the patch is simply
not worth it.

There are exceptions to that of course. If a more complex solution would
lead to more robust code or to functionality that other parts of the
kernel could benefit from, then this would certainly be an argument to
weigh in as well. E.g. improving tasks iteration so that it can release
the rcu lock and yield, improving printk, etc.

I completely see how stress testing corner cases is useful and how it
might help the code in general, but focusing solely on this testing is a
free one-way ticket to an unmaintainable mess.

This is my last email in this thread.
-- 
Michal Hocko
SUSE Labs