Re: [PATCH v4] printk: Add console owner and waiter logic to loadbalance console writes

Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> · Thu, 9 Nov 2017 21:07:15 +0900

Michal Hocko wrote:
> On Thu 09-11-17 20:03:30, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Thu 09-11-17 19:22:58, Tetsuo Handa wrote:
> > > > Michal Hocko wrote:
> > > > > Hi,
> > > > > assuming that this passes warn stall torturing by Tetsuo, do you think
> > > > > we can drop http://lkml.kernel.org/r/1509017339-4802-1-git-send-email-penguin-kernel@xxxxxxxxxxxxxxxxxxx
> > > > > from the mmotm tree?
> > > > 
> > > > I don't think so.
> > > > 
> > > > The rule that "do not try to printk() faster than the kernel can write to
> > > > consoles" will remain no matter how printk() changes. Unless asynchronous
> > > > approach like https://lwn.net/Articles/723447/ is used, I think we can't
> > > > obtain useful information.
> > > 
> > > Does that mean that the patch doesn't pass your test?
> > > 
> > 
> > Test is irrelevant. See the changelog.
> > 
> >   Synchronous approach is prone to unexpected results (e.g. too late [1], too
> >   frequent [2], overlooked [3]). As far as I know, warn_alloc() never helped
> >   with providing information other than "something is going wrong".
> >   I want to consider asynchronous approach which can obtain information
> >   during stalls with possibly relevant threads (e.g. the owner of oom_lock
> >   and kswapd-like threads) and serve as a trigger for actions (e.g. turn
> >   on/off tracepoints, ask libvirt daemon to take a memory dump of stalling
> >   KVM guest for diagnostic purpose).
> > 
> >   [1] https://bugzilla.kernel.org/show_bug.cgi?id=192981
> >   [2] http://lkml.kernel.org/r/CAM_iQpWuPVGc2ky8M-9yukECtS+zKjiDasNymX7rMcBjBFyM_A@xxxxxxxxxxxxxx
> >   [3] commit db73ee0d46379922 ("mm, vmscan: do not loop on too_many_isolated for ever")
> 
> So you want to keep the warning out of the kernel even though the
> problems you are seeing are gone just to allow for an async approach
> nobody is very fond of? That is a very dubious approach.

You are assuming that there are no more bugs which would be caught by
an async approach. That is seriously wrong. [3] is just an example.
http://lkml.kernel.org/r/CABXGCsOzaorL0wKZFYRFKR7RSnUL+7=vspE36sFTENoimsJGSw@xxxxxxxxxxxxxx
is an example where an async approach would help. For example, we could turn
various tracepoints on once a stall has lasted for 5 seconds, and then turn
them off again when the stall disappears. It is very unfortunate that we still
do not have such a trigger.
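
To make the idea concrete, here is a rough user-space sketch of that kind of
trigger. It is only a model, not kernel code and not taken from any patch:
the StallTrigger class, poll(), and the on_enable/on_disable callbacks are
hypothetical names, and how a stall is detected is left to the caller. The
only value taken from the text above is the 5 second threshold.

```python
STALL_THRESHOLD_SECS = 5.0  # "if a stall has lasted for 5 seconds"


class StallTrigger:
    """Hypothetical model of an async stall trigger (not a kernel API)."""

    def __init__(self, on_enable, on_disable, threshold=STALL_THRESHOLD_SECS):
        self.on_enable = on_enable      # e.g. turn tracepoints on
        self.on_disable = on_disable    # e.g. turn tracepoints off
        self.threshold = threshold
        self.stall_start = None         # time the current stall began
        self.tracing = False            # whether the action is active

    def poll(self, now, stalled):
        """Called periodically by a watcher, not by the stalling task."""
        if stalled:
            if self.stall_start is None:
                self.stall_start = now
            long_enough = now - self.stall_start >= self.threshold
            if not self.tracing and long_enough:
                self.tracing = True
                self.on_enable()
        else:
            # The stall disappeared: reset the timer and undo the action.
            self.stall_start = None
            if self.tracing:
                self.tracing = False
                self.on_disable()
        return self.tracing


# Simulated timeline of (seconds, is_stalled) samples.
events = []
trigger = StallTrigger(lambda: events.append("on"),
                       lambda: events.append("off"))
for now, stalled in [(0, True), (3, True), (5, True), (8, True),
                     (9, False), (10, True)]:
    trigger.poll(now, stalled)
```

The point of the sketch is that the watcher runs independently of the
stalling thread, so the action fires once per stall instead of depending on
the stalling thread getting a chance to call printk().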

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .