Re: [PATCH] mm,page_alloc: Serialize warn_alloc() if schedulable.

Michal Hocko <mhocko@xxxxxxxx> · Wed, 12 Jul 2017 14:41:45 +0200

On Wed 12-07-17 21:23:05, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Wed 12-07-17 07:06:11, Tetsuo Handa wrote:
> > > Michal Hocko wrote:
> > > > On Tue 11-07-17 22:10:36, Tetsuo Handa wrote:
> > > > > Michal Hocko wrote:
> > [...]
> > > > > > warn_alloc is just yet-another-user of printk. We might have many
> > > > > > others...
> > > > >
> > > > > warn_alloc() is different from other users of printk() that printk() is called
> > > > > as long as oom_lock is already held by somebody else processing console_unlock().
> > > >
> > > > So what exactly prevents any other caller of printk interfering while
> > > > the oom is ongoing?
> > >
> > > Other callers of printk() are not doing silly things like "while(1) printk();".
> >
> > They can still print a lot. There have been reports of one printk source
> > pushing an unrelated context to print way too much.
> 
> Which source is that?
> 
> Legitimate printk() users might do
> 
>   for (i = 0; i < 1000; i++)
>     printk();
> 
> but they do not do
> 
>   while (1)
>     for (i = 0; i < 1000; i++)
>       printk();
> 
> .
> 
> >
> > > They don't call printk() until something completes (e.g. some operation returned
> > > an error code) or they do throttling. Only watchdog calls printk() without waiting
> > > for something to complete (because watchdog is there in order to warn that something
> > > might be wrong). But watchdog is calling printk() carefully not to cause flooding
> > > (e.g. khungtaskd sleeps enough) and not to cause lockups (e.g. khungtaskd calls
> > > rcu_lock_break()).
> >
> > Look at hard/soft lockup detector and how it can cause flood of printks.
> 
> Lockup detector is legitimate because it is there to warn that somebody is
> continuously consuming CPU time. Lockup detector might do

Sigh. What I've tried to convey is that the lockup detector can print _a
lot_ (just consider a large machine with hundreds of CPUs and trying to
dump a stack trace on each of them...) and that might mimic a herd of
printks from allocation stalls...
[...]
> > warn_alloc prints a single line + dump_stack for each stalling allocation and
> > show_mem once per second. That doesn't sound overly crazy to me.
> > Sure we can have many stalling tasks under certain conditions (most of
> > them quite unrealistic) and then we can print a lot. I do not see an
> > easy way out of it without losing information about stalls and I guess
> > we want to know about them otherwise we will have much harder time to
> > debug stalls.
> 
> Printing just one line per every second can lead to lockup, for
> the condition to escape the "for (;;)" loop in console_unlock() is
> 
>                 if (console_seq == log_next_seq)
>                         break;

Then something is really broken in that condition, don't you think?
Peter has already mentioned that offloading to a different context seems
like the way to go here.
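
To make the exit condition quoted above concrete, here is a minimal
userspace model of it. This is only a sketch and not kernel code:
struct log_model, flush_model(), produced_per_iter and max_iters are
names made up for illustration, and the real console_unlock() does
considerably more work per iteration.

```c
#include <assert.h>

/* Hypothetical userspace model of the two sequence counters. */
struct log_model {
	unsigned long console_seq;  /* next record the console will emit */
	unsigned long log_next_seq; /* next record a producer will store */
};

/*
 * Model of the "for (;;)" loop: emit one record per iteration and let
 * other contexts append produced_per_iter records while the flusher is
 * scheduled out. Returns the number of iterations needed to catch up,
 * or -1 if the flusher has not caught up after max_iters iterations.
 */
static long flush_model(struct log_model *m,
			unsigned long produced_per_iter, long max_iters)
{
	long iters = 0;

	for (;;) {
		/* The console can never be ahead of the log. */
		assert(m->console_seq <= m->log_next_seq);

		if (m->console_seq == m->log_next_seq)
			break;
		if (iters == max_iters)
			return -1;

		m->console_seq++;                     /* emit one record */
		m->log_next_seq += produced_per_iter; /* others keep printing */
		iters++;
	}
	return iters;
}
```

With a backlog of 5 records and no new producers the model terminates
after 5 iterations. If one record is appended per iteration the backlog
never shrinks and the loop only stops because of the artificial
max_iters bound, which the real loop does not have.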

> when cond_resched() in that loop slept for more than one second due to
> SCHED_IDLE priority.
> 
> Currently preempt_disable()/preempt_enable_no_resched() (or equivalent)
> is the only available countermeasure for minimizing interference like
> 
>     for (i = 0; i < 1000; i++)
>       printk();
> 
> . If printk() allowed a per printk context flag (shown below) which lets
> printk() users force printk() not to try to print immediately (i.e. declare
> that they use deferred printing (maybe offloaded to the printk-kthread)),
> lockups by cond_resched() from console_unlock() from printk() from
> out_of_memory() would be avoided.

As I've said earlier, if there is no other way to make printk work without all
these nasty side effects then I would be OK with adding printk context
specific calls into the oom killer.
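
For the sake of discussion, a per-context "do not flush from here" flag
could be modelled roughly as follows. This is a userspace sketch under
my own assumptions, not an existing printk interface: struct
printk_model, model_printk(), model_defer_enter(), model_defer_exit()
and model_flush() are all hypothetical names, and model_flush() stands
in for whatever context (e.g. a printk kthread) would do the actual
console output.

```c
#include <assert.h>
#include <stddef.h>

#define LOG_CAP 64 /* arbitrary capacity for this model */

/* Hypothetical model of a log buffer with a deferral flag. */
struct printk_model {
	const char *records[LOG_CAP];
	size_t log_next;          /* records stored so far */
	size_t console_next;      /* records already flushed */
	int defer_depth;          /* > 0: caller asked for deferred output */
	size_t flushed_by_caller; /* records flushed from model_printk() */
};

/* Flush everything pending; returns how many records were flushed. */
static size_t model_flush(struct printk_model *p)
{
	size_t n = p->log_next - p->console_next;

	p->console_next = p->log_next;
	return n;
}

static void model_defer_enter(struct printk_model *p)
{
	p->defer_depth++;
}

static void model_defer_exit(struct printk_model *p)
{
	assert(p->defer_depth > 0);
	p->defer_depth--;
}

/* Store a record; flush from the caller only outside a deferred section. */
static int model_printk(struct printk_model *p, const char *msg)
{
	if (p->log_next == LOG_CAP)
		return -1; /* buffer full, record dropped */

	p->records[p->log_next++] = msg;
	if (p->defer_depth == 0)
		p->flushed_by_caller += model_flush(p);
	return 0;
}
```

In this model a caller such as the oom killer would wrap its output in
model_defer_enter()/model_defer_exit(), so its records are only stored
and the flushing cost is paid by whoever calls model_flush() later.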

Removing the rest because this is again getting largely tangential. The
primary problem you are seeing is that we stumble over printk here.
Unless I see a sound argument that this is not the case, it doesn't make
any sense to discuss allocator changes.

[...]
-- 
Michal Hocko
SUSE Labs
