Re: [PATCH] printk: Ratelimit messages printed by console drivers

Sergey Senozhatsky <sergey.senozhatsky.work@xxxxxxxxx> · Mon, 16 Apr 2018 13:25:53 +0900

On (04/16/18 10:47), Sergey Senozhatsky wrote:
> On (04/14/18 11:35), Sergey Senozhatsky wrote:
> > On (04/13/18 10:12), Steven Rostedt wrote:
> > > 
> > > > The interval is set to one hour. It is rather arbitrary selected time.
> > > > It is supposed to be a compromise between never print these messages,
> > > > do not lockup the machine, do not fill the entire buffer too quickly,
> > > > and get information if something changes over time.
> > > 
> > > 
> > > I think an hour is incredibly long. We only allow 100 lines per hour for
> > > printks happening inside another printk?
> > > 
> > > I think 5 minutes (at most) would probably be plenty. One minute may be
> > > good enough.
> > 
> > Besides 100 lines is absolutely not enough for any real lockdep splat.
> > My call would be - up to 1000 lines in a 1 minute interval.
> 
> Well, if we want to basically turn printk_safe() into printk_safe_ratelimited().
> I'm not so sure about it.
> 
> Besides the patch also rate limits printk_nmi->logbuf - the logbuf
> PRINTK_NMI_DEFERRED_CONTEXT_MASK bypass, which is way too important
> to rate limit it - for no reason.
> 
> Dunno, can we keep printk_safe() the way it is and introduce a new
> printk_safe_ratelimited() specifically for call_console_drivers()?
> 
> Lockdep splat is a one time event, if we lose half of it - we, most
> like, lose the entire report. And call_console_drivers() is not the
> one and only source of warnings/errors/etc. So if we turn printk_safe
> into printk_safe_ratelimited() [not sure we want to do it] for all
> then I want restrictions to be as low as possible, IOW to log_store()
> as many lines as possible.

One more thing,
I'd really prefer to rate limit the function which flushes per-CPU
printk_safe buffers; not the function that appends new messages to
the per-CPU printk_safe buffers. There is a significant difference.

printk_safe does not help us when we are dealing with any external
locks - and call_console_drivers() is precisely that type of case.
The very next thing to happen after lockdep splat, or spin_lock
debugging report, etc. can be an actual deadlock->panic(). Thus I
want to have the entire report in per-CPU buffer [if possible],
so we can flush_on_panic() per-CPU buffers, or at least move the
data to the logbuf and make it accessible in vmcore. If we rate
limit the function that appends data to the per-CPU buffer then we
may simply suppress [rate limit] the report, so there will be
nothing to flush_on_panic().

	-ss