Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



+ Sasha 

Thanks Petr and Sergey for the reviews.


On Tue, Dec 04, 2018 at 11:20:33AM +0100, Petr Mladek wrote:
> On Mon 2018-12-03 23:15:31, Andrew Morton wrote:
> > 
> > ------------------------------------------------------
> > From: Feng Tang <feng.tang@xxxxxxxxx>
> > Subject: panic: Avoid the extra noise dmesg
> > 
> > When kernel panic happens, it will first print the panic call stack,
> > then the ending msg like:
> > 
> > [   35.743249] ---[ end Kernel panic - not syncing: Fatal exception
> > [   35.749975] ------------[ cut here ]------------
> > 
> > The above message are very useful for debugging.
> > 
> > But if system is configured to not reboot on panic, say the "panic_timeout"
> > parameter equals 0, it will likely print out many noisy message like
> > WARN() call stack for each and every CPU except the panic one, messages
> > like below:
> > 
> > 	WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 set_task_cpu+0x183/0x190
> > 	Call Trace:
> > 	<IRQ>
> > 	try_to_wake_up
> > 	default_wake_function
> > 	autoremove_wake_function
> > 	__wake_up_common
> > 	__wake_up_common_lock
> > 	__wake_up
> > 	wake_up_klogd_work_func
> > 	irq_work_run_list
> > 	irq_work_tick
> > 	update_process_times
> > 	tick_sched_timer
> > 	__hrtimer_run_queues
> > 	hrtimer_interrupt
> > 	smp_apic_timer_interrupt
> > 	apic_timer_interrupt
> 
> I guess that it is a warning about migrating tasks to an offline CPU.
 
My v1 patch was trying to add some hacky code into architecture code to
address several WARN()s directly, but that turns out to be very hacky and
involve much code for many archs.

> > For people working in console mode, the screen will first show the panic
> > call stack, but immediately overridded by these noisy extra messages, which
> > makes debugging much more difficult, as the original context gets lost on
> > screen.
> > 
> > Also these noisy messages will confuse some users, as I have seen many bug
> > reporters posted the noisy message into bugzilla, instead of the real panic
> > call stack and context.
> > 
> > Removing the "local_irq_enable" will avoid the noisy message.
> > 
> > The justification for the removing is: when code runs to this point, it
> > means user has chosed to not reboot, or do any special handling by using
> > the panic notifier method, no much point in re-enabling the interrupt.
> > 
> > Link: http://lkml.kernel.org/r/1543902228-23834-1-git-send-email-feng.tang@xxxxxxxxx
> > Signed-off-by: Feng Tang <feng.tang@xxxxxxxxx>
> > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> > Cc: Borislav Petkov <bp@xxxxxxx>
> > Cc: Petr Mladek <pmladek@xxxxxxxx>
> > Cc: Sergey Senozhatsky <sergey.senozhatsky@xxxxxxxxx>
> > Cc: <stable@xxxxxxxxxxxxxxx>
> > Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > ---
> > 
> > 
> > --- a/kernel/panic.c~panic-avoid-the-extra-noise-dmesg
> > +++ a/kernel/panic.c
> > @@ -322,7 +322,6 @@ void panic(const char *fmt, ...)
> >  	}
> >  #endif
> >  	pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
> > -	local_irq_enable();
> >  	for (i = 0; ; i += PANIC_TIMER_STEP) {
> >  		touch_softlockup_watchdog();
> >  		if (i >= i_next) {
> 
> Hmm, this calls panic_blink(). It seems that it depends on workqueues
> and the scheduler:
> 
>   + led_panic_blink()
>     + led_trigger_event()
>       + led_set_brightness()
>         + schedule_work(set_brightness_work)
> 
> I guess that blinking might be important in some situations and
> on some devices. On the other hand, we are interested only into
> blinking from this point on.
> 
> The easiest solution seems to be to make a noop from printk().
> For example, we could add a global flag:
> 
> int panic_blinking;
> 
> and add the following into vprintk_func()
> 
> 	/*
> 	 * Do not push away real panic() message by warnings from led
> 	 * blinking code.
> 	 */
> 	if (panic_blinking)
> 		return 0;
> 
> How does that sound?

This should be able to achieve the same goal. 

One thing I can think of is what mentioned by Sergey that some sysrq
handler may want to print out something, but it should mostly be
covered by 2 other panic debug print patches, which will print out
task/mem/timer/lock/ftrace info runtime on demand.

Thanks,
Feng






[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux