Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue 2018-12-04 23:49:36, Feng Tang wrote:
> + Sasha 
> 
> Thanks Petr and Sergey for the reviews.
> 
> 
> On Tue, Dec 04, 2018 at 11:20:33AM +0100, Petr Mladek wrote:
> > On Mon 2018-12-03 23:15:31, Andrew Morton wrote:
> > > 
> > > ------------------------------------------------------
> > > From: Feng Tang <feng.tang@xxxxxxxxx>
> > > Subject: panic: Avoid the extra noise dmesg
> > > 
> > > When kernel panic happens, it will first print the panic call stack,
> > > then the ending msg like:
> > > 
> > > [   35.743249] ---[ end Kernel panic - not syncing: Fatal exception
> > > [   35.749975] ------------[ cut here ]------------
> > > 
> > > The above message are very useful for debugging.
> > > 
> > > But if system is configured to not reboot on panic, say the "panic_timeout"
> > > parameter equals 0, it will likely print out many noisy message like
> > > WARN() call stack for each and every CPU except the panic one, messages
> > > like below:
> > > 
> > > 	WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 set_task_cpu+0x183/0x190
> > > 	Call Trace:
> > > 	<IRQ>
> > > 	try_to_wake_up
> > > 	default_wake_function
> > > 	autoremove_wake_function
> > > 	__wake_up_common
> > > 	__wake_up_common_lock
> > > 	__wake_up
> > > 	wake_up_klogd_work_func
> > > 	irq_work_run_list
> > > 	irq_work_tick
> > > 	update_process_times
> > > 	tick_sched_timer
> > > 	__hrtimer_run_queues
> > > 	hrtimer_interrupt
> > > 	smp_apic_timer_interrupt
> > > 	apic_timer_interrupt
> > 
> > I guess that it is a warning about migrating tasks to an offline CPU.
>  
> My v1 patch was trying to add some hacky code into architecture code to
> address several WARN()s directly, but that turns out to be very hacky and
> involve much code for many archs.
> 
> > > For people working in console mode, the screen will first show the panic
> > > call stack, but immediately overridded by these noisy extra messages, which
> > > makes debugging much more difficult, as the original context gets lost on
> > > screen.
> > > 
> > > Also these noisy messages will confuse some users, as I have seen many bug
> > > reporters posted the noisy message into bugzilla, instead of the real panic
> > > call stack and context.
> > > 
> > > Removing the "local_irq_enable" will avoid the noisy message.
> > > 
> > > The justification for the removing is: when code runs to this point, it
> > > means user has chosed to not reboot, or do any special handling by using
> > > the panic notifier method, no much point in re-enabling the interrupt.
> > > 
> > > Link: http://lkml.kernel.org/r/1543902228-23834-1-git-send-email-feng.tang@xxxxxxxxx
> > > Signed-off-by: Feng Tang <feng.tang@xxxxxxxxx>
> > > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > > Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> > > Cc: Borislav Petkov <bp@xxxxxxx>
> > > Cc: Petr Mladek <pmladek@xxxxxxxx>
> > > Cc: Sergey Senozhatsky <sergey.senozhatsky@xxxxxxxxx>
> > > Cc: <stable@xxxxxxxxxxxxxxx>
> > > Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > > ---
> > > 
> > > 
> > > --- a/kernel/panic.c~panic-avoid-the-extra-noise-dmesg
> > > +++ a/kernel/panic.c
> > > @@ -322,7 +322,6 @@ void panic(const char *fmt, ...)
> > >  	}
> > >  #endif
> > >  	pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
> > > -	local_irq_enable();
> > >  	for (i = 0; ; i += PANIC_TIMER_STEP) {
> > >  		touch_softlockup_watchdog();
> > >  		if (i >= i_next) {
> > 
> > Hmm, this calls panic_blink(). It seems that it depends on workqueues
> > and the scheduler:
> > 
> >   + led_panic_blink()
> >     + led_trigger_event()
> >       + led_set_brightness()
> >         + schedule_work(set_brightness_work)
> > 
> > I guess that blinking might be important in some situations and
> > on some devices. On the other hand, we are interested only into
> > blinking from this point on.
> > 
> > The easiest solution seems to be to make a noop from printk().
> > For example, we could add a global flag:
> > 
> > int panic_blinking;
> > 
> > and add the following into vprintk_func()
> > 
> > 	/*
> > 	 * Do not push away real panic() message by warnings from led
> > 	 * blinking code.
> > 	 */
> > 	if (panic_blinking)
> > 		return 0;
> > 
> > How does that sound?
> 
> This should be able to achieve the same goal. 
> 
> One thing I can think of is what mentioned by Sergey that some sysrq
> handler may want to print out something, but it should mostly be
> covered by 2 other panic debug print patches, which will print out
> task/mem/timer/lock/ftrace info runtime on demand.

I think that we could simply clear panic_blinking from
__handle_sysrq(). The user will still be able to capture the screen
before touching the keyboard. But it will keep the things simple.

I hope that we did not miss anything else. Anyway, the approach with
making printk a nop still looks like the best maintainable solution
to me.

Best Regards,
Petr



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux