Hi Petr, On Tue, Dec 04, 2018 at 05:01:52PM +0100, Petr Mladek wrote: > On Tue 2018-12-04 23:49:36, Feng Tang wrote: > > + Sasha > > > > Thanks Petr and Sergey for the reviews. > > > > > > On Tue, Dec 04, 2018 at 11:20:33AM +0100, Petr Mladek wrote: > > > On Mon 2018-12-03 23:15:31, Andrew Morton wrote: > > > > > > > > ------------------------------------------------------ > > > > From: Feng Tang <feng.tang@xxxxxxxxx> > > > > Subject: panic: Avoid the extra noise dmesg > > > > > > > > When kernel panic happens, it will first print the panic call stack, > > > > then the ending msg like: > > > > > > > > [ 35.743249] ---[ end Kernel panic - not syncing: Fatal exception > > > > [ 35.749975] ------------[ cut here ]------------ > > > > > > > > The above message are very useful for debugging. > > > > > > > > But if system is configured to not reboot on panic, say the "panic_timeout" > > > > parameter equals 0, it will likely print out many noisy message like > > > > WARN() call stack for each and every CPU except the panic one, messages > > > > like below: > > > > > > > > WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 set_task_cpu+0x183/0x190 > > > > Call Trace: > > > > <IRQ> > > > > try_to_wake_up > > > > default_wake_function > > > > autoremove_wake_function > > > > __wake_up_common > > > > __wake_up_common_lock > > > > __wake_up > > > > wake_up_klogd_work_func > > > > irq_work_run_list > > > > irq_work_tick > > > > update_process_times > > > > tick_sched_timer > > > > __hrtimer_run_queues > > > > hrtimer_interrupt > > > > smp_apic_timer_interrupt > > > > apic_timer_interrupt > > > > > > I guess that it is a warning about migrating tasks to an offline CPU. > > > > My v1 patch was trying to add some hacky code into architecture code to > > address several WARN()s directly, but that turns out to be very hacky and > > involve much code for many archs. > > > > > > For people working in console mode, the screen will first show the panic > > > > call stack, but immediately overridded by these noisy extra messages, which > > > > makes debugging much more difficult, as the original context gets lost on > > > > screen. > > > > > > > > Also these noisy messages will confuse some users, as I have seen many bug > > > > reporters posted the noisy message into bugzilla, instead of the real panic > > > > call stack and context. > > > > > > > > Removing the "local_irq_enable" will avoid the noisy message. > > > > > > > > The justification for the removing is: when code runs to this point, it > > > > means user has chosed to not reboot, or do any special handling by using > > > > the panic notifier method, no much point in re-enabling the interrupt. > > > > > > > > Link: http://lkml.kernel.org/r/1543902228-23834-1-git-send-email-feng.tang@xxxxxxxxx > > > > Signed-off-by: Feng Tang <feng.tang@xxxxxxxxx> > > > > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > > > > Cc: Kees Cook <keescook@xxxxxxxxxxxx> > > > > Cc: Borislav Petkov <bp@xxxxxxx> > > > > Cc: Petr Mladek <pmladek@xxxxxxxx> > > > > Cc: Sergey Senozhatsky <sergey.senozhatsky@xxxxxxxxx> > > > > Cc: <stable@xxxxxxxxxxxxxxx> > > > > Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > > > > --- > > > > > > > > > > > > --- a/kernel/panic.c~panic-avoid-the-extra-noise-dmesg > > > > +++ a/kernel/panic.c > > > > @@ -322,7 +322,6 @@ void panic(const char *fmt, ...) > > > > } > > > > #endif > > > > pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf); > > > > - local_irq_enable(); > > > > for (i = 0; ; i += PANIC_TIMER_STEP) { > > > > touch_softlockup_watchdog(); > > > > if (i >= i_next) { > > > > > > Hmm, this calls panic_blink(). It seems that it depends on workqueues > > > and the scheduler: > > > > > > + led_panic_blink() > > > + led_trigger_event() > > > + led_set_brightness() > > > + schedule_work(set_brightness_work) > > > > > > I guess that blinking might be important in some situations and > > > on some devices. On the other hand, we are interested only into > > > blinking from this point on. > > > > > > The easiest solution seems to be to make a noop from printk(). > > > For example, we could add a global flag: > > > > > > int panic_blinking; > > > > > > and add the following into vprintk_func() > > > > > > /* > > > * Do not push away real panic() message by warnings from led > > > * blinking code. > > > */ > > > if (panic_blinking) > > > return 0; > > > > > > How does that sound? > > > > This should be able to achieve the same goal. > > > > One thing I can think of is what mentioned by Sergey that some sysrq > > handler may want to print out something, but it should mostly be > > covered by 2 other panic debug print patches, which will print out > > task/mem/timer/lock/ftrace info runtime on demand. > > I think that we could simply clear panic_blinking from > __handle_sysrq(). The user will still be able to capture the screen > before touching the keyboard. But it will keep the things simple. > > I hope that we did not miss anything else. Anyway, the approach with > making printk a nop still looks like the best maintainable solution > to me. I will setup a platform which can handle sysrq request and try your suggestion. thanks, - Feng