On Wed, Apr 10, 2024 at 5:07 AM <liu.yec@xxxxxxx> wrote:
>
> From: LiuYe <liu.yeC@xxxxxxx>
>
> Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will
> attempt to use schedule_work() to provoke a keyboard reset when
> transitioning out of the debugger and back to normal operation.
> This can cause deadlock because schedule_work() is not NMI-safe.
>
> The stack trace below shows an example of the problem. In this
> case the master cpu is not running from NMI but it has parked
> the slave CPUs using an NMI and the parked CPUs are holding
> spinlocks needed by schedule_work().
>
> Example:
> BUG: spinlock lockup suspected on CPU#0. owner_cpu: 1
> CPU1: Call Trace:
>   __schedule
>   schedule
>   schedule_hrtimeout_range_clock
>   mutex_unlock
>   ep_scan_ready_list
>   schedule_hrtimeout_range
>   ep_poll
>   wake_up_q
>   SyS_epoll_wait
>   entry_SYSCALL_64_fastpath
>
> CPU0: Call Trace:
>   dump_stack
>   spin_dump
>   do_raw_spin_lock
>   _raw_spin_lock
>   try_to_wake_up
>   wake_up_process
>   insert_work
>   __queue_work
>   queue_work_on
>   kgdboc_post_exp_handler
>   kgdb_cpu_enter
>   kgdb_handle_exception
>   __kgdb_notify
>   kgdb_notify
>   notifier_call_chain
>   notify_die
>   do_int3
>   int3
>
> We fix the problem by using irq_work to call schedule_work()
> instead of calling it directly. This is because we cannot
> resynchronize the keyboard state from the hardirq context
> provided by irq_work. This must be done from the task context
> in order to call the input subsystem.
>
> Therefore, we have to defer the work twice. First, safely
> switch from the debug trap context (similar to NMI) to the
> hardirq, and then switch from the hardirq to the system work queue.

...

> Signed-off-by: Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Andy Shevchenko <andy.shevchenko@xxxxxxxxx>

> V9 -> V10 : Add Signed-off-by of Greg KH and Andy Shevchenko, Acked-by of Daniel Thompson

Huh?!

-- 
With Best Regards,
Andy Shevchenko