>From: LiuYe <liu.yeC@xxxxxxx> > >Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will >attempt to use schedule_work() to provoke a keyboard reset when >transitioning out of the debugger and back to normal operation. >This can cause deadlock because schedule_work() is not NMI-safe. > >The stack trace below shows an example of the problem. In this >case the master cpu is not running from NMI but it has parked >the slave CPUs using an NMI and the parked CPUs is holding >spinlocks needed by schedule_work(). > >Example: >BUG: spinlock lockup suspected on CPU#0. owner_cpu: 1 >CPU1: Call Trace: >__schedule >schedule >schedule_hrtimeout_range_clock >mutex_unlock >ep_scan_ready_list >schedule_hrtimeout_range >ep_poll >wake_up_q >SyS_epoll_wait >entry_SYSCALL_64_fastpath > >CPU0: Call Trace: >dump_stack >spin_dump >do_raw_spin_lock >_raw_spin_lock >try_to_wake_up >wake_up_process >insert_work >__queue_work >queue_work_on >kgdboc_post_exp_handler >kgdb_cpu_enter >kgdb_handle_exception >__kgdb_notify >kgdb_notify >notifier_call_chain >notify_die >do_int3 >int3 > >We fix the problem by using irq_work to call schedule_work() >instead of calling it directly. This is because we cannot >resynchronize the keyboard state from the hardirq context >provided by irq_work. This must be done from the task context >in order to call the input subsystem. > >Therefore, we have to defer the work twice. First, safely >switch from the debug trap context (similar to NMI) to the >hardirq, and then switch from the hardirq to the system work queue. > >Signed-off-by: LiuYe <liu.yeC@xxxxxxx> >Co-developed-by: Daniel Thompson <daniel.thompson@xxxxxxxxxx> >Signed-off-by: Daniel Thompson <daniel.thompson@xxxxxxxxxx> > >--- >V10 -> V11: Revert to V9 >V9 -> V10 : Add Signed-off-by of Greg KH and Andy Shevchenko, Acked-by of Daniel Thompson >V8 -> V9: Modify call trace format and move irq_work.h before module.h >V7 -> V8: Update the description information and comments in the code. > : Submit this patch based on version linux-6.9-rc2. >V6 -> V7: Add comments in the code. >V5 -> V6: Replace with a more professional and accurate answer. >V4 -> V5: Answer why schedule another work in the irq_work and not do the job directly. >V3 -> V4: Add changelogs >V2 -> V3: Add description information >V1 -> V2: using irq_work to solve this properly. >--- >--- > drivers/tty/serial/kgdboc.c | 24 +++++++++++++++++++++++- > 1 file changed, 23 insertions(+), 1 deletion(-) > >diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c >index 7ce7bb164..32410fec7 100644 >--- a/drivers/tty/serial/kgdboc.c >+++ b/drivers/tty/serial/kgdboc.c >@@ -19,6 +19,7 @@ > #include <linux/console.h> > #include <linux/vt_kern.h> > #include <linux/input.h> >+#include <linux/irq_work.h> > #include <linux/module.h> > #include <linux/platform_device.h> > #include <linux/serial_core.h> >@@ -82,6 +83,19 @@ static struct input_handler kgdboc_reset_handler = { > > static DEFINE_MUTEX(kgdboc_reset_mutex); > >+/* >+ * This code ensures that the keyboard state, which is changed during kdb >+ * execution, is resynchronized when we leave the debug trap. The resync >+ * logic calls into the input subsystem to force a reset. The calls into >+ * the input subsystem must be executed from normal task context. >+ * >+ * We need to trigger the resync from the debug trap, which executes in an >+ * NMI (or similar) context. To make it safe to call into the input >+ * subsystem we end up having use two deferred execution techniques. >+ * Firstly, we use irq_work, which is NMI-safe, to provoke a callback from >+ * hardirq context. Then, from the hardirq callback we use the system >+ * workqueue to provoke the callback that actually performs the resync. >+ */ > static void kgdboc_restore_input_helper(struct work_struct *dummy) > { > /* >@@ -99,10 +113,17 @@ static void kgdboc_restore_input_helper(struct work_struct *dummy) > > static DECLARE_WORK(kgdboc_restore_input_work, kgdboc_restore_input_helper); > >+static void kgdboc_queue_restore_input_helper(struct irq_work *unused) >+{ >+ schedule_work(&kgdboc_restore_input_work); >+} >+ >+static DEFINE_IRQ_WORK(kgdboc_restore_input_irq_work, kgdboc_queue_restore_input_helper); >+ > static void kgdboc_restore_input(void) > { > if (likely(system_state == SYSTEM_RUNNING)) >- schedule_work(&kgdboc_restore_input_work); >+ irq_work_queue(&kgdboc_restore_input_irq_work); > } > > static int kgdboc_register_kbd(char **cptr) >@@ -133,6 +154,7 @@ static void kgdboc_unregister_kbd(void) > i--; > } > } >+ irq_work_sync(&kgdboc_restore_input_irq_work); > flush_work(&kgdboc_restore_input_work); > } > #else /* ! CONFIG_KDB_KEYBOARD */ >-- >2.25.1 What is the current status of PATCH V11? Are there any additional modifications needed?