On Fri, Mar 22, 2024 at 07:50:54AM +0000, Liuye wrote:
> >On 21. 03. 24, 12:50, liu.yec@xxxxxxx wrote:
> >> From: LiuYe <liu.yeC@xxxxxxx>
> >>
> >> Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will attempt
> >> to use schedule_work() to provoke a keyboard reset when transitioning
> >> out of the debugger and back to normal operation.
> >> This can cause deadlock because schedule_work() is not NMI-safe.
> >>
> >> The stack trace below shows an example of the problem. In this case
> >> the master cpu is not running from NMI but it has parked the slave
> >> CPUs using an NMI and the parked CPUs is holding spinlocks needed by
> >> schedule_work().
> >
> > I am missing here an explanation (perhaps because I cannot find any
> > docs for irq_work) why irq_work works in this case.
>
> Just need to postpone schedule_work to the slave CPU exiting the NMI
> context, and there will be no deadlock problem. irq_work will only
> respond to handle schedule_work after master cpu exiting the current
> interrupt context. When the master CPU exits the interrupt context,
> other CPUs will naturally exit the NMI context, so there will be no
> deadlock.
>
> > And why you need to schedule another work in the irq_work and not do
> > the job directly.
>
> In the function kgdboc_restore_input_helper , use mutex_lock for
> protection.

It is the call to input_register_handler() that forces us not to do the
work from irq_work's hardirq callback. It is true that there are mutexes
in kgdboc_restore_input_helper() but if they were the only problem we
could change the locking strategy.

> The mutex lock cannot be used in interrupt context. Guess
> that the process needs to run in the context of the process.
> Therefore, call schedule_work in irq_work. Keep the original flow
> unchanged.

You should answer these questions by posting a v5 with the explanation
in the patch description (otherwise the explanation of how the fix works
doesn't end up in the changelog).


Daniel.
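
For illustration only, the two-stage deferral discussed in this thread
can be modelled in plain userspace C. This is a toy sketch, not kernel
code and not the patch itself: every name in it (struct model,
model_schedule_work(), model_irq_work_queue() and so on) is hypothetical,
and it assumes the worst case in which a parked CPU holds a lock that
schedule_work() needs. It only tries to show why the debugger exit path
should do nothing more than record a request, and why the restore itself
still ends up in process context.

```c
#include <stdbool.h>

/* Toy userspace model; none of these names are kernel APIs. */
struct model {
	bool cpus_parked;      /* other CPUs stopped by NMI, may hold workqueue locks */
	bool irq_work_pending; /* stage 1: request recorded by the debugger exit path */
	bool work_pending;     /* stage 2: item sitting on the workqueue */
	int deadlocks;         /* times we would spin on a lock held by a parked CPU */
	int restores;          /* times the keyboard restore ran in process context */
};

/* Stand-in for schedule_work(): needs locks that a parked CPU might hold. */
static void model_schedule_work(struct model *m)
{
	if (m->cpus_parked)
		m->deadlocks++;	/* assumed worst case: the lock holder is parked */
	else
		m->work_pending = true;
}

/* Stand-in for irq_work_queue(): only records a request, takes no locks. */
static void model_irq_work_queue(struct model *m)
{
	m->irq_work_pending = true;
}

/* Debugger exit path, called while the other CPUs are still parked. */
static void model_restore_input(struct model *m, bool use_irq_work)
{
	if (use_irq_work)
		model_irq_work_queue(m);
	else
		model_schedule_work(m);
}

/* Other CPUs resume; the pending irq_work then runs in hardirq context. */
static void model_resume(struct model *m)
{
	m->cpus_parked = false;
	if (m->irq_work_pending) {
		m->irq_work_pending = false;
		model_schedule_work(m);	/* safe now: nobody is parked */
	}
}

/* Workqueue thread: process context, so sleeping calls are allowed here. */
static void model_run_workqueue(struct model *m)
{
	if (m->work_pending) {
		m->work_pending = false;
		m->restores++;
	}
}

/* Run one debugger exit from start to finish and return the final state. */
static struct model model_exit_debugger(bool use_irq_work)
{
	struct model m = { .cpus_parked = true };

	model_restore_input(&m, use_irq_work);
	model_resume(&m);
	model_run_workqueue(&m);
	return m;
}
```

In this model, model_exit_debugger(true) finishes with restores == 1 and
deadlocks == 0. Passing false makes the exit path call the
schedule_work() stand-in while CPUs are still parked, so deadlocks
becomes 1 and the restore never runs.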