Wed, Apr 03, 2024 at 02:11:09PM +0800, liu.yec@xxxxxxx kirjoitti: > From: LiuYe <liu.yeC@xxxxxxx> > > Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will > attempt to use schedule_work() to provoke a keyboard reset when > transitioning out of the debugger and back to normal operation. > This can cause deadlock because schedule_work() is not NMI-safe. > > The stack trace below shows an example of the problem. In this > case the master cpu is not running from NMI but it has parked > the slave CPUs using an NMI and the parked CPUs is holding > spinlocks needed by schedule_work(). > example: > BUG: spinlock lockup suspected on CPU#0, namex/10450 > lock: 0xffff881ffe823980, .magic: dead4ead, .owner: namexx/21888, .owner_cpu: 1 > ffff881741d00000 ffff881741c01000 0000000000000000 0000000000000000 > ffff881740f58e78 ffff881741cffdd0 ffffffff8147a7fc ffff881740f58f20 > Call Trace: > [<ffffffff81479e6d>] ? __schedule+0x16d/0xac0 > [<ffffffff8147a7fc>] ? schedule+0x3c/0x90 > [<ffffffff8147e71a>] ? schedule_hrtimeout_range_clock+0x10a/0x120 > [<ffffffff8147d22e>] ? mutex_unlock+0xe/0x10 > [<ffffffff811c839b>] ? ep_scan_ready_list+0x1db/0x1e0 > [<ffffffff8147e743>] ? schedule_hrtimeout_range+0x13/0x20 > [<ffffffff811c864a>] ? ep_poll+0x27a/0x3b0 > [<ffffffff8108c540>] ? wake_up_q+0x70/0x70 > [<ffffffff811c99a8>] ? SyS_epoll_wait+0xb8/0xd0 > [<ffffffff8147f296>] ? entry_SYSCALL_64_fastpath+0x12/0x75 > CPU: 0 PID: 10450 Comm: namex Tainted: G O 4.4.65 #1 > Hardware name: Insyde Purley/Type2 - Board Product Name1, BIOS 05.21.51.0036 07/19/2019 > 0000000000000000 ffff881ffe813c10 ffffffff8124e883 ffff881741c01000 > ffff881ffe823980 ffff881ffe813c38 ffffffff810a7f7f ffff881ffe823980 > 000000007d2b7cd0 0000000000000001 ffff881ffe813c68 ffffffff810a80e0 > Call Trace: > <#DB> [<ffffffff8124e883>] dump_stack+0x85/0xc2 > [<ffffffff810a7f7f>] spin_dump+0x7f/0x100 > [<ffffffff810a80e0>] do_raw_spin_lock+0xa0/0x150 > [<ffffffff8147eb55>] _raw_spin_lock+0x15/0x20 > [<ffffffff8108c256>] try_to_wake_up+0x176/0x3d0 > [<ffffffff8108c4c5>] wake_up_process+0x15/0x20 > [<ffffffff8107b371>] insert_work+0x81/0xc0 > [<ffffffff8107b4e5>] __queue_work+0x135/0x390 > [<ffffffff8107b786>] queue_work_on+0x46/0x90 > [<ffffffff81313d28>] kgdboc_post_exp_handler+0x48/0x70 > [<ffffffff810ed488>] kgdb_cpu_enter+0x598/0x610 > [<ffffffff810ed6e2>] kgdb_handle_exception+0xf2/0x1f0 > [<ffffffff81054e21>] __kgdb_notify+0x71/0xd0 > [<ffffffff81054eb5>] kgdb_notify+0x35/0x70 > [<ffffffff81082e6a>] notifier_call_chain+0x4a/0x70 > [<ffffffff8108304d>] notify_die+0x3d/0x50 > [<ffffffff81017219>] do_int3+0x89/0x120 > [<ffffffff81480fb4>] int3+0x44/0x80 Ouch. Please, read this https://www.kernel.org/doc/html/latest/process/submitting-patches.html#backtraces-in-commit-messages and modify the commit message accordingly. > We fix the problem by using irq_work to call schedule_work() > instead of calling it directly. This is because we cannot > resynchronize the keyboard state from the hardirq context > provided by irq_work. This must be done from the task context > in order to call the input subsystem. > > Therefore, we have to defer the work twice. First, safely > switch from the debug trap context (similar to NMI) to the > hardirq, and then switch from the hardirq to the system work queue. > Signed-off-by: LiuYe <liu.yeC@xxxxxxx> > Co-authored-by: Daniel Thompson <daniel.thompson@xxxxxxxxxx> Correct tag is Co-developed-by, btw it's written in the same document the link to which I provided a few lines above. ... > --- a/drivers/tty/serial/kgdboc.c > +++ b/drivers/tty/serial/kgdboc.c > @@ -22,6 +22,7 @@ > #include <linux/module.h> > #include <linux/platform_device.h> > #include <linux/serial_core.h> > +#include <linux/irq_work.h> Please, keep it ordered (with visible context this should go at least before module.h). -- With Best Regards, Andy Shevchenko