Re: 答复: [PATCH V4] kdb: Fix the deadlock issue in KDB debugging.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Mar 22, 2024 at 07:50:54AM +0000, Liuye wrote:
> >On 21. 03. 24, 12:50, liu.yec@xxxxxxx wrote:
> >> From: LiuYe <liu.yeC@xxxxxxx>
> >>
> >> Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will attempt
> >> to use schedule_work() to provoke a keyboard reset when transitioning
> >> out of the debugger and back to normal operation.
> >> This can cause deadlock because schedule_work() is not NMI-safe.
> >>
> >> The stack trace below shows an example of the problem. In this case
> >> the master cpu is not running from NMI but it has parked the slave
> >> CPUs using an NMI and the parked CPUs is holding spinlocks needed by
> >> schedule_work().
> >
> > I am missing here an explanation (perhaps because I cannot find any
> > docs for irq_work) why irq_work works in this case.
>
> Just need to postpone schedule_work to the slave CPU exiting the NMI
> context, and there will be no deadlock problem.  irq_work will only
> respond to handle schedule_work after master cpu exiting the current
> interrupt context.  When the master CPU exits the interrupt context,
> other CPUs will naturally exit the NMI context, so there will be no
> deadlock.
>
> > And why you need to schedule another work in the irq_work and not do
> > the job directly.
>
> In the function kgdboc_restore_input_helper , use mutex_lock for
> protection.

It is the call to input_register_handler() that forces us not to
do the work from irq_work's hardirq callback.

It is true that there are mutexes in kgdboc_restore_input_helper()
but if they were the only problem we could change the locking
strategy.


> The mutex lock cannot be used in interrupt context.  Guess
> that the process needs to run in the context of the process.
> Therefore, call schedule_work in irq_work. Keep the original flow
> unchanged.

You should answer these questions by posting a v5 with the explanation
in the patch description (otherwise the explanation of how the fix works
doesn't end up in the changelog).


Daniel.




[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux PPP]     [Linux FS]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Linmodem]     [Device Mapper]     [Linux Kernel for ARM]

  Powered by Linux