On Thu, 17 May 2018 16:39:03 +0200 Petr Mladek <pmladek@xxxxxxxx> wrote: > The commit 719f6a7040f1bdaf96fcc ("printk: Use the main logbuf in NMI when > logbuf_lock is available") tried to detect when logbuf_lock was taken > on another CPU. Then it looked safe to wait for the lock even in NMI. > > It would be safe if other locks were not involved. Ironically the same > commit introduced an ABBA deadlock scenario. It added a spin lock into > nmi_cpu_backtrace() to serialize logs from different CPUs. The effect > is that also the NMI handlers are serialized. As a result, logbuf_lock > might be blocked by NMI on another CPU: > > CPU0 CPU1 CPU2 > > printk() > vprintk_emit() > spin_lock(&logbuf_lock) > > trigger_all_cpu_backtrace() > raise() > > nmi_enter() > printk_nmi_enter() > if (this_cpu_read(printk_context) > & PRINTK_SAFE_CONTEXT_MASK) > // false > else > // looks safe to use printk_deferred() > this_cpu_or(printk_context, > PRINTK_NMI_DEFERRED_CONTEXT_MASK); > > nmi_cpu_backtrace() > arch_spin_lock(&lock); What branch is this based on, because I can't find the "arch_spin_lock()" you are talking about here. -- Steve > show_regs() > > nmi_enter() > nmi_cpu_backtrace() > arch_spin_lock(&lock); > > printk() > vprintk_func() > vprintk_deferred() > vprintk_emit() > spin_lock(&logbuf_lock) > > DEADLOCK: between &logbuf_lock from vprintk_emit() and > &lock from nmi_cpu_backtrace(). > > CPU0 CPU1 > lock(logbuf_lock) lock(lock) > lock(lock) lock(logbuf_lock) > > I have found this problem when stress testing trigger_all_cpu_backtrace() > and the system frozen. > > Note that lockdep is not able to detect these dependencies because > there is no support for NMI context. Let's stay on the safe side > and always use printk_safe buffers when logbuf_lock is taken > when entering NMI. > > Fixes: 719f6a7040f1bdaf96fcc ("printk: Use the main logbuf in NMI when logbuf_lock is available") > Cc: 4.13+ <stable@xxxxxxxxxxxxxxx> # v4.13+ > Signed-off-by: Petr Mladek <pmladek@xxxxxxxx> > --- > kernel/printk/printk_safe.c | 9 +++------ > 1 file changed, 3 insertions(+), 6 deletions(-) > > diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c > index 449d67edfa4b..a2ebd749c053 100644 > --- a/kernel/printk/printk_safe.c > +++ b/kernel/printk/printk_safe.c > @@ -310,15 +310,12 @@ void printk_nmi_enter(void) > { > /* > * The size of the extra per-CPU buffer is limited. Use it only when > - * the main one is locked. If this CPU is not in the safe context, > - * the lock must be taken on another CPU and we could wait for it. > + * the main one is locked. > */ > - if ((this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK) && > - raw_spin_is_locked(&logbuf_lock)) { > + if (raw_spin_is_locked(&logbuf_lock)) > this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK); > - } else { > + else > this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK); > - } > } > > void printk_nmi_exit(void)