Re: [PATCH] printk/nmi: Prevent deadlock when serializing NMI backtraces

Petr Mladek <pmladek@xxxxxxxx> · Wed, 6 Jun 2018 13:15:57 +0200

On Wed 2018-06-06 14:10:29, Sergey Senozhatsky wrote:
> On (06/05/18 14:47), Petr Mladek wrote:
> [..]
> > Grr, the ABBA deadlock is still there. NMIs are not sent to the other
> > CPUs atomically. Even if we detect that logbuf_lock is available
> > in printk_nmi_enter() on some CPUs, it might still get locked on
> > another CPU before the other CPU gets NMI.
> 
> Can we do something about "B"? :) I mean - any chance we can rework
> locking in nmi_cpu_backtrace()?

I can't think of any possibility at the moment. Well, it does not mean
that it does not exist.

The irony is that we need the extra lock in nmi_cpu_backtrace() only
because we try to store the messages into the common log buffer.
If we always use the per-CPU buffers in NMI then the lock would
not cause any problems but it also won't be necessary.

> > By other words, any check in printk_safe_enter() is racy and not
> > sufficient
> 
> I suppose you meant printk_nmi_enter().

Yup, I am sorry for confusion.

> >   => I suggest to revert the commit 719f6a7040f1bdaf96fcc70
> >      "printk: Use the main logbuf in NMI when logbuf_lock is available"
> >      for-4.18 and stable until we get a better solution.
> 
> Just random thoughts.
> 
> May be we need to revert it, but let's not "panic". I think [but don't
> insist on it] that the patch in question is *probably* "good enough". It
> addresses a bug report after all.

It was a problem reported by me. I found it when testing other changes.
The patch improved the situation definitely. The question is if it is
enough in practice.

> How often do we have arch_trigger_cpumask_backtrace() on all CPUs these
> days? I tend to think that it used to be much more popular in the past,
> because we had a loops_per_jiffy based spin_lock lockup detection which
> would trigger NMI backtracase, but this functionality has gone, see
> bc88c10d7e6900916f5e1ba3829d66a9de92b633 for details.

AFAIK, we (SUSE) started to work on the printk/NMI deadlocks because
it was too risky to use sysrq-l on systems with many CPUs. It means that
the use case is still there.

Hmm, the original problem was that any interrupted logbuf_lock owner
caused the deadlock. It is _different now_ because the interrupted
logbuf_lock owner must block another CPU via the lock in
nmi_cpu_backtrace().

I actually wonder why this patch actually helped so much in my test
(while true; do echo l >/proc/sysrq-trigger; done). My guess is
that it increased a lot the number of CPUs using PRINTK_NMI_CONTEXT
(per-CPU buffers). Therefore the deadlock was less likely.
On the other hand, the original problem was basically back
without this patch.

> I'm not saying that the race condition that you found is
> unrealistic, I'm just saying that _it seems_ that
> nmi_panic()->printk() on a single CPU is more common
> now, so having that nmi_printk()->printk_deferred() might be quite
> valuable at the end of the day.

I see the point. I need to think about it.

> May be I'm wrong!
> 
> > The only safe solution seems to be a trylock() in NMI in
> > vprintk_emit() and fallback to vprintk_safe() when the lock
> > is not taken. I mean something like:
> > 
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index 247808333ba4..4a5a0bf221b3 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -1845,7 +1845,13 @@ asmlinkage int vprintk_emit(int facility, int level,
> >  	printk_delay();
> >  
> >  	/* This stops the holder of console_sem just where we want him */
> > -	logbuf_lock_irqsave(flags);
> > +	printk_safe_enter_irqsave(flags);
> > +	if (in_nmi() && !raw_spin_trylock(&logbuf_lock)) {
> > +		vprintk_nmi(fmt, args);
> > +		printk_safe_exit_irqrestore(flags);
> > +		return;
> > +	} else
> > +		raw_spin_lock(&logbuf_lock);
> >  	/*
> >  	 * The printf needs to come first; we need the syslog
> >  	 * prefix which might be passed-in as a parameter.
> 
> I need some time to think about it.

The bad thing about this solution is that different messages might
be handled different way during a single nmi_cpu_backtrace().
The result might be mixed backtraces in the log.

> > Sigh, this looks like a material for-4.19.
> 
> Agreed.
> 
> > We might need to revisit if printk_context still makes sense, ...

> What do you mean by this?

My dream was to select the right vprintk implementation according
to printk_context on a single place (vprintk_func()). If this is not
usable for NMI, we might need to think about another solution.
But this is premature. I believe the printk_context will most
likely make still sense.

I need to think more about it.

Best Regards,
Petr