On Thu, 27 Apr 2017 17:28:07 +0200
Petr Mladek <pmladek@xxxxxxxx> wrote:

> > When I get a chance, I'll see if I can insert a trigger to crash the
> > kernel from NMI on another box and see if this patch helps.
>
> I actually tested it here using this hack:
>
> diff --cc lib/nmi_backtrace.c
> index d531f85c0c9b,0bc0a3535a8a..000000000000
> --- a/lib/nmi_backtrace.c
> +++ b/lib/nmi_backtrace.c
> @@@ -89,8 -90,7 +90,9 @@@ bool nmi_cpu_backtrace(struct pt_regs *
>           int cpu = smp_processor_id();
>
>           if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
>  +                if (in_nmi())
>  +                        panic("Simulating panic in NMI\n");
> +                 arch_spin_lock(&lock);

I was going to create an ftrace trigger to crash on demand, but this
may do as well.

>                   if (regs && cpu_in_idle(instruction_pointer(regs))) {
>                           pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n",
>                                   cpu, instruction_pointer(regs));
>
> and triggered by:
>
>    echo l > /proc/sysrq-trigger
>
> The patch really helped to see much more (all) messages from the ftrace
> buffers in NMI mode.
>
> But the test is a bit artificial. The patch might not help when there
> is a big printk() activity on the system when the panic() is
> triggered. We might wrongly use the small per-CPU buffer when
> the logbuf_lock is tested and taken on another CPU at the same time.
> It means that it will not always help.
>
> I personally think that the patch might be good enough. I am not sure
> if a perfect (more complex) solution is worth it.

I wasn't asking for perfect, as the previous solutions never were
either. I just want an optimistic dump if possible.

I'll try to get some time today to test this, and let you know. But it
won't be on the machine that I originally had the issue with.

Thanks,

-- Steve