Re: [tip:core/debug] debug lockups: Improve lockup detection

Ingo Molnar <mingo@xxxxxxx> · Sun, 2 Aug 2009 21:26:57 +0200

* Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Sun, 2 Aug 2009 13:09:34 GMT tip-bot for Ingo Molnar <mingo@xxxxxxx> wrote:
> 
> > Commit-ID:  c1dc0b9c0c8979ce4d411caadff5c0d79dee58bc
> > Gitweb:     http://git.kernel.org/tip/c1dc0b9c0c8979ce4d411caadff5c0d79dee58bc
> > Author:     Ingo Molnar <mingo@xxxxxxx>
> > AuthorDate: Sun, 2 Aug 2009 11:28:21 +0200
> > Committer:  Ingo Molnar <mingo@xxxxxxx>
> > CommitDate: Sun, 2 Aug 2009 13:27:17 +0200
> > 
> > --- a/drivers/char/sysrq.c
> > +++ b/drivers/char/sysrq.c
> > @@ -24,6 +24,7 @@
> >  #include <linux/sysrq.h>
> >  #include <linux/kbd_kern.h>
> >  #include <linux/proc_fs.h>
> > +#include <linux/nmi.h>
> >  #include <linux/quotaops.h>
> >  #include <linux/perf_counter.h>
> >  #include <linux/kernel.h>
> > @@ -222,12 +223,7 @@ static DECLARE_WORK(sysrq_showallcpus, sysrq_showregs_othercpus);
> >  
> >  static void sysrq_handle_showallcpus(int key, struct tty_struct *tty)
> >  {
> > -	struct pt_regs *regs = get_irq_regs();
> > -	if (regs) {
> > -		printk(KERN_INFO "CPU%d:\n", smp_processor_id());
> > -		show_regs(regs);
> > -	}
> > -	schedule_work(&sysrq_showallcpus);
> > +	trigger_all_cpu_backtrace();
> >  }
> 
> I think this just broke all non-x86 non-sparc SMP architectures.

Yeah - it 'broke' them in the sense that they never had a working 
trigger_all_cpu_backtrace() implementation to begin with. (That 
already breaks/degrades spinlock-debug on those architectures, so 
it's an existing problem.)

One solution would be a generic trigger_all_cpu_backtrace() 
implementation that falls back to the above schedule_work() approach.
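
A rough userspace model of what such a fallback could look like, 
purely as a sketch. Everything except the trigger_all_cpu_backtrace() 
name is a hypothetical stand-in: the fake register state, the 
work_pending flag that models schedule_work(), and 
run_pending_work() that models the workqueue running the item later. 
The real deferred half would have to reach the other CPUs via IPIs, 
which this model only imitates with a loop.

```c
/*
 * Userspace model of a generic trigger_all_cpu_backtrace() fallback.
 * Not kernel code: all helpers and state below are hypothetical
 * stand-ins used only to illustrate the control flow.
 */
#include <stdio.h>

#define NR_CPUS 4

/* Stand-in for the per-CPU state that show_regs() would dump. */
struct fake_regs {
	unsigned long ip;
};

struct fake_regs cpu_regs[NR_CPUS] = {
	{ 0x1000 }, { 0x2000 }, { 0x3000 }, { 0x4000 },
};

int current_cpu;	/* models smp_processor_id() */
int work_pending;	/* models a work item queued by schedule_work() */
int dumps_emitted;	/* number of CPUs reported so far */

void dump_cpu(int cpu)
{
	printf("CPU%d:\n  ip=%#lx\n", cpu, cpu_regs[cpu].ip);
	dumps_emitted++;
}

/* Deferred half: report every CPU except the one that triggered. */
void showregs_othercpus(void)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu != current_cpu)
			dump_cpu(cpu);
	work_pending = 0;
}

/* Models the workqueue getting around to the queued item. */
void run_pending_work(void)
{
	if (work_pending)
		showregs_othercpus();
}

/*
 * Generic fallback: dump the local CPU right away, then defer the
 * remaining CPUs, following the shape of the removed sysrq code.
 */
void generic_trigger_all_cpu_backtrace(void)
{
	dump_cpu(current_cpu);
	work_pending = 1;	/* models schedule_work() */
}

/*
 * An architecture with its own (e.g. NMI-based) version would define
 * the macro before this point; everyone else gets the fallback.
 */
#ifndef trigger_all_cpu_backtrace
#define trigger_all_cpu_backtrace() generic_trigger_all_cpu_backtrace()
#endif
```

The obvious weakness of this shape is that the deferred half only 
runs if the workqueue still makes progress, which is exactly what 
may not be true in a hard lockup.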

I never understood why we proliferated all these different 
backtrace-triggering mechanisms instead of doing one good approach 
that everything uses.

> >  static struct sysrq_key_op sysrq_showallcpus_op = {
> > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > index 7717b95..9c5fa9f 100644
> > --- a/kernel/rcutree.c
> > +++ b/kernel/rcutree.c
> > @@ -35,6 +35,7 @@
> >  #include <linux/rcupdate.h>
> >  #include <linux/interrupt.h>
> >  #include <linux/sched.h>
> > +#include <linux/nmi.h>
> >  #include <asm/atomic.h>
> >  #include <linux/bitops.h>
> >  #include <linux/module.h>
> > @@ -469,6 +470,8 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
> >  	}
> >  	printk(" (detected by %d, t=%ld jiffies)\n",
> >  	       smp_processor_id(), (long)(jiffies - rsp->gp_start));
> > +	trigger_all_cpu_backtrace();
> 
> Be aware that trigger_all_cpu_backtrace() is a PITA when you have 
> a lot of CPUs.
> 
> If a callsite is careful to ensure that the most important 
> information is emitted last then that might improve things.
> 
> otoh, log buffer overflow will truncate, I think.  So that info 
> needs to be emitted first too ;)
> 
> It's a PITA.

Yeah, it is - I'd expect larger systems to have larger log buffers. 
Lack of info was obviously a showstopper with the highest priority.

	Ingo