* Arne Jansen <lists@xxxxxxxxxxxxxx> wrote:

> On 05.06.2011 11:55, Ingo Molnar wrote:
> >
> >* Arne Jansen <lists@xxxxxxxxxxxxxx> wrote:
> >
> >>>( Arne, please also double-check on a working bootup that the NMI
> >>>  watchdog is actually ticking, by checking that the NMI counts in
> >>>  /proc/interrupts go up slowly but surely on all CPUs. )
> >>
> >>It does, but _very_ slowly. Some CPUs do not count up for tens of
> >>minutes if the machine is idle. If I generate some load like 'make
> >>tags', the counters go up quite quickly.
> >>After 4 minutes and one 'make cscope' it looks like this:
> >>NMI:   8   13   43   5   2   3   22   1   Non-maskable interrupts
> >>
> >>But I never see a single tick on the console or in dmesg, even when I
> >>replace the early_printk with a printk.
> >
> >hm, that might be because the NMI watchdog uses unhalted cycles to
> >tick.
> >
> >That's not a problem (the kernel cannot lock up while there are no
> >cycles ticking), but could you nevertheless work around this
> >by starting 8 infinite shell loops:
> >
> >   for ((i=0; i<8; i++)); do while : ; do : ; done& done
> >
> >?
> >
> >This will saturate all cores and make sure the NMI watchdog is
> >ticking everywhere.
> >
> >Hopefully this won't make the bug go away :-)
> >
>
> OK, now we're getting somewhere. I get the ticks, the bug is still
> there, and all CPUs still tick after the lockup. I also added an
> early_printk inside the lockup-if, and it reports hard lockups. At
> first only for one or two CPUs, and after some time all CPUs are
> locked up.

Very good!

If you add a dump_stack() there, do you get a stack trace, or do the
NMI watchdog ticks stop?

If the ticks stop, this suggests a lockup within the printk code. If
you get a stack dump, then we'll have good debug data.

Thanks,

	Ingo

--
To unsubscribe from this list: send the line "unsubscribe linux-tip-commits" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html