* Andreas Barth (aba@xxxxxxxxxxxxxxx) [100415 22:35]:
> * Andreas Barth (aba@xxxxxxxxxxxxxxx) [100415 20:43]:
> > * Peter 'p2' De Schrijver (p2@xxxxxxxxxx) [100403 17:43]:
> > > http://zobel.ftbfs.de/.x/lucatelli-nmi-watchdog-output.txt
> > > Dump of one of those hangs. Most cores seem to be stuck in wait
> > > (0xffffffff81100b80), except for core 1 which is in octeon_irq_ciu0_ack
> > > (octeon_irq_ciu0_ack).
> >
> > On further investigation we found out that this happens when
> > irqbalance is started. The version of irqbalance being run is 0.55.
> >
> > We removed this program from the affected machine, but of course this
> > still should be fixed (and we still get a few reboots on another
> > machine without irqbalance).
>
> Clarification:
>
> Running irqbalance itself doesn't crash the machine, but increases the
> probability of crashes dramatically. Usually the next few (< 10)
> commands crash the machine.
>
> The crashes however look similar to the ones we have without irqbalance
> - just way less often than with irqbalance. This seems like irqbalance
> exposes the crash way better than we do without.

Any ideas what we could do to reduce the number of crashes we experience?


Andi