* Andreas Barth (aba@xxxxxxxxxxxxxxx) [100415 20:43]:
> * Peter 'p2' De Schrijver (p2@xxxxxxxxxx) [100403 17:43]:
> > http://zobel.ftbfs.de/.x/lucatelli-nmi-watchdog-output.txt
> > Dump of one of those hangs. Most cores seem to be stuck in wait
> > (0xffffffff81100b80), except for core 1 which is in octeon_irq_ciu0_ack
> > (octeon_irq_ciu0_ack).
>
> On further investigation we found out that this happens when
> irqbalance is started. The version of irqbalance being run is 0.55.
>
> We removed this program from the affected machine, but of course this
> still should be fixed (and we still get a few reboots on another
> machine without irqbalance).

Clarification: running irqbalance doesn't crash the machine by itself,
but it increases the probability of a crash dramatically. Usually one of
the next few (< 10) commands crashes the machine. The crashes, however,
look similar to the ones we get without irqbalance, just far more
frequent when irqbalance is running. It seems that irqbalance simply
makes the underlying crash much easier to trigger.

Andi