* Arne Jansen <lists@xxxxxxxxxxxxxx> wrote:

> > hm, it's hard to interpret that without the spin_lock()/unlock()
> > logic keeping the dumps apart.
>
> The locking was in place from the beginning. [...]

OK, I was surprised it looked relatively ordered :-)

> [...] As the output is still scrambled, there are other sources for
> BUG/WARN outside the watchdog that trigger in parallel. Maybe we
> should protect the whole BUG/WARN mechanism with a lock and send it
> to early_printk from the beginning, so we don't have to wait for
> the watchdog to kill printk off and the first BUG can come through.
> Or just let WARN/BUG kill off printk instead of the watchdog
> (though I have to get rid of that syslog-WARN on startup).

I had yet another look at your lockup.txt and I think the main cause is the WARN_ON() triggered because pi_lock is not held. The lockup in that WARN_ON() path causes other CPUs to wedge in printk, which in turn triggers the spinlock-lockup messages on those CPUs.

So I think the primary trigger is the pi_lock WARN_ON() (your bisection has confirmed that too), and everything else follows from it.

Unfortunately I don't think we can really 'fix' the problem by removing the assert. By all accounts the assert is correct: pi_lock should be held there. If we are not holding it, we likely won't crash in an easily visible way; it's a lot easier to trigger asserts than to trigger the obscure side-effects of locking bugs.

It is also a mystery why only printk() triggers this bug. The wakeup done there is not particularly special, so we really should have seen similar lockups elsewhere as well, not just with printk()s. Yet we are not seeing them. So some essential piece of the puzzle is still missing.

Thanks,

	Ingo