So this one kind of fell through the cracks, partly because I don't
exactly love the patch.

What is it that keeps re-arming the softirq pending bit all the time?
You mention the ath9k driver.

Also, do we really need the jiffies-based one at all? Maybe we should
just get rid of that entirely, if it's not sufficiently reliable
anyway. It's not like we should *ever* keep doing softirqs forever,
and quite frankly, when you introduce the limit of doing the loop at
most ten times, I doubt that the "2 milliseconds" limit is even
relevant any more. It would be a strange situation where ten times
through the softirq handling loop would take more than 2ms.

So I'd rather take a patch that replaces the 2ms timeout with the
10-iteration limit.

And I think it might be a good idea to have a debug thing that says
which softirq it was that keeps firing. If it's ath9k, I guess it's
NET_TX/RX_SOFTIRQ, but maybe we could have something that tells us
exactly what it is that re-triggers it over and over again.

                Linus

On Wed, May 13, 2015 at 11:29 PM, Rui Xiang <rui.xiang@xxxxxxxxxx> wrote:
> From: Ben Greear <greearb@xxxxxxxxxxxxxxx>
>
> commit 34376a50fb1fa095b9d0636fa41ed2e73125f214 upstream.
>
> The stop machine logic can lock up if all but one of the migration
> threads make it through the disable-irq step and the one remaining
> thread gets stuck in __do_softirq. The reason __do_softirq can hang is
> that it has a bail-out based on jiffies timeout, but in the lockup case,
> jiffies itself is not incremented.
>
> To work around this, re-add the max_restart counter in __do_irq and stop
> processing irqs after 10 restarts.
>
> Thanks to Tejun Heo and Rusty Russell and others for helping me track
> this down.
>
> This was introduced in 3.9 by commit c10d73671ad3 ("softirq: reduce
> latencies").
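
For illustration only, here is a rough userspace sketch of the idea
discussed above: a softirq-style loop bounded purely by an iteration
count (no jiffies check), which also records which vector kept getting
re-raised so that a debug message can name it. This is not kernel code
and not the actual patch. Every name in it (pending, rearm_count,
vec_name, handle, run_softirqs, NR_VECS, MAX_RESTART) is a hypothetical
stand-in, and the handler that re-raises NET_RX on every call is just an
assumption to simulate a misbehaving driver.

```c
#include <stdio.h>

#define NR_VECS      4   /* hypothetical: number of simulated vectors */
#define MAX_RESTART 10   /* the "at most ten times" limit from the mail */

/* Hypothetical stand-in for the per-CPU softirq pending mask. */
static unsigned int pending;

/* How many passes ended with each vector raised again. */
static unsigned int rearm_count[NR_VECS];

static const char *const vec_name[NR_VECS] = {
	"HI", "TIMER", "NET_TX", "NET_RX"
};

/* Simulated handler: NET_RX (vector 3) re-raises itself every time. */
static void handle(int vec)
{
	if (vec == 3)
		pending |= 1u << vec;
}

/*
 * Run pending vectors, restarting while new work shows up, but give up
 * after MAX_RESTART passes. Returns -1 if the work drained normally,
 * otherwise the index of the vector that was re-raised most often.
 */
static int run_softirqs(void)
{
	int max_restart = MAX_RESTART;
	int vec, worst;

	for (vec = 0; vec < NR_VECS; vec++)
		rearm_count[vec] = 0;

	while (pending) {
		unsigned int batch = pending;

		/* Clear the mask first so handlers can raise new work. */
		pending = 0;

		for (vec = 0; vec < NR_VECS; vec++)
			if (batch & (1u << vec))
				handle(vec);

		/* Whatever is set now was raised during this pass. */
		for (vec = 0; vec < NR_VECS; vec++)
			if (pending & (1u << vec))
				rearm_count[vec]++;

		if (pending && --max_restart == 0) {
			worst = 0;
			for (vec = 1; vec < NR_VECS; vec++)
				if (rearm_count[vec] > rearm_count[worst])
					worst = vec;
			printf("restart limit hit, %s re-raised %u times\n",
			       vec_name[worst], rearm_count[worst]);
			return worst;	/* leftover work stays pending */
		}
	}
	return -1;
}
```

Setting pending to (1u << 1) | (1u << 3) and calling run_softirqs()
makes the loop give up after ten passes and report NET_RX, while a mask
containing only TIMER drains in one pass and returns -1. In the real
kernel the leftover work would be handed off to ksoftirqd rather than
simply left in the mask.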