Re: [PATCH 2/2] Fix lockup related to stop_machine being stuck in __do_softirq.

So this one kind of fell through the cracks, partly because I don't
exactly love the patch.

What is it that keeps re-arming the softirq pending bit all the time?
You mention the ath9k driver..

Also, do we really need the jiffies-based one at all? Maybe we should
just get rid of that entirely, if it's not sufficiently reliable
anyway. It's not like we should *ever* keep doing softirqs forever,
and quite frankly, when you introduce the limit of doing the loop at
most ten times, I doubt that the "2 milliseconds" limit is even
relevant any more. It would be a strange situation where ten times
through the softirq handling loop would take more than 2ms.

So I'd rather take a patch that replaces the 2ms timeout with the
10-iteration timeout. And I think it might be a good idea to have a
debug thing that says what the softirq that keeps firing was. If it's
ath9k, I guess it's NET_TX/RX_SOFTIRQ, but maybe we could have
something that tells exactly what it is that re-triggers it over and
over again..

               Linus

On Wed, May 13, 2015 at 11:29 PM, Rui Xiang <rui.xiang@xxxxxxxxxx> wrote:
> From: Ben Greear <greearb@xxxxxxxxxxxxxxx>
>
> commit 34376a50fb1fa095b9d0636fa41ed2e73125f214 upstream.
>
> The stop machine logic can lock up if all but one of the migration
> threads make it through the disable-irq step and the one remaining
> thread gets stuck in __do_softirq.  The reason __do_softirq can hang is
> that it has a bail-out based on jiffies timeout, but in the lockup case,
> jiffies itself is not incremented.
>
> To work around this, re-add the max_restart counter in __do_softirq and stop
> processing softirqs after 10 restarts.
>
> Thanks to Tejun Heo and Rusty Russell and others for helping me track
> this down.
>
> This was introduced in 3.9 by commit c10d73671ad3 ("softirq: reduce
> latencies").