Re: Commit 78eef01b0fae087c5fadbd85dd4fe2918c3a015f (on_each_cpu(): disable local interrupts) Breaks SGI IP32

Andrew Morton wrote:

on_each_cpu() calls smp_call_function().  It is not correct to call
smp_call_function() with local interrupts disabled, because it can lead to
deadlocks.

Therefore on_each_cpu() also must not be called with local interrupts
disabled.  Therefore on_each_cpu() may use
local_irq_disable()/local_irq_enable().

A while ago I fixed all such calls, but I may have missed some instances.

Andrew, what was the reason for 78eef01b0fae087c5fadbd85dd4fe2918c3a015f ?


That change made the various calling environments consistent, as described
in the changelog.

If it's really, really not deadlocky to call smp_call_function() with
interrupts disabled at that time in the MIPS kernel bringup then I'd
suggest you should open-code an smp_call_function() and put a big comment
over it explaining why it's done this way, and why it isn't deadlocky.

<tries to remember what the deadlock is>

If CPU A is running smp_call_function() it's waiting for CPU B to run the
handler.

But if CPU B is presently _also_ running smp_call_function(), it's waiting
for CPU A to run the handler.

If either of those CPUs is waiting for the other with local interrupts
disabled, that CPU will never respond to the other CPU's IPI and they'll
deadlock.
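
The scenario above can be sketched as a small user-space toy model. This is not kernel code: the struct, the function name and the step loop are all made up for illustration, and the model assumes each CPU keeps its interrupt state fixed for the whole time it waits.

```c
#include <stdbool.h>

/* Toy model of one CPU spinning inside smp_call_function(); not kernel code. */
struct cpu_model {
    bool irqs_enabled; /* can this CPU take an IPI while it waits? */
    bool ipi_pending;  /* the other CPU has asked us to run its handler */
    bool got_ack;      /* the other CPU has run our handler */
};

/*
 * Both CPUs start a cross-call at the same moment. Returns true if both
 * waits complete within max_steps, false if at least one CPU is still stuck.
 * Assumption: a CPU's interrupt state stays fixed for the whole wait.
 */
static bool cross_call_completes(bool a_irqs_on, bool b_irqs_on, int max_steps)
{
    struct cpu_model cpu[2] = {
        { .irqs_enabled = a_irqs_on, .ipi_pending = true, .got_ack = false },
        { .irqs_enabled = b_irqs_on, .ipi_pending = true, .got_ack = false },
    };

    for (int step = 0; step < max_steps; step++) {
        for (int i = 0; i < 2; i++) {
            /* An IPI is only serviced if this CPU has interrupts enabled. */
            if (cpu[i].ipi_pending && cpu[i].irqs_enabled) {
                cpu[i].ipi_pending = false;
                cpu[1 - i].got_ack = true;
            }
        }
        if (cpu[0].got_ack && cpu[1].got_ack)
            return true;
    }
    return false;
}
```

With both flags true the function returns true after the first step. As soon as either CPU waits with interrupts off, the other CPU never gets its acknowledgement and the function returns false however large max_steps is.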

The catch is, the system affected here is strictly a UP machine. It's impossible to make an O2 go SMP. It seems that the disable call in the UP version of on_each_cpu() (which I assume is the #define macro) is what's causing this issue, since the machine comes to a halt in the dark void between function calls (i.e., that memset() I alluded to earlier).

So I'm wondering, is there a way to check whether the IRQ handlers have already been installed before disabling interrupts, or is this more of a machine-specific oddity wherein the IRQ handlers need to be set up earlier (I don't know if that is even possible or relevant on O2 systems)?
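
To make the UP path described above concrete, here is a user-space sketch of a disable/enable pair wrapped around a local call. Everything in it is a stand-in: the irqs_on flag and the stubbed local_irq_disable()/local_irq_enable() only model the CPU state, and the up_on_each_cpu() and up_on_each_cpu_restore() names are hypothetical, not the kernel's actual macro. The second variant is there purely for contrast. Whether the unconditional enable is what actually bites the O2 is not established here.

```c
#include <stdbool.h>
#include <stddef.h>

/* User-space stand-ins for the kernel primitives; the flag models the IRQ state. */
static bool irqs_on;
static void local_irq_disable(void) { irqs_on = false; }
static void local_irq_enable(void)  { irqs_on = true; }

/* Hypothetical UP variant: run func locally inside a disable/enable pair. */
#define up_on_each_cpu(func, info) \
    (local_irq_disable(), (func)(info), local_irq_enable(), 0)

/* Hypothetical contrast: put back whatever IRQ state the caller had. */
#define up_on_each_cpu_restore(func, info, saved) \
    ((saved) = irqs_on, local_irq_disable(), (func)(info), irqs_on = (saved), 0)

static void noop_handler(void *info) { (void)info; }

/* IRQ state seen by the caller after the disable/enable variant. */
static bool irqs_on_after_pair(bool before)
{
    irqs_on = before;
    (void)up_on_each_cpu(noop_handler, NULL);
    return irqs_on;
}

/* IRQ state seen by the caller after the save/restore variant. */
static bool irqs_on_after_restore(bool before)
{
    bool saved;

    irqs_on = before;
    (void)up_on_each_cpu_restore(noop_handler, NULL, saved);
    return irqs_on;
}
```

In this model irqs_on_after_pair() returns true no matter what state it was entered with, so a caller that came in with interrupts off gets them back switched on. irqs_on_after_restore() hands back the state it was given.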

It seems this was affecting AMD Alchemy-based systems too. Other SGI machines are known to work fine; I haven't tested Indy and Indigo2 yet.


--Kumba

--
Gentoo/MIPS Team Lead
Gentoo Foundation Board of Trustees

"Such is oft the course of deeds that move the wheels of the world: small hands do them because they must, while the eyes of the great are elsewhere." --Elrond

