Re: Commit 78eef01b0fae087c5fadbd85dd4fe2918c3a015f (on_each_cpu(): disable local interrupts) Breaks SGI IP32

On Sun, 28 May 2006 02:06:03 +0100
Ralf Baechle <ralf@xxxxxxxxxxxxxx> wrote:

> On Sat, May 27, 2006 at 05:13:21PM -0400, Kumba wrote:
> 
> > Finally managed to track down the git commit causing SGI IP32 (O2) systems 
> > to lock up really early in the boot cycle, but I'm at a loss to understand 
> > why.
> > 
> > Effect:
> > It appears the system silently hangs somewhere in the void between function 
> > calls when trying to invoke the memset() call in __alloc_bootmem_core() in 
> > mm/bootmem.c.  This puts the machine hardware in a state such that a simple 
> > soft reset doesn't clear it -- the machine has to be cold booted to get it 
> > to boot a working kernel again.
> > 
> > Determined Cause:
> > It seems this commit:
> > 78eef01b0fae087c5fadbd85dd4fe2918c3a015f
> > 	[PATCH] on_each_cpu(): disable local interrupts
> > 
> > Is the cause.  I've verified this by reversing this one change on a 
> > 2.6.17-rc4 tree, and it'll boot to a mini-userland (initramfs-based) and 
> > appears to function normally.
> > 
> > 
> > But this is as far as I can trace this.  I'm not sure what this change is 
> > doing internally that's triggering this lockup on O2 systems.  It doesn't 
> > appear to affect Octane (IP30) systems or Origin (IP27).  I haven't 
> > test-ran it on IP22/IP28 hardware yet, so only IP32 is known to be 
> > affected.  Unsure about non-SGI MIPS hardware.
> 
> on_each_cpu() is re-enabling interrupts.  This may crash the system if it
> happens before interrupt handlers have been installed.

on_each_cpu() calls smp_call_function().  It is not correct to call
smp_call_function() with local interrupts disabled, because it can lead to
deadlocks.

Therefore on_each_cpu() must not be called with local interrupts disabled
either, and so it is allowed to use local_irq_disable()/local_irq_enable()
unconditionally.
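
To make the consequence concrete, here is a small userspace model, not
kernel code.  It assumes on_each_cpu() has the shape discussed in this
thread: an smp_call_function() call followed by the local call bracketed by
local_irq_disable()/local_irq_enable().  All model_* names, the
irqs_enabled flag and irq_state_after_call() are hypothetical stand-ins
invented for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the local CPU's interrupt-enable state. */
static bool irqs_enabled = true;

static void model_local_irq_disable(void) { irqs_enabled = false; }
static void model_local_irq_enable(void)  { irqs_enabled = true; }

/* Stand-in for smp_call_function(): pretend the other CPUs ran func. */
static int model_smp_call_function(void (*func)(void *), void *info)
{
	(void)func;
	(void)info;
	return 0;
}

/* Assumed shape of on_each_cpu(): cross-call first, then the local call. */
static int model_on_each_cpu(void (*func)(void *), void *info)
{
	int ret;

	assert(func != NULL);
	ret = model_smp_call_function(func, info);
	model_local_irq_disable();
	func(info);
	model_local_irq_enable();	/* unconditional: entry state is not restored */
	return ret;
}

static void count_call(void *info)
{
	++*(int *)info;
}

/* Report the interrupt state left behind for a given state on entry. */
static bool irq_state_after_call(bool enabled_on_entry)
{
	int calls = 0;

	irqs_enabled = enabled_on_entry;
	model_on_each_cpu(count_call, &calls);
	return irqs_enabled;
}
```

In this model irq_state_after_call(false) returns true: a caller that
entered with interrupts off gets them back on, which is harmless only if
nobody calls the function in that state.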

>  A while ago I
> fixed all such calls but I may have missed some instances.
> 
> Andrew, what was the reason for 78eef01b0fae087c5fadbd85dd4fe2918c3a015f ?
> 

That change made the various calling environments consistent, as described
in the changelog.

If it's really, really not deadlocky to call smp_call_function() with
interrupts disabled at that time in the MIPS kernel bringup then I'd
suggest you should open-code an smp_call_function() and put a big comment
over it explaining why it's done this way, and why it isn't deadlocky.

<tries to remember what the deadlock is>

If CPU A is running smp_call_function() it's waiting for CPU B to run the
handler.

But if CPU B is presently _also_ running smp_call_function(), it's waiting
for CPU A to run the handler.

If either of those CPUs is waiting for the other with local interrupts
disabled, that CPU will never respond to the other CPU's IPI and they'll
deadlock.
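
The scenario above can be sketched as a deterministic two-CPU simulation.
This is a toy model under stated assumptions, not the kernel's
implementation: it ignores locking and the real IPI mechanism, and it holds
each CPU's interrupt state fixed for the whole run.  struct model_cpu and
both_calls_complete() are hypothetical names made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

struct model_cpu {
	bool irqs_enabled;	/* can this CPU take an IPI right now? */
	bool ipi_pending;	/* IPI sent to this CPU, not yet handled */
	bool acked;		/* has the other CPU run this CPU's handler? */
};

/*
 * Both CPUs issue a cross-call at the same moment and then spin.
 * Returns true if both waits finish within max_steps, false if at
 * least one CPU is still waiting (the model's notion of a hang).
 */
static bool both_calls_complete(bool a_irqs_enabled, bool b_irqs_enabled,
				int max_steps)
{
	struct model_cpu cpu[2] = {
		{ a_irqs_enabled, false, false },
		{ b_irqs_enabled, false, false },
	};

	assert(max_steps > 0);

	/* Each CPU sends an IPI to the other and starts waiting. */
	cpu[0].ipi_pending = true;
	cpu[1].ipi_pending = true;

	for (int step = 0; step < max_steps; step++) {
		for (int i = 0; i < 2; i++) {
			/*
			 * A CPU with interrupts off never services the
			 * pending IPI, so the sender keeps waiting for as
			 * long as that state lasts.
			 */
			if (cpu[i].ipi_pending && cpu[i].irqs_enabled) {
				cpu[i].ipi_pending = false;
				cpu[1 - i].acked = true;
			}
		}
		if (cpu[0].acked && cpu[1].acked)
			return true;
	}
	return false;
}
```

With interrupts enabled on both sides the two calls complete in the first
step.  If either CPU keeps interrupts disabled while it waits, the other
CPU's wait never finishes, however large max_steps is.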


