Re: Longstanding bug in our IRQ code (irqbalance HPMCs parisc SMP machines)

On 24 Apr 2010, at 16:36, Grant Grundler wrote:

On Sat, Apr 24, 2010 at 03:29:01PM +0200, Thibaut VARÈNE wrote:


Anyway, it turned out that disabling irqbalance "fixed" the crash (and
by crash I mean HPMC). IIRC, the general idea is that when irqbalance
reroutes IRQs under heavy interrupt load, a race occurs by which one
interrupt request might end up delivered to the wrong CPU, HPMC'ing
the machine.

I'm not seeing how an IRQ message getting delivered to the "wrong" CPU
would cause an HPMC. Sounds more like MSI or other mask is getting built
wrong and sending the IRQ transaction to an invalid physical address.

I'm not sure, it's been a very long time since I last tracked down this bug. Maybe I'm remembering it wrong.
FWIW, no MSI on this machine (L1000) and PCI card (sym53c896).
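
For anyone wanting to exercise the rerouting path without irqbalance itself: the daemon moves an IRQ by writing a hex CPU mask to /proc/irq/<N>/smp_affinity, so the same thing can be done by hand. Below is a rough sketch in Python, not what irqbalance actually runs. The helper names are made up for illustration, and the IRQ number is whatever the sym53c896 shows in /proc/interrupts.

```python
def affinity_mask(cpus):
    """Build the hex string accepted by /proc/irq/<N>/smp_affinity.

    Bit n of the mask selects CPU n. Masks wider than 32 bits are
    written as comma-separated 32-bit groups, most significant first.
    """
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu

    digits = format(mask, "x")
    groups = []
    while digits:
        # Peel off 8 hex digits (32 bits) at a time from the low end.
        groups.insert(0, digits[-8:])
        digits = digits[:-8]
    return ",".join(groups)


def set_irq_affinity(irq, cpus, proc_root="/proc"):
    """Hypothetical helper: route `irq` to the given CPUs (needs root)."""
    path = f"{proc_root}/irq/{irq}/smp_affinity"
    with open(path, "w") as f:
        f.write(affinity_mask(cpus))
```

Calling set_irq_affinity() in a loop that alternates between CPUs while the disk is under load should roughly mimic what irqbalance does, though whether that is enough to trigger the HPMC is untested.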

Now, I'm quite convinced that irqbalance could be one of the (major?)
causes of instability of the parisc autobuilders. AFAIU, they've decided
to disable it on their setup, maybe the situation will improve there.
Still, irqbalance is only the messenger, and I'm wondering whether that
apparent bug in our IRQ code could also be responsible for other issues
we're still having.

Sounds like it. Though the HPMCs are clearly different than the PTE issues
that jda/carlos are seeing.

True, but I remember Debian staff complaining about random unexplained
hangs; I wouldn't be too surprised if this came into play...

It's been a very long time since I last touched that code, and tbh I
never fully mastered it anyway, but I thought it'd be a good thing to
have a trace that this bug is still there, and maybe it will ring a bell
to others...

No matter what crap irqbalanced is doing, the box shouldn't crash.
I can take a look at the code path and see if something looks broken.

thanks,


You're welcome ;)

--
Thibaut VARÈNE
http://www.parisc-linux.org/~varenet/
