Longstanding bug in our IRQ code (irqbalance HPMCs parisc SMP machines)

Pa-ckers,

Just for the record, I'd like to draw some attention to what seems like a pretty old bug in our IRQ code that is apparently still affecting us.

Long story short: while trying to figure out why the recently attached 10-disk bay was killing the Debian "lafayette" autobuilder during RAID resync, I noticed that irqbalance was part of the default Debian autobuilder setup.

The nastiness of irqbalance has been discussed before, and I remembered having had issues with that daemon in the past (5+ years ago) on my parisc machines. I couldn't find a pointer to a mailing-list thread, and I don't remember whether I discussed it on IRC or elsewhere.

Anyway, it turned out that disabling irqbalance "fixed" the crash (and by crash I mean HPMC). IIRC, the general idea is that when irqbalance reroutes IRQs under heavy interrupt load, a race can cause an interrupt request to be delivered to the wrong CPU, HPMC'ing the machine.
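
For reference, rerouting an IRQ on Linux comes down to writing a hexadecimal CPU bitmask to /proc/irq/<N>/smp_affinity, which is the interface irqbalance itself uses. The following is only a minimal sketch of that mechanism, not irqbalance's actual code: the helper names (cpus_to_affinity_mask, affinity_mask_to_cpus, pin_irq) are hypothetical, and nothing here is specific to parisc. It can be handy for pinning an IRQ by hand when testing with irqbalance disabled.

```python
def cpus_to_affinity_mask(cpus):
    """Build the hex bitmask for a list of CPU numbers (bit n set = CPU n)."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu
    return format(mask, "x")


def affinity_mask_to_cpus(mask_text):
    """Decode a mask as read back from smp_affinity.

    The kernel may print the mask as comma-separated 32-bit words,
    so the commas are dropped before parsing.
    """
    mask = int(mask_text.strip().replace(",", ""), 16)
    return [cpu for cpu in range(mask.bit_length()) if mask & (1 << cpu)]


def pin_irq(irq, cpus, proc_root="/proc"):
    """Write the mask for `cpus` to <proc_root>/irq/<irq>/smp_affinity.

    Hypothetical helper: requires root, and the kernel may reject the
    write for IRQs whose affinity cannot be changed.
    """
    path = f"{proc_root}/irq/{irq}/smp_affinity"
    with open(path, "w") as f:
        f.write(cpus_to_affinity_mask(cpus))
    return path
```

With these helpers, pin_irq(N, [0]) would route IRQ N to CPU 0 only, where N is whatever IRQ number /proc/interrupts shows for the controller in question.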

I have no particular opinion on whether it should be expected that something as stupid as irqbalance could crash a system, but others seem to believe it shouldn't (claiming "it works on *real* [read: x86] hardware").

Now, I'm quite convinced that irqbalance could be one of the (major?) causes of instability of the parisc autobuilders. AFAIU, they've decided to disable it on their setup, so maybe the situation will improve there. Still, irqbalance is only the messenger, and I'm wondering whether that apparent bug in our IRQ code could also be responsible for other issues we're still having.

It's been a very long time since I last touched that code, and tbh I never fully mastered it anyway, but I thought it'd be a good thing to leave a trace that this bug is still there, and maybe it will ring a bell for others...

HTH

T-Bone

--
Thibaut Varène
http://www.parisc-linux.org/~varenet/