2015-05-18 3:47 GMT+08:00 Neil Horman <nhorman at tuxdriver.com>:
> On Sun, May 17, 2015 at 06:10:37PM +0200, Cedric Gava wrote:
>> Hello
>>
>> The message "No irq handler for vector (irq -1)" appears with no apparent period/correlation. After googling, irqbalance seems to be suspect
>>
> This has nothing to do with irqbalance, but rather with the mechanism by which
> the kernel migrates irqs and sets their affinity. There are some irq
> controllers which fail to drain interrupts from a given cpu when being migrated,
> and this leads to an inconsistency.
>
> Do you have a 55XX or X58 chipset on your system? I expect you do. You would
> need to run a 3.14 kernel or later to get the full set of pci quirks that will
> fix the problem (though, by fixing the problem, the solution is to disable
> iommu operation).
>

Yes, Neil is right. The error message is printed by the kernel's irq
migration code path, whereas irqbalance runs in user space.

Depending on the root cause of the bug, sometimes the simplest workaround
is not to balance the affected irqs on your system. First of all, you need
to identify which IRQ's migration causes the error messages. Second, you
can start irqbalance with its irq ban options, which stop irqbalance from
touching those specific irqs.

Upgrading the kernel to the latest version only helps with well-known
problems. However, some problems are new to the Linux kernel. Last year,
when I upgraded my kernel from 2.6.23 to 3.2.x, I ran into this kind of
issue on a SandyBridge-based x86 server. After debugging, we found that the
AHCI SSD controller in the Intel PCH chipset was the source of the faulty
irq. The latest kernel doesn't have the fix. We applied the workaround
above first. Later, I worked out a private patch in the kernel IRQ
migration path.

Anyway, if the latest kernel can't address your problem, you can still
narrow down the faulty IRQs.
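
As a rough illustration of the two steps above (find the suspect IRQs, then
keep irqbalance away from them), here is a minimal Python sketch. It assumes
the usual /proc/interrupts layout and parses a made-up sample table rather
than real output from the affected machine. The helper names are
hypothetical. The --banirq option comes from the irqbalance man page, so
please check it against the version you have installed.

```python
# Sketch only: the sample table below is invented for illustration.
# On a real system, read the text from /proc/interrupts instead.
SAMPLE = """\
           CPU0       CPU1
  0:         36          0   IO-APIC-edge      timer
 16:       1200        340   IO-APIC-fasteoi   ahci
 42:     500000     123456   PCI-MSI-edge      eth0
NMI:          0          0   Non-maskable interrupts
"""


def parse_interrupts(text):
    """Map each numbered IRQ to (per-CPU counts, description)."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        label = label.strip()
        # Skip rows such as NMI/LOC/ERR, which are not numbered IRQs.
        if not sep or not label.isdigit():
            continue
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break
            counts.append(int(field))
        table[int(label)] = (counts, " ".join(fields[len(counts):]))
    return table


def irqs_matching(table, keyword):
    """Return the IRQ numbers whose description mentions keyword."""
    return sorted(irq for irq, (_, desc) in table.items() if keyword in desc)


def irqbalance_command(irqs):
    """Build an irqbalance command line that bans the given IRQs."""
    return ["irqbalance"] + ["--banirq={}".format(irq) for irq in irqs]


if __name__ == "__main__":
    table = parse_interrupts(SAMPLE)
    suspects = irqs_matching(table, "ahci")
    print(" ".join(irqbalance_command(suspects)))
```

The "ahci" keyword is only an example taken from my case. If you do not
know the faulty device yet, you can ban candidate IRQs one at a time and
watch whether the message stops appearing.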