On Fri, Nov 13, 2015 at 01:39:20PM -0500, Mohsin Zaidi wrote:
> Thanks for your reply, Neil.
>
> Yes, when I manually set the irq affinity to avoid #18, it works.
>
> I just downloaded and applied the latest irqbalance code, but it's
> showing the same behavior.
>
What hint policy are you using?

Neil

> Regards,
> Mohsin
>
>
> On Fri, Nov 13, 2015 at 8:46 AM, Neil Horman <nhorman at tuxdriver.com> wrote:
> > On Thu, Nov 12, 2015 at 03:59:46PM -0500, Mohsin Zaidi wrote:
> >> Hello,
> >>
> >> We've run into an irqbalance CPU banning issue that seems to be
> >> present in version 1.0.4 as well as in newer versions 1.0.7 and 1.0.9.
> >>
> >> On an Oracle X5-2 with 72 cores, irqbalance keeps concentrating IRQs
> >> from one interface (eth03) (the active slave in a bonded pair running
> >> network traffic) on CPU 18/37 (more on #18), even though all CPUs but
> >> 1/37 have been banned from IRQ processing. We're seeing this on
> >> multiple X5-2s. The interrupts are never directed to CPU 1. This does
> >> not seem to be a problem with other 32 core servers we have.
> >>
> >> I've attached the top CPU list, /proc/interrupts for eth03, irqbalance
> >> debug output, smp_affinity for eth03 IRQs (548-611), and the hardware
> >> topology.
> >>
> >> Any help would be appreciated. Please let me know if I can provide any
> >> additional information.
> >>
> >> Regards,
> >> Mohsin
> >
> > A few initial questions
> >
> > Are you able to set irq affinity manually on these systems?  And are you able to
> > see those affinities take effect?  I ask because the smp_affinity output you
> > sent me makes it look like writes to that file for a given interrupt aren't
> > getting picked up, and so the hardware is actually deciding where to steer
> > interrupts.
> >
> > Have you tried using an upstream version of irqbalance?  I ask because commit
> > f1bf15ed7ea63a04c76da033b78f8ffc806d4517, which came out after 1.0.9 fixes a
> > problem in which the --banirq option stopped working on a irq db reparsing.
> >
> > Neil
> >
>
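
For anyone following the thread who wants to check whether an affinity
write was actually picked up (the question raised in the quoted message),
here is a minimal sketch in Python. It only decodes the hex mask that the
kernel exposes in /proc/irq/<n>/smp_affinity into a list of CPU numbers.
The function names parse_affinity_mask and read_irq_affinity are
hypothetical helpers made up for this illustration; they are not part of
irqbalance, and the IRQ range 548-611 is simply the one mentioned above.

```python
from pathlib import Path
from typing import List


def parse_affinity_mask(mask: str) -> List[int]:
    """Decode an smp_affinity style hex mask into a sorted list of CPUs."""
    # The kernel prints the mask as comma-separated 32-bit hex groups,
    # most significant group first. Dropping the commas leaves one hex
    # number whose bit N corresponds to CPU N.
    value = int(mask.strip().replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


def read_irq_affinity(irq: int, proc_root: str = "/proc") -> List[int]:
    """Read and decode the current affinity of one IRQ (hypothetical helper)."""
    text = Path(proc_root, "irq", str(irq), "smp_affinity").read_text()
    return parse_affinity_mask(text)


if __name__ == "__main__":
    # Assumption: the eth03 IRQs are 548-611, as stated in the report.
    for irq in range(548, 612):
        try:
            print(irq, read_irq_affinity(irq))
        except OSError as exc:
            # The IRQ may not exist on the machine running this sketch.
            print(irq, "unreadable:", exc)
```

Comparing this output before and after writing to smp_affinity, alongside
the per-CPU counters in /proc/interrupts, should show whether the write
took effect or whether interrupts are still landing elsewhere.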