Some more observations.

The IRQs readily move to any CPU pair on NUMA node 1 (the node that "owns" eth03). I tried setting the unbanned list to 0/36/18/54 and 0/36/19/55, and all of eth03's IRQs moved accordingly (I see only 18/54 or 19/55, respectively, in the smp_affinity_list for these IRQs). As soon as I move them back to 0/36/1/37, they go back to 37 and either 18 or 19 (depending on which I'd chosen previously). The behavior is the same if I leave only 18/54 or 19/55 in the unbanned list. In fact, the IRQs for eth01/eth02/eth03/eth04 all move to these CPUs correctly.

Regards,
Mohsin

On Thu, Nov 19, 2015 at 3:28 PM, Mohsin Zaidi <mohsinrzaidi at gmail.com> wrote:
> Logs attached.
> Regards,
> Mohsin
>
>
> On Thu, Nov 19, 2015 at 1:32 PM, Mohsin Zaidi <mohsinrzaidi at gmail.com> wrote:
>> Thanks, Neil. I'll have the results for you shortly.
>>
>> I wanted to point out that each of the 4 interfaces on the server has
>> 64 queues, so there are a total of 256 queues. Also, the banning is
>> attempting to direct interrupts to just two processors (#1 and #37) on
>> the same NUMA node, which is also not the same as the NUMA node that
>> "owns" the interface I am looking at (eth03).
>>
>> Does any of this matter?
>> Regards,
>> Mohsin
>>
>>
>> On Thu, Nov 19, 2015 at 9:58 AM, Neil Horman <nhorman at tuxdriver.com> wrote:
>>> On Wed, Nov 18, 2015 at 10:42:41AM -0500, Mohsin Zaidi wrote:
>>>> I'm using the irqbalance daemon with the following config file. The
>>>> only thing I've changed is the banned CPUs list, and I've banned all
>>>> but CPUs #1 and #37. Interrupts *never* go to #1, and go to #18 and
>>>> #37, even though #18 has also been banned.
>>>>
>>>> # irqbalance is a daemon process that distributes interrupts across
>>>> # CPUS on SMP systems. The default is to rebalance once every 10
>>>> # seconds. This is the environment file that is specified to systemd via the
>>>> # EnvironmentFile key in the service unit file (or via whatever method the init
>>>> # system you're using has).
>>>> #
>>>> # ONESHOT=yes
>>>> # after starting, wait for a minute, then look at the interrupt
>>>> # load and balance it once; after balancing exit and do not change
>>>> # it again.
>>>> #IRQBALANCE_ONESHOT=
>>>>
>>>> #
>>>> # IRQBALANCE_BANNED_CPUS
>>>> # 64 bit bitmask which allows you to indicate which cpu's should
>>>> # be skipped when rebalancing irqs. Cpu numbers which have their
>>>> # corresponding bits set to one in this mask will not have any
>>>> # irq's assigned to them on rebalance
>>>> #
>>>> #IRQBALANCE_BANNED_CPUS=
>>>> IRQBALANCE_BANNED_CPUS=000000ff,ffffffdf,fffffffd
>>>>
>>>> #
>>>> # IRQBALANCE_ARGS
>>>> # append any args here to the irqbalance daemon as documented in the man page
>>>> #
>>>> #IRQBALANCE_ARGS=
>>>> Regards,
>>>> Mohsin
>>>>
>>>>
>>>> On Wed, Nov 18, 2015 at 10:28 AM, Neil Horman <nhorman at tuxdriver.com> wrote:
>>>> > On Wed, Nov 18, 2015 at 10:04:56AM -0500, Mohsin Zaidi wrote:
>>>> >> Sorry about that, Neil.
>>>> >>
>>>> >> I haven't specified any hint policy in IRQBALANCE_ARGS (for the daemon).
>>>> >> Regards,
>>>> >> Mohsin
>>>> >>
>>>> > Ok, well, I'm at a bit of a loss. irqbalance, based on your output from the
>>>> > debug log, is working properly, presuming you actually listed cpus 18 and 37 as
>>>> > your only unbanned ones, which you indicate is the opposite of what you've
>>>> > configured.
>>>> >
>>>> > Can you please send me the command line you use to start irqbalance?
>>>> >
>>>> > Neil
>>>> >
>>>> >>
>>>> >> On Wed, Nov 18, 2015 at 6:36 AM, Neil Horman <nhorman at tuxdriver.com> wrote:
>>>> >> > On Fri, Nov 13, 2015 at 04:39:08PM -0500, Neil Horman wrote:
>>>> >> >> On Fri, Nov 13, 2015 at 01:39:20PM -0500, Mohsin Zaidi wrote:
>>>> >> >> > Thanks for your reply, Neil.
>>>> >> >> >
>>>> >> >> > Yes, when I manually set the irq affinity to avoid #18, it works.
>>>> >> >> >
>>>> >> >> > I just downloaded and applied the latest irqbalance code, but it's
>>>> >> >> > showing the same behavior.
>>>> >> >> >
>>>> >> >> What hint policy are you using?
>>>> >> >>
>>>> >> >> Neil
>>>> >> >>
>>>> >> > Ping, any response regarding hint policy?
>>>> >> >
>>>> >> > Neil
>>>> >> >
>>>> >>
>>>>
>>>
>>> I'm at something of a loss here. I can see no reason why this would fail on
>>> only one system. In an effort to get additional data, please apply this patch,
>>> run irqbalance in debug mode, and post the output.
>>>
>>> Thanks!
>>> Neil
>>>
>>>
>>> diff --git a/activate.c b/activate.c
>>> index c8453d5..d92e770 100644
>>> --- a/activate.c
>>> +++ b/activate.c
>>> @@ -113,6 +113,7 @@ static void activate_mapping(struct irq_info *info, void *data __attribute__((un
>>>  		return;
>>>
>>>  	cpumask_scnprintf(buf, PATH_MAX, applied_mask);
>>> +	printf("Applying mask for irq %d: %s\n", info->irq, buf);
>>>  	fprintf(file, "%s", buf);
>>>  	fclose(file);
>>>  	info->moved = 0; /*migration is done*/