irqbalance problem on Oracle X5-2

Some more observations.

The IRQs readily move to any CPU pair on NUMA node 1 (the node that "owns"
eth03). I tried setting the unbanned list to 0/36/18/54 and then to
0/36/19/55, and all of eth03's IRQs moved accordingly (I see only
18/54 or 19/55, respectively, in the smp_affinity_list for these IRQs).

As soon as I set the unbanned list back to 0/36/1/37, the IRQs go back
to 37 and either 18 or 19 (whichever I had chosen previously).

The behavior is the same if I leave only 18/54 or 19/55 in the
unbanned list. In fact, the IRQs for eth01/eth02/eth03/eth04 all move
to these CPUs correctly.
Regards,
Mohsin
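
To make the check described above repeatable, here is a minimal Python sketch that lists the CPUs each of an interface's IRQs is currently allowed on. It reads /proc/interrupts and /proc/irq/<N>/smp_affinity_list, which are standard Linux procfs files. The function names are hypothetical, and matching IRQs by looking for the interface name in the last column of /proc/interrupts is an assumption, since drivers name their queue vectors differently.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "18,54" or "0-3,36" into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def irqs_for_interface(interrupts_text, ifname):
    """Return numeric IRQs whose last column in /proc/interrupts mentions ifname.

    Assumption: the driver puts the interface name in the vector name,
    e.g. "eth03-TxRx-0". Adjust the match if your driver names them differently.
    """
    irqs = []
    for line in interrupts_text.splitlines():
        fields = line.split()
        if not fields or not fields[0].endswith(":"):
            continue  # header line or blank line
        number = fields[0][:-1]
        if number.isdigit() and ifname in fields[-1]:
            irqs.append(int(number))
    return irqs


def affinity_report(ifname):
    """Map each IRQ of ifname to the CPUs listed in its smp_affinity_list."""
    text = Path("/proc/interrupts").read_text()
    report = {}
    for irq in irqs_for_interface(text, ifname):
        path = Path(f"/proc/irq/{irq}/smp_affinity_list")
        report[irq] = parse_cpu_list(path.read_text())
    return report
```

On the affected machine, calling affinity_report("eth03") after each change to the unbanned list would show at a glance whether any IRQ is still pinned to 18 or 19.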


On Thu, Nov 19, 2015 at 3:28 PM, Mohsin Zaidi <mohsinrzaidi at gmail.com> wrote:
> Logs attached.
> Regards,
> Mohsin
>
>
> On Thu, Nov 19, 2015 at 1:32 PM, Mohsin Zaidi <mohsinrzaidi at gmail.com> wrote:
>> Thanks, Neil. I'll have the results for you shortly.
>>
>> I wanted to point out that each of the 4 interfaces on the server has
>> 64 queues, so there are a total of 256 queues. Also, the banning is
>> attempting to direct interrupts to just two processors (#1 and #37) on
>> the same NUMA node, which is also not the same as the NUMA node that
>> "owns" the interface I am looking at (eth03).
>>
>> Does any of this matter?
>> Regards,
>> Mohsin
>>
>>
>> On Thu, Nov 19, 2015 at 9:58 AM, Neil Horman <nhorman at tuxdriver.com> wrote:
>>> On Wed, Nov 18, 2015 at 10:42:41AM -0500, Mohsin Zaidi wrote:
>>>> I'm using the irqbalance daemon with the following config file. The
>>>> only thing I've changed is the banned CPUs list, and I've banned all
>>>> but CPUs #1 and #37. Interrupts *never* go to #1, and go to #18 and
>>>> #37, even though #18 has also been banned.
>>>>
>>>> # irqbalance is a daemon process that distributes interrupts across
>>>> # CPUS on SMP systems. The default is to rebalance once every 10
>>>> # seconds. This is the environment file that is specified to systemd via the
>>>> # EnvironmentFile key in the service unit file (or via whatever method the init
>>>> # system you're using has).
>>>> #
>>>> # ONESHOT=yes
>>>> # after starting, wait for a minute, then look at the interrupt
>>>> # load and balance it once; after balancing exit and do not change
>>>> # it again.
>>>> #IRQBALANCE_ONESHOT=
>>>>
>>>> #
>>>> # IRQBALANCE_BANNED_CPUS
>>>> # 64 bit bitmask which allows you to indicate which cpus should
>>>> # be skipped when rebalancing irqs. Cpu numbers which have their
>>>> # corresponding bits set to one in this mask will not have any
>>>> # irqs assigned to them on rebalance
>>>> #
>>>> #IRQBALANCE_BANNED_CPUS=
>>>> IRQBALANCE_BANNED_CPUS=000000ff,ffffffdf,fffffffd
>>>>
>>>> #
>>>> # IRQBALANCE_ARGS
>>>> # append any args here to the irqbalance daemon as documented in the man page
>>>> #
>>>> #IRQBALANCE_ARGS=
>>>> Regards,
>>>> Mohsin
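
A quick way to double-check a mask like the one in the config above is to decode it. The following is a minimal Python sketch, not part of irqbalance; the function names are hypothetical, and the CPU count of 72 is an assumption inferred from the mask width and the CPU pairs mentioned in this thread (0/36, 18/54, and so on). It treats the value as comma-separated 32-bit hex groups with the most significant group first, matching the format used in the config.

```python
def cpus_in_mask(mask):
    """Return the set of CPU numbers whose bits are set in a cpumask string."""
    value = 0
    # Groups are 32 bits each, most significant group first.
    for group in mask.split(","):
        value = (value << 32) | int(group, 16)
    cpus = set()
    bit = 0
    while value:
        if value & 1:
            cpus.add(bit)
        value >>= 1
        bit += 1
    return cpus


def unbanned_cpus(banned_mask, cpu_count):
    """List the CPUs below cpu_count that are NOT set in the banned mask."""
    banned = cpus_in_mask(banned_mask)
    return [cpu for cpu in range(cpu_count) if cpu not in banned]


def banned_mask_for(allowed, cpu_count):
    """Build a banned mask that bans every CPU except those in allowed."""
    value = (1 << cpu_count) - 1
    for cpu in allowed:
        value &= ~(1 << cpu)
    groups = []
    for _ in range((cpu_count + 31) // 32):
        groups.append(f"{value & 0xFFFFFFFF:08x}")
        value >>= 32
    return ",".join(reversed(groups))


# cpu_count=72 is an assumption about this machine, not taken from the config.
print(unbanned_cpus("000000ff,ffffffdf,fffffffd", 72))  # prints [1, 37]
```

Decoded this way, the mask in the config does leave only CPUs 1 and 37 unbanned, so the value itself matches the stated intent. banned_mask_for can generate the masks for the other combinations tried in this thread.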
>>>>
>>>>
>>>> On Wed, Nov 18, 2015 at 10:28 AM, Neil Horman <nhorman at tuxdriver.com> wrote:
>>>> > On Wed, Nov 18, 2015 at 10:04:56AM -0500, Mohsin Zaidi wrote:
>>>> >> Sorry about that, Neil.
>>>> >>
>>>> >> I haven't specified any hint policy in IRQBALANCE_ARGS (for the daemon).
>>>> >> Regards,
>>>> >> Mohsin
>>>> >>
>>>> > Ok, well, I'm at a bit of a loss.  irqbalance, based on your output from the
>>>> > debug log, is working properly, presuming you actually listed cpus 18 and 37 as
>>>> > your only unbanned ones, which you indicate is the opposite of what you've
>>>> > configured.
>>>> >
>>>> > Can you please send me the command line you use to start irqbalance?
>>>> >
>>>> > Neil
>>>> >
>>>> >>
>>>> >> On Wed, Nov 18, 2015 at 6:36 AM, Neil Horman <nhorman at tuxdriver.com> wrote:
>>>> >> > On Fri, Nov 13, 2015 at 04:39:08PM -0500, Neil Horman wrote:
>>>> >> >> On Fri, Nov 13, 2015 at 01:39:20PM -0500, Mohsin Zaidi wrote:
>>>> >> >> > Thanks for your reply, Neil.
>>>> >> >> >
>>>> >> >> > Yes, when I manually set the irq affinity to avoid #18, it works.
>>>> >> >> >
>>>> >> >> > I just downloaded and applied the latest irqbalance code, but it's
>>>> >> >> > showing the same behavior.
>>>> >> >> >
>>>> >> >> What hint policy are you using?
>>>> >> >>
>>>> >> >> Neil
>>>> >> >>
>>>> >> > Ping, any response regarding hint policy?
>>>> >> >
>>>> >> > Neil
>>>> >> >
>>>> >>
>>>>
>>>
>>> I'm at something of a loss here.  I can see no reason why this would fail on
>>> only one system.  In an effort to get additional data, please apply this patch,
>>> run irqbalance in debug mode, and post the output.
>>>
>>> Thanks!
>>> Neil
>>>
>>>
>>> diff --git a/activate.c b/activate.c
>>> index c8453d5..d92e770 100644
>>> --- a/activate.c
>>> +++ b/activate.c
>>> @@ -113,6 +113,7 @@ static void activate_mapping(struct irq_info *info, void *data __attribute__((un
>>>                 return;
>>>
>>>         cpumask_scnprintf(buf, PATH_MAX, applied_mask);
>>> +       printf("Applying mask for irq %d: %s\n", info->irq, buf);
>>>         fprintf(file, "%s", buf);
>>>         fclose(file);
>>>         info->moved = 0; /*migration is done*/


