irqbalancer subset policy and CPU lock up on storage controller.

> On Mon, Oct 12, 2015 at 11:52:30PM +0530, Kashyap Desai wrote:
> > > > What should the solution be if we really want to slow down IO
> > > > submission to avoid a CPU lockup? We don't want only one CPU to be
> > > > kept busy with completions.
> > > >
> > > > Any suggestions?
> > > >
> > > Yup, file a bug with Oracle :)
> >
> > Neil -
> >
> > Thanks for the info. I understood that I should use the latest
> > <irqbalance>; that was already attempted. With the latest irqbalance
> > I see the expected behavior as long as I provide <exact> or <subset>
> > + <--policyscript>. We are planning to do the same, but wanted to
> > understand what the latest <irqbalance> default settings are. Is there
> > any reason the default setting changed from subset to ignore?
> >
>
> Latest defaults are that hinting is ignored by default, but hinting can
> also be set via a policyscript on an irq by irq basis.
>
> The reasons for changing the default behavior are documented in commit
> d9138c78c3e8cb286864509fc444ebb4484c3d70.  Irq affinity hinting is
> effectively a holdover from back in the days when irqbalance couldn't
> understand a device's locality and irq count easily.  Now that it can,
> there is really no need for an irq affinity hint, unless your driver
> doesn't properly participate in sysfs device enumeration.
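
For reference, setting the hint policy per irq through a policyscript might
look roughly like the sketch below. It assumes the policyscript interface as
described in the irqbalance man page of this era: the script is run once per
discovered irq with the sysfs device path and the irq number as arguments,
and it prints key=value lines such as hintpolicy=exact|subset|ignore. The
function name policy_lines and the choice of subset for every irq are
illustrative assumptions, not the script used in this thread.

```python
#!/usr/bin/env python3
"""Hypothetical irqbalance policyscript: request hintpolicy=subset per irq."""
import sys


def policy_lines(device_path, irq):
    """Return the key=value lines to print for one irq.

    device_path and irq are the two arguments irqbalance is assumed to pass.
    A real script could inspect device_path and answer differently per
    device; this sketch gives the same answer for every irq.
    """
    return ["hintpolicy=subset"]


# Only act when invoked with exactly the two expected arguments.
if __name__ == "__main__" and len(sys.argv) == 3:
    for line in policy_lines(sys.argv[1], sys.argv[2]):
        print(line)
```

Saved as an executable file, a script like this would be handed to
irqbalance with --policyscript pointing at its location.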

Neil - I went through those details, but could not understand how the
<ignore> policy is useful. I may be missing something here. :-(
With the <ignore> policy, the mpt3sas driver on a 32 logical CPU system
has the affinity masks below. As you said, the driver hint is ignored.
That is understood, since that is what <ignore> asks for, but why is the
affinity mask restricted to the local node (Node 0 in this case)?
What confuses me is that the "cpu affinity mask" is localized to NUMA
Node 0 because PCI device enumeration detected that the PCI device is
local to numa_node 0.

        msix index   irq number   cpu affinity mask   hint
                 0          120   00400040            00000001   <- CPU mask on node-0 is 00FF00FF
                 1          121   00800080            00000002
                 2          122   00400040            00000004
                 3          123   00100010            00000008
                 4          124   00800080            00000010
                 5          125   00020002            00000020
                 6          126   00400040            00000040
                 7          127   00800080            00000080
                 8          128   00400040            00000100
                 9          129   00100010            00000200
                10          130   00400040            00000400
                11          131   00020002            00000800
                12          132   00400040            00001000
                13          133   00400040            00002000
                14          134   00400040            00004000
                15          135   00800080            00008000
                16          136   00100010            00010000
                17          137   00020002            00020000
                18          138   00400040            00040000
                19          139   00100010            00080000
                20          140   00400040            00100000
                21          141   00800080            00200000
                22          142   00100010            00400000
                23          143   00020002            00800000
                24          144   00400040            01000000
                25          145   00800080            02000000
                26          146   00400040            04000000
                27          147   00100010            08000000
                28          148   00800080            10000000
                29          149   00020002            20000000
                30          150   00800080            40000000
                31          151   00800080            80000000
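
To make the masks above easier to read, each hex value can be decoded into
logical CPU numbers (bit N set means CPU N is allowed). The following is a
minimal Python sketch; the helper names are made up for illustration, and
the mask values are copied from the listing above.

```python
def parse_mask(mask_hex):
    """Convert a hex affinity mask (commas allowed) to an integer."""
    return int(mask_hex.replace(",", ""), 16)


def cpus_in_mask(mask_hex):
    """Return the CPU numbers whose bits are set in a hex affinity mask."""
    value = parse_mask(mask_hex)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


def is_subset(mask_hex, node_mask_hex):
    """True if every CPU in mask_hex is also present in node_mask_hex."""
    return parse_mask(mask_hex) & ~parse_mask(node_mask_hex) == 0


NODE0_MASK = "00FF00FF"
# The four distinct "cpu affinity mask" values that occur in the listing.
DISTINCT_MASKS = ["00400040", "00800080", "00100010", "00020002"]

# OR the distinct masks together to see every CPU that receives an irq.
union = 0
for mask in DISTINCT_MASKS:
    union |= parse_mask(mask)

print(cpus_in_mask("00400040"))            # [6, 22]
print(cpus_in_mask(NODE0_MASK))            # CPUs 0-7 and 16-23
print(cpus_in_mask(format(union, "08X")))  # [1, 4, 6, 7, 17, 20, 22, 23]
print(all(is_subset(m, NODE0_MASK) for m in DISTINCT_MASKS))  # True
```

In other words, the 32 vectors are spread over only 8 of the 16 logical
CPUs on node 0, and every mask has two bits set.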


When you say "Driver does not participate in sysfs enumeration", does it
mean the "numa_node" exposure in sysfs, or something more than that? Sorry
for the basic questions, and thanks for helping me understand things.
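
For anyone who wants to check what sysfs exposes for a device, here is a
minimal Python sketch. It reads the standard Linux attributes numa_node,
local_cpus and msi_irqs for a PCI device, plus smp_affinity and
affinity_hint under /proc/irq. Whether these are exactly the attributes
irqbalance relies on is not confirmed in this thread; the function names
and the PCI address used in the usage line are placeholders.

```python
import os


def read_text(path):
    """Return the stripped contents of a file, or None if it is unreadable."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None


def describe_device(bdf, sys_root="/sys", proc_root="/proc"):
    """Collect NUMA and per-irq affinity information for one PCI device.

    bdf is the PCI address of the device, e.g. "0000:03:00.0" (placeholder).
    sys_root and proc_root are parameters only so the function can be
    exercised against a copy of the tree.
    """
    dev = os.path.join(sys_root, "bus", "pci", "devices", bdf)
    info = {
        "numa_node": read_text(os.path.join(dev, "numa_node")),
        "local_cpus": read_text(os.path.join(dev, "local_cpus")),
        "irqs": {},
    }
    # msi_irqs holds one entry per MSI/MSI-X vector, named by irq number.
    msi_dir = os.path.join(dev, "msi_irqs")
    irqs = sorted(os.listdir(msi_dir), key=int) if os.path.isdir(msi_dir) else []
    for irq in irqs:
        irq_dir = os.path.join(proc_root, "irq", irq)
        info["irqs"][int(irq)] = {
            "smp_affinity": read_text(os.path.join(irq_dir, "smp_affinity")),
            "affinity_hint": read_text(os.path.join(irq_dir, "affinity_hint")),
        }
    return info
```

Calling describe_device("0000:03:00.0") on the affected system would show
the reported numa_node next to each irq's current mask and driver hint.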

` Kashyap

>
> > >
> > > What you're seeing looks like at least in part a bug with your (very
> > > old) version of irqbalance.  I seem to recall fixing more than a few
> > > bugs dealing with affinity masks from the hint files and banned_cpu
> > > options.  I strongly suggest that you test with an upstream version of
> > > irqbalance and contact Oracle to update their version to something
> > > more recent.
> >
> > I see that the CPU lockup issue does not go away if <rq_affinity> is
> > set to 1 in the storage stack and the <irqbalance> policy is set to
> > <ignore>.  With the <ignore> policy, I see that only a limited number
> > of logical CPUs on the local NUMA node are busy doing completions. We
> > are still seeing many IOs being pumped from the remote NUMA node.
> > This will cause a CPU lockup, as <rq_affinity> does not migrate the
> > softirq to the _exact_ submitter.  Not sure what the majority of h/w
> > requires from <irqbalance>. Is an <ignore> kind of policy a good
> > choice, or <subset>?
> >
>
> I'm sorry, you'll have to try that again, I'm afraid I can't really
> parse what you just wrote there.  I _think_ what you're saying is that
> you're observing irqbalance allowing cpu0 (or a small subset of cpus)
> to handle interrupts from your storage devices.  As I said in my last
> note, I recall there being a bug about that which was fixed in a later
> version.  I also note, however, that you mention above that you are
> using a policy script, which I'm guessing may have some culpability in
> terms of you having irqs with multi-bit affinity masks, which as I
> mentioned will not give you the expected behavior.  If you post your
> policy script, I may be able to point out where you are going wrong.
>
> Neil
>
> > ` Kashyap
> >
> > >
> > > Regards
> > > Neil
> > >
> > > > ` Kashyap
> > > >
> > > > _______________________________________________
> > > > irqbalance mailing list
> > > > irqbalance at lists.infradead.org
> > > > http://lists.infradead.org/mailman/listinfo/irqbalance
> > > >


