irqbalancer subset policy and CPU lock up on storage controller.

nhorman@xxxxxxxxxx (Neil Horman) · Mon, 12 Oct 2015 14:56:01 -0400

On Mon, Oct 12, 2015 at 11:52:30PM +0530, Kashyap Desai wrote:
> > > What should be the solution if we really want to slow down IO
> > > submission to avoid CPU lockup. We don't want only one CPU to keep
> > > busy for completion.
> > >
> > > Any suggestion ?
> > >
> > Yup, file a bug with Oracle :)
> 
> Neil -
> 
> Thanks for the info. I understood the advice to use the latest
> <irqbalance>; that was already attempted. I tried the latest irqbalance and
> I see the expected behavior as long as I provide <exact> or <subset> +
> <--policyscript>. We are planning to do the same, but wanted to understand
> what the latest <irqbalance> default settings are. Is there any reason the
> default setting changed from subset to ignore?
> 

In the latest version, affinity hints are ignored by default, but a hint policy
can also be set via a policyscript on an irq-by-irq basis.
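
To make the per-irq option concrete, here is a minimal sketch of a policy
script, written in Python.  It assumes the interface described in the
irqbalance man page of this era: the script is invoked with the sysfs device
path and the irq number as arguments, and prints key=value lines such as
hintpolicy=subset on stdout.  The PCI address and the helper name policy_lines
are placeholders invented for illustration, not taken from this thread.

```python
#!/usr/bin/env python3
"""Sketch of an irqbalance policy script (illustrative, not from the thread)."""
import sys

# Assumption: placeholder PCI address of the storage controller whose
# affinity hints should be honoured. Replace with a real device address.
HINTED_DEVICE = "0000:03:00.0"


def policy_lines(devpath, irq):
    """Return the key=value lines to print for one irq.

    devpath is the sysfs device path passed by irqbalance and irq is the
    irq number; irq is unused here but kept to mirror the script arguments.
    """
    if HINTED_DEVICE in devpath:
        # Honour the driver's hint, but let irqbalance pick within it.
        return ["hintpolicy=subset"]
    # Everything else keeps the default behaviour of ignoring hints.
    return ["hintpolicy=ignore"]


# Only act when called the way irqbalance is assumed to call it:
#   <script> <sysfs device path> <irq number>
if __name__ == "__main__" and len(sys.argv) == 3:
    for line in policy_lines(sys.argv[1], int(sys.argv[2])):
        print(line)
```

Such a script would be made executable and passed to irqbalance with the
--policyscript option mentioned above.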

The reasons for changing the default behavior are documented in commit
d9138c78c3e8cb286864509fc444ebb4484c3d70.  Irq affinity hinting is effectively a
holdover from the days when irqbalance couldn't easily understand a device's
locality and irq count.  Now that it can, there is really no need for an
irq affinity hint, unless your driver doesn't properly participate in sysfs
device enumeration.
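
As a rough illustration of what "locality and irq count" look like in sysfs,
the sketch below reads a PCI device directory and reports its NUMA node, its
local CPU list, and its MSI irq numbers.  The attribute names numa_node,
local_cpulist, and msi_irqs are standard PCI sysfs entries; the function name
and the returned dictionary layout are made up for this example, and this is
not a description of irqbalance's actual code.

```python
import os


def describe_pci_device(dev_dir):
    """Collect locality and irq information from one PCI sysfs directory.

    dev_dir is a path such as /sys/bus/pci/devices/<address>.
    Attributes that are missing are reported as None (or an empty list).
    """
    def read_attr(name):
        try:
            with open(os.path.join(dev_dir, name)) as f:
                return f.read().strip()
        except OSError:
            return None

    # msi_irqs is a directory with one entry per allocated MSI/MSI-X irq.
    try:
        entries = os.listdir(os.path.join(dev_dir, "msi_irqs"))
        irqs = sorted(int(name) for name in entries if name.isdigit())
    except OSError:
        irqs = []

    node = read_attr("numa_node")
    return {
        # numa_node is -1 when the platform reports no locality.
        "numa_node": int(node) if node is not None else None,
        "local_cpulist": read_attr("local_cpulist"),
        "msi_irqs": irqs,
    }
```

A driver that does not expose these entries leaves a balancer with nothing to
work from, which is the case where a hint would still matter.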

> >
> > What you're seeing looks like at least in part a bug with your (very old)
> > version of irqbalance.  I seem to recall fixing more than a few bugs dealing
> > with affinity masks from the hint files and banned_cpu options.  I strongly
> > suggest that you test with an upstream version of irqbalance and contact
> > oracle to update their version to something more recent.
> 
> I see CPU lock up issue does not go if <rq_affinity> is set to 1 in
> storage stack and if <irqbalance> policy set to <ignore>.   With <ignore>
> policy, I see  only limited logic cpu of local NUMA node is busy doing
> completion. We are still seeing many IO pumping from remote NUMA node. This
> will cause CPU lockup as <rq_affinity> does not migrate softirq to _exact_
> submitter.  Not sure what majority of h/w require from <irqbalance> ? Is
> it <ignore> kind of policy good choice or <subset> ?
> 

I'm sorry, you'll have to try that again; I'm afraid I can't really parse what
you just wrote there.  I _think_ what you're saying is that you're observing
irqbalance allowing cpu0 (or a small subset of cpus) to handle interrupts from
your storage devices.  As I said in my last note, I recall there being a bug
about that which was fixed in a later version.  I also note, however, that you
mention above that you are using a policy script, which I'm guessing may have
some culpability in terms of you having irqs with multi-bit affinity masks,
which as I mentioned will not give you the expected behavior.  If you post your
policy script, I may be able to point out where you are going wrong.
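
One way to check for multi-bit affinity masks is to read
/proc/irq/<irq>/smp_affinity, which holds the mask as comma-separated 32-bit
hex groups, and count the bits that are set.  The following is only a sketch;
the function names are invented for this example.

```python
def parse_affinity_mask(text):
    """Turn an smp_affinity string such as '00000000,00000003' into an int."""
    return int(text.strip().replace(",", ""), 16)


def cpus_in_mask(mask):
    """List the cpu numbers whose bit is set in the mask."""
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


def read_irq_cpus(irq, proc_root="/proc"):
    """Return the cpus an irq is currently allowed to be delivered to."""
    with open(f"{proc_root}/irq/{irq}/smp_affinity") as f:
        return cpus_in_mask(parse_affinity_mask(f.read()))


def is_multi_bit(mask):
    """True when more than one cpu is set in the mask."""
    return len(cpus_in_mask(mask)) > 1
```

For example, is_multi_bit(parse_affinity_mask("00000000,00000003")) is true,
since both cpu0 and cpu1 are set in that mask.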

Neil

> ` Kashyap
> 
> >
> > Regards
> > Neil
> >
> > > ` Kashyap