> On Mon, Oct 12, 2015 at 11:52:30PM +0530, Kashyap Desai wrote:
> > > > What should be the solution if we really want to slow down IO
> > > > submission to avoid CPU lockup. We don't want only one CPU to keep
> > > > busy for completion.
> > > >
> > > > Any suggestion ?
> > > >
> > > Yup, file a bug with Oracle :)
> >
> > Neil -
> >
> > Thanks for info. I understood to use latest <irqbalance>...that was
> > already attempted. I tried with latest irqbalance and I see expected
> > behavior as long as I provide <exact> or <subset> + <--policyscript>.
> > We are planning for the same, but wanted to understand what is latest
> > <irqbalance> default settings. Is there any reason we are seeing
> > default settings changed from subset to ignore ?
> >
>
> Latest defaults are that hinting is ignored by default, but hinting can also be
> set via a policyscript on an irq by irq basis.
>
> The reasons for changing the default behavior are documented in commit
> d9138c78c3e8cb286864509fc444ebb4484c3d70. Irq affinity hinting is
> effectively a holdover from back in the days when irqbalance couldn't
> understand a device's locality and irq count easily. Now that it can, there is
> really no need for an irq affinity hint, unless your driver doesn't properly
> participate in sysfs device enumeration.

Neil - I went through those details, but could not understand how the
<ignore> policy is useful. I may be missing something here. :-(

With the <ignore> policy, the mpt3sas driver on a 32 logical CPU system has
the affinity masks below. As you said, the driver hint is ignored. That is
understood, since <ignore> asks for exactly that, but why is the affinity
mask localized to just the local node (Node 0 in this case)? What confuses
me is that the "cpu affinity mask" is localized to just NUMA node 0 because
PCI device enumeration detected that the PCI device is local to numa_node 0.

msix index = 0, irq number = 120, cpu affinity mask = 00400040 hint = 00000001   <- CPU mask on node-0 is 00FF00FF
msix index = 1, irq number = 121, cpu affinity mask = 00800080 hint = 00000002
msix index = 2, irq number = 122, cpu affinity mask = 00400040 hint = 00000004
msix index = 3, irq number = 123, cpu affinity mask = 00100010 hint = 00000008
msix index = 4, irq number = 124, cpu affinity mask = 00800080 hint = 00000010
msix index = 5, irq number = 125, cpu affinity mask = 00020002 hint = 00000020
msix index = 6, irq number = 126, cpu affinity mask = 00400040 hint = 00000040
msix index = 7, irq number = 127, cpu affinity mask = 00800080 hint = 00000080
msix index = 8, irq number = 128, cpu affinity mask = 00400040 hint = 00000100
msix index = 9, irq number = 129, cpu affinity mask = 00100010 hint = 00000200
msix index = 10, irq number = 130, cpu affinity mask = 00400040 hint = 00000400
msix index = 11, irq number = 131, cpu affinity mask = 00020002 hint = 00000800
msix index = 12, irq number = 132, cpu affinity mask = 00400040 hint = 00001000
msix index = 13, irq number = 133, cpu affinity mask = 00400040 hint = 00002000
msix index = 14, irq number = 134, cpu affinity mask = 00400040 hint = 00004000
msix index = 15, irq number = 135, cpu affinity mask = 00800080 hint = 00008000
msix index = 16, irq number = 136, cpu affinity mask = 00100010 hint = 00010000
msix index = 17, irq number = 137, cpu affinity mask = 00020002 hint = 00020000
msix index = 18, irq number = 138, cpu affinity mask = 00400040 hint = 00040000
msix index = 19, irq number = 139, cpu affinity mask = 00100010 hint = 00080000
msix index = 20, irq number = 140, cpu affinity mask = 00400040 hint = 00100000
msix index = 21, irq number = 141, cpu affinity mask = 00800080 hint = 00200000
msix index = 22, irq number = 142, cpu affinity mask = 00100010 hint = 00400000
msix index = 23, irq number = 143, cpu affinity mask = 00020002 hint = 00800000
msix index = 24, irq number = 144, cpu affinity mask = 00400040 hint = 01000000
msix index = 25, irq number = 145, cpu affinity mask = 00800080 hint = 02000000
msix index = 26, irq number = 146, cpu affinity mask = 00400040 hint = 04000000
msix index = 27, irq number = 147, cpu affinity mask = 00100010 hint = 08000000
msix index = 28, irq number = 148, cpu affinity mask = 00800080 hint = 10000000
msix index = 29, irq number = 149, cpu affinity mask = 00020002 hint = 20000000
msix index = 30, irq number = 150, cpu affinity mask = 00800080 hint = 40000000
msix index = 31, irq number = 151, cpu affinity mask = 00800080 hint = 80000000

When you say "Driver does not participate in sysfs enumeration", does that
mean "numa_node" exposure in sysfs, or anything more than that?

Sorry for the basic questions, and thanks for helping me understand.

` Kashyap

> > >
> > > What you're seeing looks like at least in part a bug with your (very old)
> > > version of irqbalance. I seem to recall fixing more than a few bugs dealing
> > > with affinity masks from the hint files and banned_cpu options. I strongly
> > > suggest that you test with an upstream version of irqbalance and
> > > contact oracle to update their version to something more recent.
> >
> > I see CPU lock up issue does not go if <rq_affinity> is set to 1 in
> > storage stack and if <irqbalance> policy set to <ignore>. With <ignore>
> > policy, I see only limited logical cpu of local NUMA node is busy doing
> > completion. We are still seeing many IO pumping from remote NUMA node.
> > This will cause CPU lockup as <rq_affinity> does not migrate softirq
> > to _exact_ submitter. Not sure what majority of h/w require from
> > <irqbalance> ? Is it <ignore> kind of policy good choice or <subset> ?
> >
>
> I'm sorry, you'll have to try that again, I'm afraid I can't really parse what
> you just wrote there. I _think_ what you're saying is that you're observing
> irqbalance allowing cpu0 (or a small subset of cpus) handling interrupts
> from your storage devices.
> As I said in my last note, I recall there being a
> bug about that that was fixed in a later version. I also note, however, that
> you mention above that you are using a policy script, which I'm guessing
> may have some culpability in terms of you having irqs with multi-bit affinity
> masks, which as I mentioned will not give you expected behavior. If you
> post your policy script, I may be able to point out where you are going
> wrong.
>
> Neil
>
> > ` Kashyap
> >
> > >
> > > Regards
> > > Neil
> > >
> > > > ` Kashyap
> > > >
> > > > _______________________________________________
> > > > irqbalance mailing list
> > > > irqbalance at lists.infradead.org
> > > > http://lists.infradead.org/mailman/listinfo/irqbalance
> > > >
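
For reference, the hex values in the msix report above are CPU bitmaps, where bit N set means CPU N is allowed. The following is only a minimal Python sketch for decoding them. It assumes the masks are plain hexadecimal strings as printed in the report, and the helper name cpus_in_mask is made up for illustration (it is not part of irqbalance or the kernel). Only a few rows from the report are included.

```python
def cpus_in_mask(mask_hex):
    """Return the CPU numbers whose bits are set in a hex affinity mask."""
    # Wide kernel masks are printed as comma-separated 32-bit groups; drop the commas.
    value = int(mask_hex.replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


NODE0_MASK = "00FF00FF"  # node-0 CPU mask quoted in the report

# (irq number, cpu affinity mask, driver hint), copied from the report
rows = [
    (120, "00400040", "00000001"),
    (121, "00800080", "00000002"),
    (123, "00100010", "00000008"),
    (125, "00020002", "00000020"),
]

node0 = set(cpus_in_mask(NODE0_MASK))
for irq, mask, hint in rows:
    cpus = cpus_in_mask(mask)
    # Print the assigned CPUs, the hinted CPUs, and whether the assignment stays on node 0.
    print(irq, cpus, cpus_in_mask(hint), set(cpus) <= node0)
```

Decoded this way, each listed irq is assigned two CPUs (for example 00400040 is CPUs 6 and 22), all of them inside the node-0 mask 00FF00FF (CPUs 0-7 and 16-23), while each driver hint names exactly one CPU.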