Observing softlockups while running heavy IOs

sreekanth.reddy@xxxxxxxxxxxx (Sreekanth Reddy) · Wed, 7 Sep 2016 11:30:04 +0530

On Tue, Sep 6, 2016 at 8:36 PM, Neil Horman <nhorman at tuxdriver.com> wrote:
> On Tue, Sep 06, 2016 at 04:52:37PM +0530, Sreekanth Reddy wrote:
>> On Fri, Sep 2, 2016 at 4:34 AM, Bart Van Assche
>> <bart.vanassche at sandisk.com> wrote:
>> > On 09/01/2016 03:31 AM, Sreekanth Reddy wrote:
>> >>
>> >> I reduced the ISR workload by one third in-order to reduce the time
>> >> that is spent per CPU in interrupt context, even then I am observing
>> >> softlockups.
>> >>
>> >> As I mentioned before only same single CPU in the set of CPUs(enabled
>> >> in affinity_hint) is busy with handling the interrupts from
>> >> corresponding IRQx. I have done below experiment in driver to limit
>> >> these softlockups/hardlockups. But I am not sure whether it is
>> >> reasonable to do this in driver,
>> >>
>> >> Experiment:
>> >> If the CPUx is continuously busy with handling the remote CPUs
>> >> (enabled in the corresponding IRQ's affinity_hint) IO works by 1/4th
>> >> of the HBA queue depth in the same ISR context then enable a flag
>> >> called 'change_smp_affinity' for this IRQ. Also created a thread which
>> >> will poll this flag for every IRQ (enabled by driver) once every
>> >> second. If this thread sees that this flag is enabled for any IRQ then
>> >> it will write next CPU number from the CPUs enabled in the IRQ's
>> >> affinity_hint to the IRQ's smp_affinity procfs attribute using
>> >> 'call_usermodehelper()' API.
>> >>
>> >> This to make sure that interrupts are not processed by same single CPU
>> >> all the time and to make the other CPUs to handle the interrupts if
>> >> the current CPU is continuously busy with handling the other CPUs IO
>> >> interrupts.
>> >>
>> >> For example consider a system which has 8 logical CPUs and one MSIx
>> >> vector enabled (called IRQ 120) in driver, HBA queue depth as 8K.
>> >> then IRQ's procfs attributes will be
>> >> IRQ# 120, affinity_hint=0xff, smp_affinity=0x00
>> >>
>> >> After starting heavy IOs, we will observe that only CPU0 will be busy
>> >> with handling the interrupts. This experiment driver will change the
>> >> smp_affinity to next CPU number i.e. 0x01 (using cmd 'echo 0x01 >
>> >> /proc/irq/120/smp_affinity', driver issues this cmd using
>> >> call_usermodehelper() API) if it observes that CPU0 is continuously
>> >> processing more than 2K IO replies of other CPUs i.e from CPU1 to
>> >> CPU7.
>> >>
>> >> Whether doing this kind of stuff in driver is ok?
>> >
>> >
>> > Hello Sreekanth,
>> >
>> > To me this sounds like something that should be implemented in the I/O
>> > chipset on the motherboard. If you have a look at the Intel Software
>> > Developer Manuals then you will see that logical destination mode supports
>> > round-robin interrupt delivery. However, the Linux kernel selects physical
>> > destination mode on systems with more than eight logical CPUs (see also
>> > arch/x86/kernel/apic/apic_flat_64.c).
>> >
>> > I'm not sure the maintainers of the interrupt subsystem would welcome code
>> > that emulates round-robin interrupt delivery. So your best option is
>> > probably to minimize the amount of work that is done in interrupt context
>> > and to move as much work as possible out of interrupt context in such a way
>> > that it can be spread over multiple CPU cores, e.g. by using
>> > queue_work_on().
>> >
>> > Bart.
>>
>> Bart,
>>
>> Thanks a lot for providing lot of inputs and valuable information on this issue.
>>
>> Today I got one more observation. i.e. I am not observing any lockups
>> if I use 1.0.4-6 versioned irqbalance.
>> Since this versioned irqbalance is able to shift the load to other CPU
>> when one CPU is heavily loaded.
>>
>
> This isn't happening because irqbalance is no longer able to shift load between
> cpus, it's happening because of commit 996ee2cf7a4d10454de68ac4978adb5cf22850f8.
> irqs with higher interrupt volumes should be balanced to a specific cpu core,
> rather than to a cache domain to maximize cpu-local cache hit rates.  Prior to
> that change we balanced to a cache domain and your workload didn't have to
> serialize multiple interrupts to a single core.  My suggestion to you is to use
> the --policyscript option to make your storage irqs get balanced to the cache
> level, rather than the core level.  That should return the behavior to what you
> want.
>
> Neil

Hi Neil,

Thanks for the reply.

Today I tried setting balance_level to 'cache' for the mpt3sas driver's
IRQs using the policy script below, with irqbalance version 1.0.9:
----------------------------------------------------------------------------------------------
#!/bin/bash
# irqbalance policy script for mpt3sas driver IRQs.
#
# Command line arguments:
#   $1 -> path (IRQ_PATH)
#   $2 -> IRQ number (IRQ_NUMBER)
declare -r IRQ_PATH=$1
declare -r IRQ_NUMBER=$2

if [ -d "/proc/irq/$IRQ_NUMBER" ]; then
    # Count the entries under /proc/irq/<N>/ whose name contains "mpt3sas".
    mpt3sas_irq=$(ls "/proc/irq/$IRQ_NUMBER/" | grep -c mpt3sas)
    if [ "$mpt3sas_irq" -eq 1 ]; then
        echo "hintpolicy=subset"
        echo "balance_level=cache"
    fi
fi
-----------------------------------------------------------------------------------------------
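
As a sanity check outside of irqbalance, the same test can be pointed at a
scratch directory instead of the live /proc/irq tree. This is only a sketch:
policy_for_irq_dir and its directory argument are hypothetical names
introduced here, and unlike the script above it prints the policy when at
least one mpt3sas entry exists rather than exactly one.

```shell
#!/bin/bash
# policy_for_irq_dir is a hypothetical helper that mirrors the check in the
# policy script. It takes the IRQ directory explicitly so that it can be run
# against a scratch directory.
policy_for_irq_dir() {
    local irq_dir=$1 entry
    # Nothing to report if the IRQ directory does not exist.
    [ -d "$irq_dir" ] || return 0
    for entry in "$irq_dir"/*mpt3sas*; do
        # An unmatched glob stays literal, so confirm the entry exists.
        if [ -e "$entry" ]; then
            echo "hintpolicy=subset"
            echo "balance_level=cache"
            return 0
        fi
    done
}
```

The policy script itself is passed to irqbalance with its --policyscript
option.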

But I still don't see any load shifting between the CPUs, and I am
still observing hardlockups.

I have attached the irqbalance logs.
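
The smp_affinity and affinity_hint values in the quoted driver output below
are hex CPU masks. As a reading aid, here is a small sketch that decodes such
a mask into CPU numbers. mask_to_cpus is a hypothetical helper name, not part
of irqbalance or the mpt3sas driver.

```shell
#!/bin/bash
# mask_to_cpus (hypothetical helper): print the CPU numbers set in a hex
# CPU mask such as the ones shown in /proc/irq/<N>/smp_affinity.
mask_to_cpus() {
    local mask=${1//,/}      # drop commas that group 32-bit words
    mask=${mask#0x}          # drop an optional 0x prefix
    local cpus=() cpu=0 i digit bit
    # Walk the hex digits from the least significant (rightmost) one.
    for (( i = ${#mask} - 1; i >= 0; i-- )); do
        digit=$(( 16#${mask:i:1} ))
        for bit in 0 1 2 3; do
            if (( (digit >> bit) & 1 )); then
                cpus+=("$cpu")
            fi
            cpu=$(( cpu + 1 ))
        done
    done
    echo "${cpus[*]}"
}
```

For the values quoted below, mask_to_cpus 001000 gives 12 and
mask_to_cpus 200000 gives 21, which matches IRQ 51 moving from CPU12 to
CPU21.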

Thanks,
Sreekanth
>
>> while running heavy IOs, for first few seconds here is my driver irq's
>> attributes,
>> --------------------------------------------------------------------------------------------------------------------
>> ioc number = 0
>> number of core processors = 24
>> msix vector count = 2
>> number of cores per msix vector = 16
>>
>>
>>     msix index = 0, irq number =  50, smp_affinity = 000040 affinity_hint = 000fff
>>     msix index = 1, irq number =  51, smp_affinity = 001000 affinity_hint = fff000
>>
>> We have set affinity for 2 msix vectors and 24 core processors
>> ----------------------------------------------------------------------------------------------------------------------
>>
>> After few seconds it observed that CPU12 is heavily loaded for IRQ 51
>> and it changed the smp_affinity to CPU21
>> --------------------------------------------------------------------------------------------------------------------
>> ioc number = 0
>> number of core processors = 24
>> msix vector count = 2
>> number of cores per msix vector = 16
>>
>>
>>     msix index = 0, irq number =  50, smp_affinity = 000040 affinity_hint = 000fff
>>     msix index = 1, irq number =  51, smp_affinity = 200000 affinity_hint = fff000
>>
>> We have set affinity for 2 msix vectors and 24 core processors
>> ---------------------------------------------------------------------------------------------------------------------
>>
>> Whereas irqbalance version 1.0.9 is not able to shift the load to
>> the other CPUs enabled in the affinity_hint (even when subset policy
>> is enabled) and so I was observing the softlockups/hardlockups.
>>
>> Here I have attached irqbalance logs with debug enabled for both versions.
>>
>> Thanks,
>> Sreekanth
>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: irqbalance_1.0.9_with_set_policy_logs
Type: application/octet-stream
Size: 348160 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/irqbalance/attachments/20160907/c7ce6afa/attachment-0001.obj>