On 09/13/2017 09:33 AM, Thomas Gleixner wrote:
> On Wed, 13 Sep 2017, Kashyap Desai wrote:
>>> On 09/12/2017 08:15 PM, YASUAKI ISHIMATSU wrote:
>>>> + linux-scsi and maintainers of megasas
>
>>>>> In my server, IRQ#66-89 are sent to CPU#24-29. And if I offline
>>>>> CPU#24-29, I/O does not work, showing the following messages.
>
> ....
>
>>> This indeed looks like a problem.
>>> We're going to great lengths to submit and complete I/O on the same
>>> CPU, so if the CPU is offlined while I/O is in flight we won't be
>>> getting a completion for this particular I/O.
>>> However, the megasas driver should be able to cope with this
>>> situation; after all, the firmware maintains completion queues, so
>>> it would be dead easy to look at _other_ completion queues, too, if
>>> a timeout occurs.
>> In case of an I/O timeout, the megaraid_sas driver checks the other
>> queues as well. That is why the I/O was completed in this case and
>> further I/Os were resumed.
>>
>> The driver completes commands with the code below, executed from
>> megasas_wait_for_outstanding_fusion():
>>
>>	for (MSIxIndex = 0; MSIxIndex < count; MSIxIndex++)
>>		complete_cmd_fusion(instance, MSIxIndex);
>>
>> Because this code is executed in the driver, we see only one message
>> like the following in these logs:
>>
>> megaraid_sas 0000:02:00.0: [ 0]waiting for 2 commands to complete for scsi0
>>
>> As per the link below, CPU hotplug should take care of this: "All
>> interrupts targeted to this CPU are migrated to a new CPU."
>> https://www.kernel.org/doc/html/v4.11/core-api/cpu_hotplug.html
>>
>> BTW - we are also able to reproduce this issue locally. The reason
>> for the I/O timeout is: the I/O is completed, but the corresponding
>> interrupt never arrives on an online CPU. It is missed because the
>> CPU is in the transient state of being offlined. I am not sure which
>> component should take care of this.
>>
>> Question - what happens once __cpu_disable() is called and some of
>> the queued interrupts have affinity to that particular CPU?
>> I assume those pending/queued interrupts should ideally be migrated
>> to the remaining online CPUs. They should not go unhandled if we
>> want to avoid such I/O timeouts.
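[ For reference, the timeout-path drain described above boils down to
  something like the sketch below. It is expanded from the quoted
  two-line loop; the function wrapper and the msix_vectors fallback are
  illustrative, not verbatim driver code. ]

	/*
	 * On an I/O timeout, poll every reply queue -- not only the
	 * queue whose interrupt is affine to the submitting CPU -- so
	 * that completions stranded on an offlined vector are still
	 * reaped.
	 */
	static void drain_all_reply_queues(struct megasas_instance *instance)
	{
		u32 count, MSIxIndex;

		/* One reply queue per MSI-x vector; 1 if MSI-x is unused. */
		count = instance->msix_vectors > 0 ? instance->msix_vectors : 1;

		for (MSIxIndex = 0; MSIxIndex < count; MSIxIndex++)
			complete_cmd_fusion(instance, MSIxIndex);
	}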
>
> Can you please provide the following information, before and after
> offlining the last CPU in the affinity set:
>
> # cat /proc/irq/$IRQNUM/smp_affinity_list
> # cat /proc/irq/$IRQNUM/effective_affinity
> # cat /sys/kernel/debug/irq/irqs/$IRQNUM
>
> The last one requires: CONFIG_GENERIC_IRQ_DEBUGFS=y

Here is the info for one of the megasas IRQs:

- Before offlining CPU#24-29

/proc/irq/70/smp_affinity_list
24-29

/proc/irq/70/effective_affinity
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,3f000000

/sys/kernel/debug/irq/irqs/70
handler:  handle_edge_irq
status:   0x00004000
istate:   0x00000000
ddepth:   0
wdepth:   0
dstate:   0x00609200
            IRQD_ACTIVATED
            IRQD_IRQ_STARTED
            IRQD_MOVE_PCNTXT
            IRQD_AFFINITY_SET
            IRQD_AFFINITY_MANAGED
node:     1
affinity: 24-29
effectiv: 24-29
pending:
domain:  INTEL-IR-MSI-0-2
 hwirq:   0x100018
 chip:    IR-PCI-MSI
  flags:   0x10
             IRQCHIP_SKIP_SET_WAKE
 parent:
    domain:  INTEL-IR-0
     hwirq:   0x400000
     chip:    INTEL-IR
      flags:   0x0
     parent:
        domain:  VECTOR
         hwirq:   0x46
         chip:    APIC
          flags:   0x0

- After offlining CPU#24-29

/proc/irq/70/smp_affinity_list
29

/proc/irq/70/effective_affinity
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,20000000

/sys/kernel/debug/irq/irqs/70
handler:  handle_edge_irq
status:   0x00004000
istate:   0x00000000
ddepth:   1
wdepth:   0
dstate:   0x00a39000
            IRQD_IRQ_DISABLED
            IRQD_IRQ_MASKED
            IRQD_MOVE_PCNTXT
            IRQD_AFFINITY_SET
            IRQD_AFFINITY_MANAGED
            IRQD_MANAGED_SHUTDOWN
node:     1
affinity: 29
effectiv: 29
pending:
domain:  INTEL-IR-MSI-0-2
 hwirq:   0x100018
 chip:    IR-PCI-MSI
  flags:   0x10
             IRQCHIP_SKIP_SET_WAKE
 parent:
    domain:  INTEL-IR-0
     hwirq:   0x400000
     chip:    INTEL-IR
      flags:   0x0
     parent:
        domain:  VECTOR
         hwirq:   0x46
         chip:    APIC
          flags:   0x0

Thanks,
Yasuaki Ishimatsu

>
> Thanks,
>
> 	tglx
>
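[ For context on the question quoted above -- what happens once
  __cpu_disable() runs while an interrupt is affine only to the dying
  CPU -- the hotplug takedown in kernel/irq/cpuhotplug.c behaves
  roughly as sketched below. This is a heavily condensed paraphrase of
  the 4.13-era migrate_one_irq(), not the actual function: a managed
  interrupt whose affinity mask contains no remaining online CPU is
  shut down rather than migrated, which matches the
  IRQD_MANAGED_SHUTDOWN state in the dump above. ]

	/* Condensed paraphrase; locking, masking and error paths omitted. */
	static bool migrate_one_irq_sketch(struct irq_desc *desc)
	{
		struct irq_data *d = irq_desc_get_irq_data(desc);
		const struct cpumask *affinity = irq_data_get_affinity_mask(d);

		if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
			if (irqd_affinity_is_managed(d)) {
				/*
				 * The last CPU in the managed mask went
				 * down: park the interrupt instead of
				 * moving it. It is restarted when a CPU
				 * of the mask comes back online.
				 */
				irq_shutdown(desc);
				return false;
			}
			/* Non-managed interrupts fall back to any online CPU. */
			affinity = cpu_online_mask;
		}

		irq_do_set_affinity(d, affinity, false);
		return true;
	}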