On 2/12/21 5:41 PM, Greg Kurz wrote:
> Depending on the number of online CPUs in the original kernel, it is
> likely for CPU #0 to be offline in a kdump kernel. The associated IRQs
> in the affinity mappings provided by irq_create_affinity_masks() are
> thus not started by irq_startup(), as per-design with managed IRQs.
>
> This can be a problem with multi-queue block devices driven by blk-mq :
> such a non-started IRQ is very likely paired with the single queue
> enforced by blk-mq during kdump (see blk_mq_alloc_tag_set()). This
> causes the device to remain silent and likely hangs the guest at
> some point.
>
> This is a regression caused by commit 9ea69a55b3b9 ("powerpc/pseries:
> Pass MSI affinity to irq_create_mapping()"). Note that this only happens
> with the XIVE interrupt controller because XICS has a workaround to bypass
> affinity, which is activated during kdump with the "noirqdistrib" kernel
> parameter.
>
> The issue comes from a combination of factors:
> - discrepancy between the number of queues detected by the multi-queue
>   block driver, that was used to create the MSI vectors, and the single
>   queue mode enforced later on by blk-mq because of kdump (i.e. keeping
>   all queues fixes the issue)
> - CPU#0 offline (i.e. kdump always succeed with CPU#0)
>
> Given that I couldn't reproduce on x86, which seems to always have CPU#0
> online even during kdump, I'm not sure where this should be fixed. Hence
> going for another approach : fine-grained affinity is for performance
> and we don't really care about that during kdump. Simply revert to the
> previous working behavior of ignoring affinity masks in this case only.
>
> Fixes: 9ea69a55b3b9 ("powerpc/pseries: Pass MSI affinity to irq_create_mapping()")
> Cc: lvivier@xxxxxxxxxx
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Greg Kurz <groug@xxxxxxxx>

Reviewed-by: Cédric Le Goater <clg@xxxxxxxx>

Thanks for tracking this issue. This layer needs a rework.
Patches adding a MSI domain should be ready in a couple of releases.
Hopefully.

C.

> ---
>  arch/powerpc/platforms/pseries/msi.c | 24 ++++++++++++++++++++++--
>  1 file changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
> index b3ac2455faad..29d04b83288d 100644
> --- a/arch/powerpc/platforms/pseries/msi.c
> +++ b/arch/powerpc/platforms/pseries/msi.c
> @@ -458,8 +458,28 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
>  			return hwirq;
>  		}
>
> -		virq = irq_create_mapping_affinity(NULL, hwirq,
> -						   entry->affinity);
> +		/*
> +		 * Depending on the number of online CPUs in the original
> +		 * kernel, it is likely for CPU #0 to be offline in a kdump
> +		 * kernel. The associated IRQs in the affinity mappings
> +		 * provided by irq_create_affinity_masks() are thus not
> +		 * started by irq_startup(), as per-design for managed IRQs.
> +		 * This can be a problem with multi-queue block devices driven
> +		 * by blk-mq : such a non-started IRQ is very likely paired
> +		 * with the single queue enforced by blk-mq during kdump (see
> +		 * blk_mq_alloc_tag_set()). This causes the device to remain
> +		 * silent and likely hangs the guest at some point.
> +		 *
> +		 * We don't really care for fine-grained affinity when doing
> +		 * kdump actually : simply ignore the pre-computed affinity
> +		 * masks in this case and let the default mask with all CPUs
> +		 * be used when creating the IRQ mappings.
> +		 */
> +		if (is_kdump_kernel())
> +			virq = irq_create_mapping(NULL, hwirq);
> +		else
> +			virq = irq_create_mapping_affinity(NULL, hwirq,
> +							   entry->affinity);
>
>  		if (!virq) {
>  			pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
>
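
For readers following the thread, the failure mode and the workaround in the patch can be sketched as a toy user-space model. This is not kernel code: the `cpumask_t` bitmask type, the `ALL_CPUS` constant, the 4-CPU layout and the helpers `managed_irq_started()` and `pick_affinity()` are all hypothetical names made up for illustration. The only thing it assumes about the real behavior is what the commit message states, namely that a managed IRQ whose affinity mask contains no online CPU is not started.

```c
#include <stdbool.h>

/* Toy model only: one bit per CPU, hypothetical 4-CPU guest. */
typedef unsigned int cpumask_t;

#define ALL_CPUS 0xFu

/*
 * Models the rule described in the commit message: a managed IRQ
 * is started only if at least one CPU of its affinity mask is online.
 */
static bool managed_irq_started(cpumask_t affinity, cpumask_t online)
{
	return (affinity & online) != 0;
}

/*
 * Models the choice made by the patch: in a kdump kernel, ignore the
 * pre-computed mask and fall back to a mask covering all CPUs.
 */
static cpumask_t pick_affinity(cpumask_t precomputed, bool kdump)
{
	return kdump ? ALL_CPUS : precomputed;
}
```

With the vector for the first queue spread onto CPU #0 only (mask `0x1`) and a kdump kernel running on CPU #2 only (online mask `0x4`), `managed_irq_started(0x1, 0x4)` is false, which corresponds to the silent device. Passing the mask through `pick_affinity(0x1, true)` first yields `ALL_CPUS`, which intersects the online mask, so the IRQ is started.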