On 2020-05-27 11:06, Christoph Hellwig wrote:
> this series ensures I/O is quiesced before a cpu and thus the managed
> interrupt handler is shut down.
>
> This patchset tries to address the issue by the following approach:
>
> - before the last cpu in hctx->cpumask is going to offline, mark this
>   hctx as inactive
>
> - disable preempt during allocating tag for request, and after tag is
>   allocated, check if this hctx is inactive. If yes, give up the
>   allocation and try remote allocation from online CPUs
>
> - before hctx becomes inactive, drain all allocated requests on this
>   hctx
>
> The guts of the changes are from Ming Lei, I just did a bunch of prep
> cleanups so that they can fit in more nicely. The series also depends
> on my "avoid a few q_usage_counter roundtrips v3" series.
>
> Thanks John Garry for running lots of tests on arm64 with this previous
> version patches and co-working on investigating all kinds of issues.

Hi Christoph,

Thanks for having prepared and posted this new patch series. After v3 was
posted and before v4 was posted I had a closer look at the IRQ core. My
conclusions (which may be incorrect) are as follows:
* The only function that sets the 'is_managed' member of struct
  irq_affinity_desc to 1 is irq_create_affinity_masks().
* There are two ways to cause that function to be called: setting the
  PCI_IRQ_AFFINITY flag when calling pci_alloc_irq_vectors_affinity(), or
  passing the 'affd' argument. pci_alloc_irq_vectors() calls
  pci_alloc_irq_vectors_affinity().
* The following drivers pass an affinity domain argument when allocating
  interrupts: virtio_blk, nvme, be2iscsi, csiostor, hisi_sas, megaraid,
  mpt3sas, qla2xxx, virtio_scsi.
* The following drivers set the PCI_IRQ_AFFINITY flag but do not pass an
  affinity domain: aacraid, hpsa, lpfc, smartpqi, virtio_pci_common.

What is not clear to me is why managed interrupts are shut down when the
last CPU in their affinity mask goes offline. Has it been considered to
modify the IRQ core such that managed PCIe interrupts are assigned to
another CPU if the last CPU in their affinity mask is offlined? Would that
make it unnecessary to drain hardware queues during CPU hotplugging? Or is
there perhaps something in the PCI or PCIe specifications, or in one of
the architectures supported by Linux, that prevents doing this?

Is this the commit that introduced shutdown of managed interrupts:
c5cb83bb337c ("genirq/cpuhotplug: Handle managed IRQs on CPU hotplug")?

Some of my knowledge about non-managed and managed interrupts comes from
https://lore.kernel.org/lkml/alpine.DEB.2.20.1710162106400.2037@nanos/

Thanks,

Bart.
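
P.S. For completeness, here is a rough sketch of the two allocation paths
mentioned above, as I understand them. The helper name, the vector counts
and the pre_vectors value are made up for illustration only;
pci_alloc_irq_vectors(), pci_alloc_irq_vectors_affinity(),
pci_free_irq_vectors() and struct irq_affinity are the real interfaces.

#include <linux/pci.h>
#include <linux/interrupt.h>

/* Hypothetical helper; 'pdev' is assumed to be an already enabled PCI device. */
static int example_alloc_vectors(struct pci_dev *pdev)
{
	struct irq_affinity affd = { .pre_vectors = 1 };
	int ret;

	/*
	 * Path 1: set PCI_IRQ_AFFINITY without passing an explicit
	 * affinity descriptor. The PCI core then falls back to a default
	 * struct irq_affinity, so irq_create_affinity_masks() still runs
	 * and the allocated vectors end up with is_managed = 1.
	 */
	ret = pci_alloc_irq_vectors(pdev, 1, 16,
				    PCI_IRQ_MSIX | PCI_IRQ_AFFINITY);
	if (ret < 0)
		return ret;
	pci_free_irq_vectors(pdev);

	/*
	 * Path 2: pass an explicit 'affd', e.g. to keep one pre_vector
	 * out of the managed spread (nvme does this for its admin queue
	 * interrupt).
	 */
	return pci_alloc_irq_vectors_affinity(pdev, 2, 17,
					      PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
					      &affd);
}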