On Wed, Jan 16, 2019 at 06:46:32AM +0800, Ming Lei wrote:
> Hi Bjorn,
>
> I think Christoph and Jens are correct, we should make this patch into
> 5.0 because the issue is triggered since 3b6592f70ad7b4c2 ("nvme: utilize
> two queue maps, one for reads and one for writes"), which is merged to
> 5.0-rc.
>
> For example, before 3b6592f70ad7b4c2, one nvme controller may be
> allocated 64 irq vectors; but after that commit, only 1 irq vector
> is assigned to this controller.
>
> On Tue, Jan 15, 2019 at 01:31:35PM -0600, Bjorn Helgaas wrote:
> > On Tue, Jan 15, 2019 at 09:22:45AM -0700, Jens Axboe wrote:
> > > On 1/15/19 6:11 AM, Christoph Hellwig wrote:
> > > > On Mon, Jan 14, 2019 at 05:23:39PM -0600, Bjorn Helgaas wrote:
> > > > > Applied to pci/msi for v5.1, thanks!
> > > > >
> > > > > If this is something that should be in v5.0, let me know and include the
> > > > > justification, e.g., something we already merged for v5.0 or regression
> > > > > info, etc, and a Fixes: line, and I'll move it to for-linus.
> > > >
> > > > I'd be tempted to queue this up for 5.0.  Ming, what is your position?
> > >
> > > I think we should - the API was introduced in this series, I think there's
> > > little (to no) reason NOT to fix it for 5.0.
> >
> > I'm guessing the justification goes something like this (I haven't
> > done all the research, so I'll leave it to Ming to fill in the details):
> >
> > pci_alloc_irq_vectors_affinity() was added in v4.x by XXXX ("...").
>
> dca51e7892fa3b ("nvme: switch to use pci_alloc_irq_vectors")
>
> > It had this return value defect then, but its min_vecs/max_vecs
> > parameters removed the need for callers to iteratively reduce the
> > number of IRQs requested and retry the allocation, so they didn't
> > need to distinguish -ENOSPC from -EINVAL.
> >
> > In v5.0, XXX ("...") added IRQ sets to the interface, which
>
> 3b6592f70ad7b4c2 ("nvme: utilize two queue maps, one for reads and one
> for writes")
>
> > reintroduced the need to check for -ENOSPC and possibly reduce the
> > number of IRQs requested and retry the allocation.

We're fixing a PCI core defect, so we should mention the relevant PCI
core commits, not the nvme-specific ones.  I looked them up for you and
moved this to for-linus for v5.0.

commit 77f88abd4a6f73a1a68dbdc0e3f21575fd508fc3
Author: Ming Lei <ming.lei@xxxxxxxxxx>
Date:   Tue Jan 15 17:31:29 2019 -0600

    PCI/MSI: Return -ENOSPC from pci_alloc_irq_vectors_affinity()

    The API of pci_alloc_irq_vectors_affinity() says it returns -ENOSPC if
    fewer than @min_vecs interrupt vectors are available for @dev.

    However, if a device supports MSI-X but not MSI and a caller requests
    @min_vecs that can't be satisfied by MSI-X, we previously returned
    -EINVAL (from the failed attempt to enable MSI), not -ENOSPC.

    When -ENOSPC is returned, callers may reduce the number of IRQs they
    request and try again.  Most callers can use the @min_vecs and
    @max_vecs parameters to avoid this retry loop, but that doesn't work
    when using IRQ affinity "nr_sets" because rebalancing the sets is
    driver-specific.

    This return value bug has been present since pci_alloc_irq_vectors()
    was added in v4.10 by aff171641d18 ("PCI: Provide sensible IRQ vector
    alloc/free routines"), but it wasn't an issue because
    @min_vecs/@max_vecs removed the need for callers to iteratively reduce
    the number of IRQs requested and retry the allocation, so they didn't
    need to distinguish -ENOSPC from -EINVAL.

    In v5.0, 6da4b3ab9a6e ("genirq/affinity: Add support for allocating
    interrupt sets") added IRQ sets to the interface, which reintroduced
    the need to check for -ENOSPC and possibly reduce the number of IRQs
    requested and retry the allocation.
    Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
    [bhelgaas: changelog]
    Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
    Cc: Jens Axboe <axboe@xxxxxx>
    Cc: Keith Busch <keith.busch@xxxxxxxxx>
    Cc: Christoph Hellwig <hch@xxxxxx>

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 7a1c8a09efa5..4c0b47867258 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1168,7 +1168,8 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
 			   const struct irq_affinity *affd)
 {
 	static const struct irq_affinity msi_default_affd;
-	int vecs = -ENOSPC;
+	int msix_vecs = -ENOSPC;
+	int msi_vecs = -ENOSPC;
 
 	if (flags & PCI_IRQ_AFFINITY) {
 		if (!affd)
@@ -1179,16 +1180,17 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
 	}
 
 	if (flags & PCI_IRQ_MSIX) {
-		vecs = __pci_enable_msix_range(dev, NULL, min_vecs, max_vecs,
-				affd);
-		if (vecs > 0)
-			return vecs;
+		msix_vecs = __pci_enable_msix_range(dev, NULL, min_vecs,
+						    max_vecs, affd);
+		if (msix_vecs > 0)
+			return msix_vecs;
 	}
 
 	if (flags & PCI_IRQ_MSI) {
-		vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs, affd);
-		if (vecs > 0)
-			return vecs;
+		msi_vecs = __pci_enable_msi_range(dev, min_vecs, max_vecs,
+						  affd);
+		if (msi_vecs > 0)
+			return msi_vecs;
 	}
 
 	/* use legacy irq if allowed */
@@ -1199,7 +1201,9 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
 		}
 	}
 
-	return vecs;
+	if (msix_vecs == -ENOSPC)
+		return -ENOSPC;
+	return msi_vecs;
 }
 EXPORT_SYMBOL(pci_alloc_irq_vectors_affinity);