Digging up our patch queue - i found another multi-MSI related fix: Unfortunately the reverse mapping of the hwirq - as made by irq_find_mapping() was not applied to the message data only, but also to the MSI vector, which was lost as a result. Make sure that the reverse mapping is applied to the proper hwirq, contained in the message data. Tested on Saber2 and Katana2. Fixes: fc54bae288182 ("PCI: iproc: Allow allocation of multiple MSIs") diff --git drivers/pci/host/pcie-iproc-msi.c drivers/pci/host/pcie-iproc-msi.c index 990fc906d73d..708fdb1065f8 100644 --- drivers/pci/host/pcie-iproc-msi.c +++ drivers/pci/host/pcie-iproc-msi.c @@ -237,7 +237,12 @@ static void iproc_msi_irq_compose_msi_msg(struct irq_data *data, addr = msi->msi_addr + iproc_msi_addr_offset(msi, data->hwirq); msg->address_lo = lower_32_bits(addr); msg->address_hi = upper_32_bits(addr); - msg->data = data->hwirq << 5; + /* + * Since we have multiple hwirq mapped to a single MSI vector, + * now we need to derive the hwirq at CPU0. It can then be used to + * mapped back to virq. + */ + msg->data = hwirq_to_canonical_hwirq(msi, data->hwirq) << 5; } static struct irq_chip iproc_msi_bottom_irq_chip = { @@ -307,14 +312,8 @@ static inline u32 decode_msi_hwirq(struct iproc_msi *msi, u32 eq, u32 head) offs = iproc_msi_eq_offset(msi, eq) + head * sizeof(u32); msg = (u32 *)(msi->eq_cpu + offs); hwirq = readl(msg); - hwirq = (hwirq >> 5) + (hwirq & 0x1f); - /* - * Since we have multiple hwirq mapped to a single MSI vector, - * now we need to derive the hwirq at CPU0. It can then be used to - * mapped back to virq. - */ - return hwirq_to_canonical_hwirq(msi, hwirq); + return hwirq; } static void iproc_msi_handler(struct irq_desc *desc) @@ -360,7 +359,7 @@ static void iproc_msi_handler(struct irq_desc *desc) /* process all outstanding events */ while (nr_events--) { hwirq = decode_msi_hwirq(msi, eq, head); - virq = irq_find_mapping(msi->inner_domain, hwirq); + virq = irq_find_mapping(msi->inner_domain, hwirq >> 5) + (hwirq & 0x1f); generic_handle_irq(virq); head++; On Thu, May 20, 2021 at 7:11 PM Ray Jui <ray.jui@xxxxxxxxxxxx> wrote: > > > > On 5/20/2021 7:22 AM, Sandor Bodo-Merle wrote: > > On Thu, May 20, 2021 at 4:05 PM Pali Rohár <pali@xxxxxxxxxx> wrote: > >> > >> Hello! > >> > >> On Thursday 20 May 2021 15:47:46 Sandor Bodo-Merle wrote: > >>> Hi Pali, > >>> > >>> thanks for catching this - i dig up the followup fixup commit we have > >>> for the iproc multi MSI (it was sent to Broadcom - but unfortunately > >>> we missed upstreaming it). > >>> > >>> Commit fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs") > >>> failed to reserve the proper number of bits from the inner domain. > >>> We need to allocate the proper amount of bits otherwise the domains for > >>> multiple PCIe endpoints may overlap and freeing one of them will result > >>> in freeing unrelated MSI vectors. > >>> > >>> Fixes: fc54bae28818 ("PCI: iproc: Allow allocation of multiple MSIs") > >>> --- > >>> drivers/pci/host/pcie-iproc-msi.c | 8 ++++---- > >>> 1 file changed, 4 insertions(+), 4 deletions(-) > >>> > >>> diff --git drivers/pci/host/pcie-iproc-msi.c drivers/pci/host/pcie-iproc-msi.c > >>> index 708fdb1065f8..a00492dccb74 100644 > >>> --- drivers/pci/host/pcie-iproc-msi.c > >>> +++ drivers/pci/host/pcie-iproc-msi.c > >>> @@ -260,11 +260,11 @@ static int iproc_msi_irq_domain_alloc(struct > >>> irq_domain *domain, > >>> > >>> mutex_lock(&msi->bitmap_lock); > >>> > >>> - /* Allocate 'nr_cpus' number of MSI vectors each time */ > >>> + /* Allocate 'nr_irqs' multiplied by 'nr_cpus' number of MSI > >>> vectors each time */ > >>> hwirq = bitmap_find_next_zero_area(msi->bitmap, msi->nr_msi_vecs, 0, > >>> - msi->nr_cpus, 0); > >>> + msi->nr_cpus * nr_irqs, 0); > >> > >> I'm not sure if this construction is correct. Multi-MSI interrupts needs > >> to be aligned to number of requested interrupts. So if wifi driver asks > >> for 32 Multi-MSI interrupts then first allocated interrupt number must > >> be dividable by 32. > >> > > > > Ahh - i guess you are right. In our internal engineering we always > > request 32 vectors. > > IIRC the multiply by "nr_irqs" was added for iqr affinity to work correctly. > > > > May I ask which platforms are you guys running this driver on? Cygnus or > Northstar? Not that it matters, but just out of curiosity. > > Let me start by explaining how MSI support works in this driver, or more > precisely, for all platforms that support this iProc based event queue > MSI scheme: > > In iProc PCIe core, each MSI group is serviced by a GIC interrupt > (hwirq) and a dedicated event queue (event queue is paired up with > hwirq). Each MSI group can support up to 64 MSI vectors. Note 64 is the > depth of the event queue. > > The number of MSI groups varies between different iProc SoCs. The total > number of CPU cores also varies. To support MSI IRQ affinity, we > distribute GIC interrupts across all available CPUs. MSI vector is > moved from one GIC interrupt to another to steer to the target CPU. > > Assuming: > The number of MSI groups (the number of total hwirq for this PCIe > controller) is M > The number of CPU cores is N > M is always a multiple of N (we ensured that in the setup function) > > Therefore: > Total number of raw MSI vectors = M * 64 > Total number of supported MSI vectors = (M * 64) / N > > I guess I'm not too clear on what you mean by "multi-MSI interrupts > needs to be aligned to number of requested interrupts.". Would you be > able to plug this into the above explanation so we can have a more clear > understanding of what you mean here? > > In general, I don't see much issue of allocating 'msi->nr_cpus * > nr_irqs' here as long as we can still meet the affinity distribution > requirement as mentioned above. > > Example in the dw PCIe driver does the following for reserving in the > bitmap: > > bitmap_find_free_region(pp->msi_irq_in_use, pp->num_vectors, > order_base_2(nr_irqs)); > > >>> if (hwirq < msi->nr_msi_vecs) { > >>> - bitmap_set(msi->bitmap, hwirq, msi->nr_cpus); > >>> + bitmap_set(msi->bitmap, hwirq, msi->nr_cpus * nr_irqs); > >> > >> And another issue is that only power of 2 interrupts for Multi-MSI can > >> be allocated. Otherwise one number may be allocated to more devices. > >> > >> But I'm not sure how number of CPUs affects it as other PCIe controller > >> drivers do not use number of CPUs. > > As I explained above, we definitely need to take nr_cpus into account, > since we cannot move around the physical interrupt between CPUs. > Instead, we dedicate each physical interrupt to the CPU and service the > MSI across different event queues accordingly when user change irq affinity. > > >> > >> Other drivers are using bitmap_find_free_region() function with > >> order_base_2(nr_irqs) as argument. > >> > > Yes we should do that too. > > >> I hope that somebody else more skilled with MSI interrupts look at these > >> constructions if are correct or needs more rework. > >> > > > > I see the point - i'll ask also in our engineering department ... > > > >>> } else { > >>> mutex_unlock(&msi->bitmap_lock); > >>> return -ENOSPC; > >>> @@ -292,7 +292,7 @@ static void iproc_msi_irq_domain_free(struct > >>> irq_domain *domain, > >>> mutex_lock(&msi->bitmap_lock); > >>> > >>> hwirq = hwirq_to_canonical_hwirq(msi, data->hwirq); > >>> - bitmap_clear(msi->bitmap, hwirq, msi->nr_cpus); > >>> + bitmap_clear(msi->bitmap, hwirq, msi->nr_cpus * nr_irqs); > >>> > >>> mutex_unlock(&msi->bitmap_lock); > >>> > >>> > >>> On Thu, May 20, 2021 at 2:04 PM Pali Rohár <pali@xxxxxxxxxx> wrote: > >>>> > >>>> Hello! > >>>> > >>>> I think there is a bug in pcie-iproc-msi.c driver. It declares > >>>> Multi MSI support via MSI_FLAG_MULTI_PCI_MSI flag, see: > >>>> > >>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/pcie-iproc-msi.c?h=v5.12#n174 > >>>> > >>>> but its iproc_msi_irq_domain_alloc() function completely ignores nr_irqs > >>>> argument when allocating interrupt numbers from bitmap, see: > >>>> > >>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/pcie-iproc-msi.c?h=v5.12#n246 > >>>> > >>>> I think this this is incorrect as alloc callback should allocate nr_irqs > >>>> multi interrupts as caller requested. All other drivers with Multi MSI > >>>> support are doing it. > >>>> > >>>> Could you look at it?