Thomas, Thanks a lot for the detailed answer. > > Basically, 32 MSI vectors are represented by a single GIC irq. > > There's a status registers which every bit correspond to an MSI vector, and > > individual MSI needs to be acked on that registers. in any case where > > there's asserted bit the GIC IRQ level is high. > > Which is not that bad. > > >> > It's configured with handle_level_irq() as the GIC is level IRQ. > >> > >> Which is completely bonkers. MSI has edge semantics and sharing an > >> interrupt line for edge type interrupts is broken by design, unless the > >> hardware which handles the incoming MSIs and forwards them to the level > >> type interrupt line is designed properly and the driver does the right > >> thing. > > > > Yes, the design of the HW is sort of broken. I concur. > > As you describe it, it's not that bad. > > >> > The ack callback acks the GIC irq. the mask/unmask calls > >> > pci_msi_mask_irq() / pci_msi_unmask_irq() > >> > >> What? How is that supposed to work with multiple MSIs? > > Acking is per MSI vector as I described above, so it should work. > > No. This is the wrong approach. Lets look at the hardware: > > | GIC line X |------| MSI block | <--- Messages from devices > > The MSI block latches the incoming message up to the point where it is > acknowledged in the MSI block. This makes sure that the level semantics > of the GIC are met. > > So from a software perspective you want to do something like this: > > gic_irq_handler() > { > mask_ack(gic_irqX); > > pending = read(msi_status); > for_each_bit(bit, pending) { > ack(msi_status, bit); // This clears the latch in the MSI block > handle_irq(irqof(bit)); > } > unmask(gic_irqX); > } > > And that works perfectly correct without masking the MSI interrupt at > the PCI level for a threaded handler simply because the PCI device will > not send another interrupt until the previous one has been handled by > the driver unless the PCI device is broken. I'm missing something here, isn't this implementation blocks IRQ's only during the HW handler and not during the threaded handler ? (Assuming that I selected handle_level_irq() as the default handler) Actually my implementation current implementation is very similar to what you just described: static void eq_msi_isr(struct irq_desc *desc) { struct irq_chip *chip = irq_desc_get_chip(desc); struct eq_msi *msi; u16 status; unsigned long bitmap; u32 bit; u32 virq; chained_irq_enter(chip, desc); msi = irq_desc_get_handler_data(desc); while ((status = readw(msi->gcsr_regs_base + LINK_GCSR5_OFFSET) & MSI_IRQ_REQ) != 0) { pr_debug("MSI: %x\n", status >> 12); bitmap = status >> 12; for_each_set_bit(bit, &bitmap, msi->num_of_vectors) { virq = irq_find_mapping(msi->inner_domain, bit); if (virq) { generic_handle_irq(virq); } else { pr_err("unexpected MSI\n"); handle_bad_irq(desc); } } } chained_irq_exit(chip, desc); } Additionally the domain allocation is defined like: static int eq_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, unsigned int nr_irqs, void *args) { struct eq_msi *msi = domain->host_data; unsigned long bit; u32 mask; /* We only allow 32 MSI per device */ WARN_ON(nr_irqs > 32); if (nr_irqs > 32) return -ENOSPC; bit = find_first_zero_bit(msi->used, msi->num_of_vectors); if (bit >= msi->num_of_vectors) return -ENOSPC; set_bit(bit, msi->used); mask = readw(msi->gcsr_regs_base + LINK_GCSR6_OFFSET); mask |= BIT(bit) << 12; writew(mask, msi->gcsr_regs_base + LINK_GCSR6_OFFSET); irq_domain_set_info(domain, virq, bit, &eq_msi_bottom_irq_chip, domain->host_data, handle_level_irq, NULL, NULL); pr_debug("Enabling MSI irq: %lu\n", bit); return 0; } > > If that's the case, then you have to work around that at the device > driver level, not at the irq chip level, by installing a primary handler > which quiesces the device (not the MSI part). > > Thanks, > > tglx