On Fri, Jan 28, 2022 at 08:57:16AM +0000, Marc Zyngier wrote: > On Thu, 27 Jan 2022 21:21:00 +0000, > Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote: > > On Wed, Jan 26, 2022 at 11:37:58AM +0800, qizhong.cheng wrote: > > > On Tue, 2022-01-25 at 17:21 +0000, Marc Zyngier wrote: > > > > On 2022-01-25 16:57, Bjorn Helgaas wrote: > > > > > On Sun, Jan 23, 2022 at 11:33:06AM +0800, qizhong cheng wrote: > > > > > > As an edge-triggered interrupts, its interrupt status should > > > > > > be cleared before dispatch to the handler of device. > > > > > > > > > > I'm not an IRQ expert, but the reasoning that "we should clear > > > > > the MSI interrupt status before dispatching the handler because > > > > > MSI is an edge-triggered interrupt" doesn't seem completely > > > > > convincing because your code will now look like this: > > > > > > > > > > /* Clear the INTx */ > > > > > writel(1 << bit, port->base + PCIE_INT_STATUS); > > > > > generic_handle_domain_irq(port->irq_domain, bit - INTX_SHIFT); > > > > > ... > > > > > > > > > > /* Clear MSI interrupt status */ > > > > > writel(MSI_STATUS, port->base + PCIE_INT_STATUS); > > > > > generic_handle_domain_irq(port->inner_domain, bit); > > > > > > > > > > You clear interrupt status before dispatching the handler for > > > > > *both* level-triggered INTx interrupts and edge-triggered MSI > > > > > interrupts. > > > > > > > > > > So it doesn't seem that simply being edge-triggered is the > > > > > critical factor here. > > > > > > > > This is the usual problem with these half-baked implementations. > > > > The signalling to the primary interrupt controller is level, as > > > > they take a multitude of input and (crucially) latch the MSI > > > > edges. Effectively, this is an edge-to-level converter, with all > > > > the problems that this creates. > > > > > > > > By clearing the status *after* the handling, you lose edges that > > > > have been received and coalesced after the read of the status > > > > register. By clearing it *before*, you are acknowledging the > > > > interrupts early, and allowing them to be coalesced independently > > > > of the ones that have been received earlier. > > > > > > > > This is however mostly an educated guess. Someone with access to > > > > the TRM should verify this. > > > > > > Yes, as Maz said, we save the edge-interrupt status so that it > > > becomes a level-interrupt. This is similar to an edge-to-level > > > converter, so we need to clear it *before*. We found this problem > > > through a lot of experiments and tested this patch. > > > > I thought there might be other host controllers with similar design, > > so I looked at all the other drivers and tried to figure out whether > > any others had similar problems. > > > > The ones below look suspicious to me because they all clear some sort > > of status register *after* handling an MSI. Can you guys take a look > > and make sure they are working correctly? > > > > keembay_pcie_msi_irq_handler > > status = readl(pcie->apb_base + PCIE_REGS_INTERRUPT_STATUS) > > if (status & MSI_CTRL_INT) > > dw_handle_msi_irq > > generic_handle_domain_irq > > writel(status, pcie->apb_base + PCIE_REGS_INTERRUPT_STATUS) > > > > spear13xx_pcie_irq_handler > > status = readl(&app_reg->int_sts) > > if (status & MSI_CTRL_INT) > > dw_handle_msi_irq > > generic_handle_domain_irq > > writel(status, &app_reg->int_clr) > > I think these two are fine. > > The top level interrupt is only a level signal that the is something > to process. The only thing that is unclear is what the effect of > writing to that status register if MSIs are pending at that point. A > sane implementation would just ignore the write. > > The actual processing is done in dw_handle_msi_irq(), reading the > PCIE_MSI_INTR0_STATUS register. This same register is then used to Ack > the interrupt, one bit at a time, as interrupts are handled (see > dw_pci_bottom_ack). Ack taking place before the handling, it makes it > safe for edge delivery. > > > advk_pcie_handle_int > > isr0_status = advk_readl(pcie, PCIE_ISR0_REG) > > if (isr0_status & PCIE_ISR0_MSI_INT_PENDING) > > advk_pcie_handle_msi > > advk_readl(pcie, PCIE_MSI_STATUS_REG) > > advk_writel(pcie, BIT(msi_idx), PCIE_MSI_STATUS_REG) > > generic_handle_irq > > advk_writel(pcie, PCIE_ISR0_MSI_INT_PENDING, PCIE_ISR0_REG) > > Same thing, I guess. It is just that the Ack has been open-coded. > > > mtk_pcie_irq_handler > > status = readl_relaxed(pcie->base + PCIE_INT_STATUS_REG) > > for_each_set_bit_from(irq_bit, &status, ...) > > mtk_pcie_msi_handler > > generic_handle_domain_irq > > writel_relaxed(BIT(irq_bit), pcie->base + PCIE_INT_STATUS_REG) > > Similar thing. The PCIE_MSI_SET_STATUS register is read first, and > then written back in the ack callback. Thanks a lot for taking a look at these, Marc! Is there anything we can do to make all these drivers/pci/controller/* drivers more consistent and easier to review? I found it very difficult to look across all of them and find similar design patterns. Bjorn