On 6/30/21 8:59 PM, Bjorn Helgaas wrote:
> [+cc Michal, Jingoo, Thierry, Jonathan]

[snip]

>
> I think the above commit log is perfectly accurate, but all the
> details might suggest that this is something specific to rockchip or
> CONFIG_DEBUG_SHIRQ, which it isn't, and they might obscure the
> fundamental problem, which is actually very simple: we registered IRQ
> handlers before we were ready for them to be called.
>
> I propose the following commit log in the hope that it would help
> other driver authors to make similar fixes:
>
>   PCI: rockchip: Register IRQ handlers after device and data are ready
>
>   An IRQ handler may be called at any time after it is registered, so
>   anything it relies on must be ready before registration.
>
>   rockchip_pcie_subsys_irq_handler() and rockchip_pcie_client_irq_handler()
>   read registers in the PCIe controller, but we registered them before
>   turning on clocks to the controller. If either is called before the clocks
>   are turned on, the register reads fail and the machine hangs.
>
>   Similarly, rockchip_pcie_legacy_int_handler() uses rockchip->irq_domain,
>   but we installed it before initializing irq_domain.
>
>   Register IRQ handlers after their data structures are initialized and
>   clocks are enabled.
>
> If this is inaccurate or omits something important, let me know. I
> can make any updates locally.
>

I think your description is accurate, and I agree that my commit message
may be misleading. As you said, this is a general problem. The IRQ being
shared and CONFIG_DEBUG_SHIRQ firing a spurious interrupt just make the
driver's incorrect assumptions fall apart.

But maybe you could also add a paragraph that mentions the CONFIG_DEBUG_SHIRQ
option and shared interrupts? That way, other driver authors would know that
enabling this option can expose an underlying problem for them to fix.

Best regards,
-- 
Javier Martinez Canillas
Software Engineer
New Platform Technologies Enablement team
RHEL Engineering
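For driver authors who find this thread, the rule in the proposed commit log ("an IRQ handler may be called at any time after it is registered") can be illustrated with a small userspace C model. This is only a sketch under stated assumptions, not kernel code: struct fake_pcie, fake_request_irq(), fake_irq_handler() and the probe_*() functions are hypothetical stand-ins. fake_request_irq() simply calls the handler once at registration time, to model a handler running before probe has finished setting things up.

```c
#include <stdbool.h>

/*
 * Userspace model of the probe-ordering hazard. None of these names are
 * kernel API; they are hypothetical stand-ins for illustration only.
 */
struct fake_pcie {
	bool clocks_on;    /* registers are only readable with clocks enabled */
	bool domain_ready; /* stands in for rockchip->irq_domain being set up */
	int bad_calls;     /* handler invocations that found state not ready */
};

typedef void (*fake_handler_t)(struct fake_pcie *);

/* The handler reads "registers" and uses the IRQ domain. */
static void fake_irq_handler(struct fake_pcie *p)
{
	/* In a real driver this would be a hang or use of an uninitialized domain. */
	if (!p->clocks_on || !p->domain_ready)
		p->bad_calls++;
}

/*
 * Hypothetical registration helper: it invokes the handler once right away,
 * modelling the fact that a handler may run at any time after registration
 * (for example, a spurious call on a shared line).
 */
static void fake_request_irq(struct fake_pcie *p, fake_handler_t h)
{
	h(p);
}

/* Buggy order: register first, then enable clocks and build the domain. */
static int probe_buggy(struct fake_pcie *p)
{
	fake_request_irq(p, fake_irq_handler);
	p->clocks_on = true;
	p->domain_ready = true;
	return p->bad_calls;
}

/* Fixed order: everything the handler relies on is ready before registration. */
static int probe_fixed(struct fake_pcie *p)
{
	p->clocks_on = true;
	p->domain_ready = true;
	fake_request_irq(p, fake_irq_handler);
	return p->bad_calls;
}
```

Calling probe_buggy() on a zero-initialized struct fake_pcie reports one bad handler invocation, while probe_fixed() reports none. The only difference between the two is the position of the registration call.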