On Tue, Jun 29, 2021 at 11:52:44AM +0100, Robin Murphy wrote:
> On 2021-06-29 07:17, Javier Martinez Canillas wrote:
> > On 6/29/21 2:38 AM, Bjorn Helgaas wrote:
> > > On Thu, Jun 24, 2021 at 05:40:40PM -0500, Bjorn Helgaas wrote:
> > > > [snip]
> > > >
> > > > > So let's just move all the IRQ init before the pci_host_probe() call, that
> > > > > will prevent issues like this and seems to be the correct thing to do too.
> > > >
> > > > Previously we registered rockchip_pcie_subsys_irq_handler() and
> > > > rockchip_pcie_client_irq_handler() before the PCIe clocks were
> > > > enabled.  That's a problem because they depend on those clocks being
> > > > enabled, and your patch fixes that.
> > > >
> > > > rockchip_pcie_legacy_int_handler() depends on rockchip->irq_domain,
> > > > which isn't initialized until rockchip_pcie_init_irq_domain().
> > > > Previously we registered rockchip_pcie_legacy_int_handler() as the
> > > > handler for the "legacy" IRQ before rockchip_pcie_init_irq_domain().
> > > >
> > > > I think your patch *also* fixes that problem, right?
> > >
> > > The lack of consistency in how we use
> > > irq_set_chained_handler_and_data() really bugs me.
> > >
> > > Your patch fixes the ordering issue where we installed
> > > rockchip_pcie_legacy_int_handler() before initializing data
> > > (rockchip->irq_domain) that it depends on.
> > >
> > > But AFAICT, rockchip still has the problem that we don't *unregister*
> > > rockchip_pcie_legacy_int_handler() when the rockchip-pcie module is
> > > removed.  Doesn't this mean that if we unload the module, then receive
> > > an interrupt from the device, we'll try to call a function that is no
> > > longer present?
> >
> > Good question, I don't to be honest. I'll have to dig deeper on this but
> > my experience is that the module removal (and device unbind) is not that
> > well tested on ARM device drivers in general.
>
> Well, it does use devm_request_irq() so the handler should be unregistered
> by devres *after* ->remove has finished, however that does still leave a
> potential race window in which a pending IRQ could be taken during the later
> part of rockchip_pcie_remove() after it has started turning off critical
> things. Unless the clocks and regulators can also be delegated to devres, it
> might be more robust to explicitly manage the IRQs as well. Mixing the two
> schemes can be problematic when the exact order of both setup and teardown
> matters.

I don't understand the devm_request_irq() connection.  I'm looking at
this irq_set_chained_handler_and_data() call [1]:

  static int rockchip_pcie_setup_irq(struct rockchip_pcie *rockchip)
  {
    ...
    irq = platform_get_irq_byname(pdev, "legacy");
    irq_set_chained_handler_and_data(irq,
                                     rockchip_pcie_legacy_int_handler,
                                     rockchip);

    irq = platform_get_irq_byname(pdev, "client");
    ...

We look up "irq", pass it to irq_set_chained_handler_and_data(), and
throw it away without saving it anywhere.  How would anything know how
to unregister rockchip_pcie_legacy_int_handler()?

I could imagine irq_set_chained_handler_and_data() saving what's
needed for unregistration, but I would think that would require a
device pointer, which we don't give it.

I'm IRQ-illiterate, so please educate me!

Bjorn

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/pcie-rockchip-host.c?id=v5.13#n562
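
To make the bookkeeping question above concrete, here is a small userspace
model, not kernel code. It assumes that a chained handler is removed by
calling irq_set_chained_handler_and_data() again with a NULL handler, which
would mean the driver has to remember the IRQ number itself. Every mock_*
name, the table, and the legacy_irq field are hypothetical stand-ins and do
not exist in pcie-rockchip-host.c.

```c
/*
 * Userspace model only. The mock_* functions imitate the shape of the
 * kernel calls discussed in this thread; they are assumptions made for
 * illustration, not the genirq implementation.
 */
#include <assert.h>
#include <stddef.h>

#define MOCK_NR_IRQS 8

typedef void (*mock_flow_handler_t)(void *data);

struct mock_irq_desc {
	mock_flow_handler_t handler;	/* NULL means nothing installed */
	void *data;
};

static struct mock_irq_desc mock_irq_table[MOCK_NR_IRQS];

/* Stand-in for irq_set_chained_handler_and_data(); NULL uninstalls. */
static void mock_set_chained_handler_and_data(int irq,
					      mock_flow_handler_t handler,
					      void *data)
{
	assert(irq >= 0 && irq < MOCK_NR_IRQS);
	mock_irq_table[irq].handler = handler;
	mock_irq_table[irq].data = data;
}

/* Simulate delivery: returns 1 if a handler ran, 0 if none is installed. */
static int mock_raise_irq(int irq)
{
	assert(irq >= 0 && irq < MOCK_NR_IRQS);
	if (!mock_irq_table[irq].handler)
		return 0;
	mock_irq_table[irq].handler(mock_irq_table[irq].data);
	return 1;
}

struct mock_pcie {
	int legacy_irq;	/* hypothetical field: saved so teardown can find it */
	int handled;	/* counts handler invocations */
};

static void mock_legacy_int_handler(void *data)
{
	struct mock_pcie *pcie = data;

	pcie->handled++;
}

/* Setup keeps the IRQ number instead of throwing it away. */
static void mock_pcie_setup_irq(struct mock_pcie *pcie, int irq)
{
	pcie->legacy_irq = irq;
	mock_set_chained_handler_and_data(irq, mock_legacy_int_handler, pcie);
}

/* Teardown uses the saved number to uninstall the handler explicitly. */
static void mock_pcie_teardown_irq(struct mock_pcie *pcie)
{
	mock_set_chained_handler_and_data(pcie->legacy_irq, NULL, NULL);
}
```

In this model nothing but the driver's own struct records which IRQ the
handler was attached to, so an explicit teardown step is the only thing that
stops a late interrupt from reaching mock_legacy_int_handler().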