On 2021-06-30 00:14, Bjorn Helgaas wrote:
On Tue, Jun 29, 2021 at 11:52:44AM +0100, Robin Murphy wrote:
On 2021-06-29 07:17, Javier Martinez Canillas wrote:
On 6/29/21 2:38 AM, Bjorn Helgaas wrote:
On Thu, Jun 24, 2021 at 05:40:40PM -0500, Bjorn Helgaas wrote:
[snip]
So let's just move all the IRQ init before the pci_host_probe() call, that
will prevent issues like this and seems to be the correct thing to do too.
Previously we registered rockchip_pcie_subsys_irq_handler() and
rockchip_pcie_client_irq_handler() before the PCIe clocks were
enabled. That's a problem because they depend on those clocks being
enabled, and your patch fixes that.
rockchip_pcie_legacy_int_handler() depends on rockchip->irq_domain,
which isn't initialized until rockchip_pcie_init_irq_domain().
Previously we registered rockchip_pcie_legacy_int_handler() as the
handler for the "legacy" IRQ before rockchip_pcie_init_irq_domain().
I think your patch *also* fixes that problem, right?
The lack of consistency in how we use
irq_set_chained_handler_and_data() really bugs me.
Your patch fixes the ordering issue where we installed
rockchip_pcie_legacy_int_handler() before initializing data
(rockchip->irq_domain) that it depends on.
But AFAICT, rockchip still has the problem that we don't *unregister*
rockchip_pcie_legacy_int_handler() when the rockchip-pcie module is
removed. Doesn't this mean that if we unload the module, then receive
an interrupt from the device, we'll try to call a function that is no
longer present?
Good question, I don't to be honest. I'll have to dig deeper on this but
my experience is that the module removal (and device unbind) is not that
well tested on ARM device drivers in general.
Well, it does use devm_request_irq() so the handler should be unregistered
by devres *after* ->remove has finished, however that does still leave a
potential race window in which a pending IRQ could be taken during the later
part of rockchip_pcie_remove() after it has started turning off critical
things. Unless the clocks and regulators can also be delegated to devres, it
might be more robust to explicitly manage the IRQs as well. Mixing the two
schemes can be problematic when the exact order of both setup and teardown
matters.
I don't understand the devm_request_irq() connection. I'm looking at
this irq_set_chained_handler_and_data() call [1]:
static int rockchip_pcie_setup_irq(struct rockchip_pcie *rockchip)
{
...
irq = platform_get_irq_byname(pdev, "legacy");
irq_set_chained_handler_and_data(irq,
rockchip_pcie_legacy_int_handler,
rockchip);
irq = platform_get_irq_byname(pdev, "client");
...
We look up "irq", pass it to irq_set_chained_handler_and_data(), and
throw it away without saving it anywhere. How would anything know how
to unregister rockchip_pcie_legacy_int_handler()?
I could imagine irq_set_chained_handler_and_data() saving what's
needed for unregistration, but I would think that would require a
device pointer, which we don't give it.
I'm IRQ-illiterate, so please educate me!
Oh, my mistake, I was looking at the "sys" and "client" IRQs and spoke
too soon; the "legacy" IRQ does appear to be completely leaked as you
say, so there are two degrees of issue here.
Robin.
Bjorn
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/pcie-rockchip-host.c?id=v5.13#n562