[+to Johan for qcom]
[-cc Tom, email bounces]

On Thu, Jul 21, 2022 at 10:46:07PM +0200, Pali Rohár wrote:
> On Thursday 21 July 2022 14:54:33 Bjorn Helgaas wrote:
> > The j721e, kirin, tegra, and mediatek drivers all implement .remove().
> >
> > They also set ".suppress_bind_attrs = true". I think this means
> > bus_add_driver() will not create the "bind" and "unbind" sysfs
> > attributes for the driver that would allow users to users to manually
> > attach and detach devices from it.
> >
> > Is there a reason for this, or should these drivers stop setting
> > .suppress_bind_attrs?
>
> I have already asked this question during review of kirin driver:
> https://lore.kernel.org/linux-pci/20211031205527.ochhi72dfu4uidii@pali/
>
> Microchip driver wanted to change its type from bool to tristate
> https://lore.kernel.org/linux-pci/20220420093449.38054-1-u.kleine-koenig@xxxxxxxxxxxxxx/t/#u
> and after discussion it seems that it is needed to do more work for this
> driver.
>
> > For example, Pali and Ley Foon *did* stop setting .suppress_bind_attrs
> > when adding .remove() methods in these commits:
> >
> >   0746ae1be121 ("PCI: mvebu: Add support for compiling driver as module")
> >   526a76991b7b ("PCI: aardvark: Implement driver 'remove' function and allow to build it as module")
> >   ec15c4d0d5d2 ("PCI: altera: Allow building as module")
>
> I added it for both pci-mvebu.c and pci-aardvark.c. And just few days
> ago I realized why suppress_bind_attrs was set to true and remove method
> was not implemented.

With suppress_bind_attrs, the user can't manually unbind a device, so
we can't get to mvebu_pcie_remove() that way, but since mvebu is a
modular driver, I assume we can unload the module and *that* would
call mvebu_pcie_remove().  Right?

> Implementing remove method is not really simple, specially when pci
> controller driver implements also interrupt controller (e.g. for
> handling legacy interrupts).

Hmmm.  Based on your patches below, it looks like we need to call
irq_dispose_mapping() in some cases, but I'm very confused about
*which* cases.

I first thought it was for mappings created with irq_create_mapping(),
but pci-aardvark.c never calls that, so there must be more to it.

Currently only altera, iproc, mediatek-gen3, and mediatek call
irq_dispose_mapping() from their .remove() methods.  (They all call
irq_domain_remove() *before* irq_dispose_mapping().  Is that legal?
Your patches do irq_dispose_mapping() *first*.)

altera, mediatek-gen3, and mediatek call irq_dispose_mapping() on IRQs
that came from platform_get_irq().

qcom is a DWC driver, so all the IRQ stuff happens in
dw_pcie_host_init().  qcom_pcie_remove() does call
dw_pcie_host_deinit(), which calls irq_domain_remove(), but nobody
calls irq_dispose_mapping().

I'm thoroughly confused by all this.  But I suspect that maybe I
should drop the "make qcom modular" patch because it seems susceptible
to this problem:

  https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/ctrl/qcom&id=41b68c2d097e

> Here are waiting fixup patches for pci-mvebu.c and pci-aardvark.c which
> fixes .remove callback. Without these patches calling 'rmmod driver' let
> dangling pointer in kernel which may cause random kernel crashes. See:
>
> https://lore.kernel.org/linux-pci/20220709161858.15031-1-pali@xxxxxxxxxx/
> https://lore.kernel.org/linux-pci/20220711120626.11492-1-pali@xxxxxxxxxx/
> https://lore.kernel.org/linux-pci/20220711120626.11492-2-pali@xxxxxxxxxx/
>
> So I would suggest to do more detailed review when adding .remove
> callback for pci controller driver (or when remove suppress_bind_attrs)
> and do more testings and checking if all IRQ mappings are disposed.

I'm not smart enough to do "more detailed review" because I don't know
what things to look for :)

Thanks for all your work in sorting out these arcane details!

Bjorn