On Mon, 25 Jul 2022 14:25:49 +0100, Johan Hovold <johan@xxxxxxxxxx> wrote:
>
> [ +CC: maz ]
>
> On Fri, Jul 22, 2022 at 09:38:58AM -0500, Bjorn Helgaas wrote:
> > On Fri, Jul 22, 2022 at 03:26:44PM +0200, Johan Hovold wrote:
> > > On Thu, Jul 21, 2022 at 05:21:22PM -0500, Bjorn Helgaas wrote:
> > >
> > > > qcom is a DWC driver, so all the IRQ stuff happens in
> > > > dw_pcie_host_init(). qcom_pcie_remove() does call
> > > > dw_pcie_host_deinit(), which calls irq_domain_remove(), but nobody
> > > > calls irq_dispose_mapping().
> > > >
> > > > I'm thoroughly confused by all this. But I suspect that maybe I
> > > > should drop the "make qcom modular" patch because it seems susceptible
> > > > to this problem:
> > > >
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/ctrl/qcom&id=41b68c2d097e
> > >
> > > That should not be necessary.
> > >
> > > As you note above, interrupt handling is implemented in dwc core so if
> > > there are any issue here at all, which I doubt, then all of the dwc
> > > drivers that currently can be built as modules would all be broken and
> > > this would need to be fixed in core.
> >
> > I don't know yet whether there's an issue. We need a clear argument
> > for why there is or is not. The fact that others might be broken is
> > not an argument for breaking another one ;)
>
> It's not breaking anything that is currently working, and if there's
> some corner case during module unload, that's not the end of the world
> either.

It may not be the end of the world for you, but you have absolutely no
idea of what dangling pointers to kernel memory will do on a user
machine, nor how this can be further exploited. Unloading a module
should never result in an unsafe kernel.

> It's a feature useful for developers and no one expects remove code to
> be perfect (e.g. resilient against someone trying to break it by doing
> things in parallel, etc.).

If that's a feature for you while you are developing, then please keep
this change as part of your own hacking toolbox. IMO the upstream
kernel shouldn't be subjected to this.

>
> > > I've been using the modular pcie-qcom patch for months now, unloading
> > > and reloading the driver repeatedly to test power sequencing, without
> > > noticing any problems whatsoever.
> >
> > Pali's commit log suggests that unloading the module is not, by
> > itself, enough to trigger the problem:
> >
> > https://lore.kernel.org/linux-pci/20220709161858.15031-1-pali@xxxxxxxxxx/
> >
> > Can you test the scenario he mentions?
>
> Turns out the pcie-qcom driver does not support legacy interrupts so
> there's no risk of there being any lingering mappings if I understand
> things correctly.

It still does MSIs, thanks to dw_pcie_host_init(). If you can remove
the driver while devices are up and running with MSIs allocated,
things may get ugly if things align the wrong way (if a driver still
has a reference to an irq_desc or irq_data, for example).

	M.

-- 
Without deviation from the norm, progress is not possible.