On Thu, Jan 02, 2025 at 10:20:35PM +0100, Lukas Wunner wrote: > The interrupt handler for bandwidth notifications, pcie_bwnotif_irq(), > dereferences a "data" pointer. > > On unbind, that pointer is set to NULL by pcie_bwnotif_remove(). However > the interrupt handler may still be invoked afterwards and will dereference > that NULL pointer. > > That's because the interrupt is requested using a devm_*() helper and the > driver core releases devm_*() resources *after* calling ->remove(). > > pcie_bwnotif_remove() does clear the Link Bandwidth Management Interrupt > Enable and Link Autonomous Bandwidth Interrupt Enable bits in the Link > Control Register, but that won't prevent execution of pcie_bwnotif_irq(): > The interrupt for bandwidth notifications may be shared with AER, DPC, > PME, and hotplug. So pcie_bwnotif_irq() may be executed as long as the > interrupt is requested. > > There's a similar race on bind: pcie_bwnotif_probe() requests the > interrupt when the "data" pointer still points to NULL. A NULL pointer > deref may thus likewise occur if AER, DPC, PME or hotplug raise an > interrupt in-between the bandwidth controller's call to devm_request_irq() > and assignment of the "data" pointer. > > Drop the devm_*() usage and reorder requesting of the interrupt to fix the > issue. > > While at it, drop a stray but harmless no_free_ptr() invocation when > assigning the "data" pointer in pcie_bwnotif_probe(). > > Evert reports a hang on shutdown of an ASUS ROG Strix SCAR 17 G733PYV. > The issue is no longer reproducible with the present commit. > > Evert found that attaching a USB-C monitor prevented the hang. The > machine contains an ASMedia USB 3.2 controller below a hotplug-capable > Root Port. So one possible explanation is that the controller gets > hot-removed on shutdown unless something is connected. And the ensuing > hotplug interrupt occurs exactly when the bandwidth controller is > unregistering. The precise cause could not be determined because the > screen had already turned black when the hang occurred. > > Fixes: 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller") > Reported-by: Evert Vorster <evorster@xxxxxxxxx> > Tested-by: Evert Vorster <evorster@xxxxxxxxx> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219629 > Signed-off-by: Lukas Wunner <lukas@xxxxxxxxx> Applied to pci/for-linus for v6.13, thanks, Evert and Lukas for working to hard to get this resolved! > --- > drivers/pci/pcie/bwctrl.c | 25 +++++++++++++++++-------- > 1 file changed, 17 insertions(+), 8 deletions(-) > > diff --git a/drivers/pci/pcie/bwctrl.c b/drivers/pci/pcie/bwctrl.c > index b59cacc..7cd8211 100644 > --- a/drivers/pci/pcie/bwctrl.c > +++ b/drivers/pci/pcie/bwctrl.c > @@ -303,17 +303,18 @@ static int pcie_bwnotif_probe(struct pcie_device *srv) > if (ret) > return ret; > > - ret = devm_request_irq(&srv->device, srv->irq, pcie_bwnotif_irq, > - IRQF_SHARED, "PCIe bwctrl", srv); > + scoped_guard(rwsem_write, &pcie_bwctrl_setspeed_rwsem) > + scoped_guard(rwsem_write, &pcie_bwctrl_lbms_rwsem) > + port->link_bwctrl = data; > + > + ret = request_irq(srv->irq, pcie_bwnotif_irq, IRQF_SHARED, > + "PCIe bwctrl", srv); > if (ret) > - return ret; > + goto err_reset_data; > > - scoped_guard(rwsem_write, &pcie_bwctrl_setspeed_rwsem) { > - scoped_guard(rwsem_write, &pcie_bwctrl_lbms_rwsem) { > - port->link_bwctrl = no_free_ptr(data); > + scoped_guard(rwsem_write, &pcie_bwctrl_setspeed_rwsem) > + scoped_guard(rwsem_write, &pcie_bwctrl_lbms_rwsem) > pcie_bwnotif_enable(srv); > - } > - } > > pci_dbg(port, "enabled with IRQ %d\n", srv->irq); > > @@ -323,6 +324,12 @@ static int pcie_bwnotif_probe(struct pcie_device *srv) > port->link_bwctrl->cdev = NULL; > > return 0; > + > +err_reset_data: > + scoped_guard(rwsem_write, &pcie_bwctrl_setspeed_rwsem) > + scoped_guard(rwsem_write, &pcie_bwctrl_lbms_rwsem) > + port->link_bwctrl = NULL; > + return ret; > } > > static void pcie_bwnotif_remove(struct pcie_device *srv) > @@ -333,6 +340,8 @@ static void pcie_bwnotif_remove(struct pcie_device *srv) > > pcie_bwnotif_disable(srv->port); > > + free_irq(srv->irq, srv); > + > scoped_guard(rwsem_write, &pcie_bwctrl_setspeed_rwsem) > scoped_guard(rwsem_write, &pcie_bwctrl_lbms_rwsem) > srv->port->link_bwctrl = NULL; > -- > 2.43.0 >