On Mon, 6 Jan 2025, Lukas Wunner wrote: > The interrupt handler for bandwidth notifications, pcie_bwnotif_irq(), > dereferences a "data" pointer. > > On unbind, that pointer is set to NULL by pcie_bwnotif_remove(). However > the interrupt handler may still be invoked afterwards and will dereference > that NULL pointer. > > That's because the interrupt is requested using a devm_*() helper and the > driver core releases devm_*() resources *after* calling ->remove(). > > pcie_bwnotif_remove() does clear the Link Bandwidth Management Interrupt > Enable and Link Autonomous Bandwidth Interrupt Enable bits in the Link > Control Register, but that won't prevent execution of pcie_bwnotif_irq(): > The interrupt for bandwidth notifications may be shared with AER, DPC, > PME, and hotplug. So pcie_bwnotif_irq() may be executed as long as the > interrupt is requested. > > There's a similar race on bind: pcie_bwnotif_probe() requests the > interrupt when the "data" pointer still points to NULL. A NULL pointer > deref may thus likewise occur if AER, DPC, PME or hotplug raise an > interrupt in-between the bandwidth controller's call to devm_request_irq() > and assignment of the "data" pointer. > > Drop the devm_*() usage and reorder requesting of the interrupt to fix the > issue. > > While at it, drop a stray but harmless no_free_ptr() invocation when > assigning the "data" pointer in pcie_bwnotif_probe(). > > Ilpo points out that the locking on unbind and bind needs to be symmetric, > so move the call to pcie_bwnotif_disable() inside the critical section > protected by pcie_bwctrl_setspeed_rwsem and pcie_bwctrl_lbms_rwsem. > > Evert reports a hang on shutdown of an ASUS ROG Strix SCAR 17 G733PYV. > The issue is no longer reproducible with the present commit. > > Evert found that attaching a USB-C monitor prevented the hang. The > machine contains an ASMedia USB 3.2 controller below a hotplug-capable > Root Port. So one possible explanation is that the controller gets > hot-removed on shutdown unless something is connected. And the ensuing > hotplug interrupt occurs exactly when the bandwidth controller is > unregistering. The precise cause could not be determined because the > screen had already turned black when the hang occurred. > > Fixes: 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller") > Reported-by: Evert Vorster <evorster@xxxxxxxxx> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219629 > Signed-off-by: Lukas Wunner <lukas@xxxxxxxxx> > Tested-by: Evert Vorster <evorster@xxxxxxxxx> > --- > Changes v1 -> v2 (in response to Ilpo's review): > Move request_irq() inside critical section on bind. > Move free_irq() + pcie_bwnotif_disable() inside critical section on unbind. > Amend commit message with a paragraph explaining these changes. > > drivers/pci/pcie/bwctrl.c | 25 ++++++++++++++++--------- > 1 file changed, 16 insertions(+), 9 deletions(-) > > diff --git a/drivers/pci/pcie/bwctrl.c b/drivers/pci/pcie/bwctrl.c > index b59cacc740fa..0a5e7efbce2c 100644 > --- a/drivers/pci/pcie/bwctrl.c > +++ b/drivers/pci/pcie/bwctrl.c > @@ -303,14 +303,17 @@ static int pcie_bwnotif_probe(struct pcie_device *srv) > if (ret) > return ret; > > - ret = devm_request_irq(&srv->device, srv->irq, pcie_bwnotif_irq, > - IRQF_SHARED, "PCIe bwctrl", srv); > - if (ret) > - return ret; > - > scoped_guard(rwsem_write, &pcie_bwctrl_setspeed_rwsem) { > scoped_guard(rwsem_write, &pcie_bwctrl_lbms_rwsem) { > - port->link_bwctrl = no_free_ptr(data); > + port->link_bwctrl = data; > + > + ret = request_irq(srv->irq, pcie_bwnotif_irq, > + IRQF_SHARED, "PCIe bwctrl", srv); > + if (ret) { > + port->link_bwctrl = NULL; > + return ret; > + } > + > pcie_bwnotif_enable(srv); > } > } > @@ -331,11 +334,15 @@ static void pcie_bwnotif_remove(struct pcie_device *srv) > > pcie_cooling_device_unregister(data->cdev); > > - pcie_bwnotif_disable(srv->port); > + scoped_guard(rwsem_write, &pcie_bwctrl_setspeed_rwsem) { > + scoped_guard(rwsem_write, &pcie_bwctrl_lbms_rwsem) { > + pcie_bwnotif_disable(srv->port); > + > + free_irq(srv->irq, srv); > > - scoped_guard(rwsem_write, &pcie_bwctrl_setspeed_rwsem) > - scoped_guard(rwsem_write, &pcie_bwctrl_lbms_rwsem) > srv->port->link_bwctrl = NULL; > + } > + } > } > > static int pcie_bwnotif_suspend(struct pcie_device *srv) Thanks for the update, Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx> I suppose Niklas has not had time to test if this solved the other issue? -- i.