On Tue, Jan 23, 2024 at 7:56 PM Alex Williamson
<alex.williamson@xxxxxxxxxx> wrote:
>
> The commit noted in fixes added a bogus requirement that runtime PM
> managed devices need to be in the RPM_ACTIVE state for PME polling.
> In fact, only devices in low power states should be polled.
>
> However there's still a requirement that the device config space must
> be accessible, which has implications for both the current state of
> the polled device and the parent bridge, when present. It's not
> sufficient to assume the bridge remains in D0 and cases have been
> observed where the bridge passes the D0 test, but the PM state
> indicates RPM_SUSPENDING and config space of the polled device becomes
> inaccessible during pci_pme_wakeup().
>
> Therefore, since the bridge is already effectively required to be in
> the RPM_ACTIVE state, formalize this in the code and elevate the PM
> usage count to maintain the state while polling the subordinate
> device.
>
> Cc: Lukas Wunner <lukas@xxxxxxxxx>
> Cc: Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx>
> Cc: Rafael J. Wysocki <rafael@xxxxxxxxxx>
> Fixes: d3fcd7360338 ("PCI: Fix runtime PM race with PME polling")
> Reported-by: Sanath S <sanath.s@xxxxxxx>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218360
> Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
> ---
>  drivers/pci/pci.c | 37 ++++++++++++++++++++++---------------
>  1 file changed, 22 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index bdbf8a94b4d0..764d7c977ef4 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2433,29 +2433,36 @@ static void pci_pme_list_scan(struct work_struct *work)
>  		if (pdev->pme_poll) {
>  			struct pci_dev *bridge = pdev->bus->self;
>  			struct device *dev = &pdev->dev;
> -			int pm_status;
> +			struct device *bdev = bridge ? &bridge->dev : NULL;
> +			int bref = 0;
>
>  			/*
> -			 * If bridge is in low power state, the
> -			 * configuration space of subordinate devices
> -			 * may be not accessible
> +			 * If we have a bridge, it should be in an active/D0
> +			 * state or the configuration space of subordinate
> +			 * devices may not be accessible or stable over the
> +			 * course of the call.
>  			 */
> -			if (bridge && bridge->current_state != PCI_D0)
> -				continue;
> +			if (bdev) {
> +				bref = pm_runtime_get_if_active(bdev, true);
> +				if (!bref)

I would check bref <= 0 here.

> +					continue;
> +
> +				if (bridge->current_state != PCI_D0)

Isn't the power state guaranteed to be PCI_D0 at this point?  If it
isn't, then why?

> +					goto put_bridge;
> +			}
>
>  			/*
> -			 * If the device is in a low power state it
> -			 * should not be polled either.
> +			 * The device itself should be suspended but config
> +			 * space must be accessible, therefore it cannot be in
> +			 * D3cold.
>  			 */
> -			pm_status = pm_runtime_get_if_active(dev, true);
> -			if (!pm_status)
> -				continue;
> -
> -			if (pdev->current_state != PCI_D3cold)
> +			if (pm_runtime_suspended(dev) &&
> +			    pdev->current_state != PCI_D3cold)
>  				pci_pme_wakeup(pdev, NULL);
>
> -			if (pm_status > 0)
> -				pm_runtime_put(dev);
> +put_bridge:
> +			if (bref > 0)
> +				pm_runtime_put(bdev);
>  		} else {
>  			list_del(&pme_dev->list);
>  			kfree(pme_dev);
> --
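
To illustrate why the two checks discussed above differ, here is a minimal
userspace sketch, not kernel code.  It assumes the documented return
convention of pm_runtime_get_if_active(): -EINVAL when runtime PM is
disabled for the device, 0 when the device is not RPM_ACTIVE, and 1 when
it is active and a usage reference was taken.  All mock_* names and the
two skip_* helpers are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device's runtime PM state. */
enum mock_rpm_status { MOCK_RPM_ACTIVE, MOCK_RPM_SUSPENDING, MOCK_RPM_SUSPENDED };

struct mock_dev {
	bool rpm_disabled;            /* runtime PM disabled for this device */
	enum mock_rpm_status status;  /* current runtime PM status */
	int usage_count;              /* references held on the device */
};

/*
 * Mimics the assumed return convention of pm_runtime_get_if_active():
 * -EINVAL if runtime PM is disabled, 0 if not active, 1 with a
 * reference taken if active.
 */
static inline int mock_get_if_active(struct mock_dev *d)
{
	if (d->rpm_disabled)
		return -EINVAL;
	if (d->status != MOCK_RPM_ACTIVE)
		return 0;
	d->usage_count++;
	return 1;
}

/* Check as posted in the patch: skip the device only when bref == 0. */
static inline bool skip_if_zero(int bref)
{
	return !bref;
}

/* Check suggested in the review: also skip on a negative (error) return. */
static inline bool skip_if_not_positive(int bref)
{
	return bref <= 0;
}
```

With a bridge that has runtime PM disabled, the mock returns -EINVAL, so
skip_if_zero() lets polling proceed while skip_if_not_positive() skips the
device.  For a suspending bridge (return 0) and an active bridge (return 1)
the two checks agree.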