On Thu, 3 Aug 2023 11:12:33 -0600
Alex Williamson <alex.williamson@xxxxxxxxxx> wrote:

> Testing that a device is not currently in a low power state provides no
> guarantees that the device is not imminently transitioning to such a state.
> We need to increment the PM usage counter before accessing the device.
> Since we don't wish to wake the device for PME polling, do so only if the
> device is already active by using pm_runtime_get_if_active().
>
> Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
> ---
>  drivers/pci/pci.c | 23 ++++++++++++++++-------
>  1 file changed, 16 insertions(+), 7 deletions(-)

Hey folks,

Resurrecting this patch (currently commit d3fcd7360338) for discussion
as it's been identified as the source of a regression in:

https://bugzilla.kernel.org/show_bug.cgi?id=218360

Copying Mika, Lukas, and Rafael as it's related to:

000dd5316e1c ("PCI: Do not poll for PME if the device is in D3cold")

where we skip devices in D3cold when processing the PME list.

I think the issue in the above bz is that the downstream TB3/USB4 port
is in D3 (presumably D3hot) and I therefore infer the device is in state
RPM_SUSPENDED.  This commit is attempting to make sure the device power
state is stable across the call such that it does not transition into
D3cold while we're accessing it.

To do that I used pm_runtime_get_if_active(), but in retrospect this
requires the device to be in RPM_ACTIVE, so we end up skipping anything
suspended or transitioning.

As reported in the above bz, I tried replacing this with:

	pm_runtime_get_noresume(dev);
	pm_runtime_barrier(dev);

The theory here being that the barrier would wait for any transitioning
states such that, as far as runtime power management is concerned, the
device power state is stable.  This causes livelocks where the barrier
never returns.

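The skipping behaviour described above can be illustrated with a small user-space model. This is only a sketch, not kernel code: the `model_*` names, the enum, and the struct are stand-ins invented for illustration. The model assumes pm_runtime_get_if_active() returns a negative errno when runtime PM is disabled, 0 when the device is not RPM_ACTIVE, and 1 after taking a usage-count reference.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-ins for runtime PM state; not the kernel's definitions. */
enum model_rpm_status {
	MODEL_RPM_ACTIVE,
	MODEL_RPM_RESUMING,
	MODEL_RPM_SUSPENDED,
	MODEL_RPM_SUSPENDING,
};

struct model_dev {
	bool pm_disabled;		/* runtime PM disabled for this device */
	enum model_rpm_status status;
	int usage_count;
};

/* Assumed semantics of pm_runtime_get_if_active(dev, ign_usage_count). */
int model_get_if_active(struct model_dev *dev, bool ign_usage_count)
{
	if (dev->pm_disabled)
		return -EINVAL;
	if (dev->status != MODEL_RPM_ACTIVE)
		return 0;		/* suspended or in transition: no reference */
	if (!ign_usage_count && dev->usage_count == 0)
		return 0;
	dev->usage_count++;
	return 1;
}

void model_put(struct model_dev *dev)
{
	assert(dev->usage_count > 0);
	dev->usage_count--;
}

/*
 * Mirrors the control flow of the quoted patch for one list entry.
 * Returns true if the device would have been handed to pci_pme_wakeup().
 */
bool model_would_poll(struct model_dev *dev, bool in_d3cold)
{
	int pm_status = model_get_if_active(dev, true);
	bool polled = false;

	if (!pm_status)
		return false;		/* the "continue" in the patch */

	if (!in_d3cold)
		polled = true;

	if (pm_status > 0)
		model_put(dev);

	return polled;
}
```

In this model a device in MODEL_RPM_SUSPENDED that sits in D3hot is never polled, which matches the regression scenario inferred from the bz.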
Instead I'm considering that since we're polling the PME list, maybe we
could just defer devices in transition states, for instance something
that looks like pm_runtime_get_if_active(), but would return zero if the
device was in RPM_SUSPENDING or RPM_RESUMING rather than requiring
RPM_ACTIVE.

I'm not an expert in PME or runtime power management though, so I'm
looking for advice.  Thanks,

Alex

>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 60230da957e0..bc266f290b2c 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2415,10 +2415,13 @@ static void pci_pme_list_scan(struct work_struct *work)
>  
>  	mutex_lock(&pci_pme_list_mutex);
>  	list_for_each_entry_safe(pme_dev, n, &pci_pme_list, list) {
> -		if (pme_dev->dev->pme_poll) {
> -			struct pci_dev *bridge;
> +		struct pci_dev *pdev = pme_dev->dev;
> +
> +		if (pdev->pme_poll) {
> +			struct pci_dev *bridge = pdev->bus->self;
> +			struct device *dev = &pdev->dev;
> +			int pm_status;
>  
> -			bridge = pme_dev->dev->bus->self;
>  			/*
>  			 * If bridge is in low power state, the
>  			 * configuration space of subordinate devices
> @@ -2426,14 +2429,20 @@ static void pci_pme_list_scan(struct work_struct *work)
>  			 */
>  			if (bridge && bridge->current_state != PCI_D0)
>  				continue;
> +
>  			/*
> -			 * If the device is in D3cold it should not be
> -			 * polled either.
> +			 * If the device is in a low power state it
> +			 * should not be polled either.
>  			 */
> -			if (pme_dev->dev->current_state == PCI_D3cold)
> +			pm_status = pm_runtime_get_if_active(dev, true);
> +			if (!pm_status)
>  				continue;
>  
> -			pci_pme_wakeup(pme_dev->dev, NULL);
> +			if (pdev->current_state != PCI_D3cold)
> +				pci_pme_wakeup(pdev, NULL);
> +
> +			if (pm_status > 0)
> +				pm_runtime_put(dev);
>  		} else {
>  			list_del(&pme_dev->list);
>  			kfree(pme_dev);
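
The deferral idea floated in the reply (skip only devices that are mid-transition, rather than everything that is not RPM_ACTIVE) could look roughly like the following user-space model. It is a sketch under stated assumptions, not a proposed kernel implementation: `sketch_get_if_not_transitioning()` is a hypothetical helper that does not exist in the kernel, and the `sketch_*` types are invented for illustration. The model does not address locking, or whether holding a usage count on a suspended device is enough to keep its power state stable.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for runtime PM state; not the kernel's definitions. */
enum sketch_rpm_status {
	SKETCH_RPM_ACTIVE,
	SKETCH_RPM_RESUMING,
	SKETCH_RPM_SUSPENDED,
	SKETCH_RPM_SUSPENDING,
};

struct sketch_dev {
	enum sketch_rpm_status status;
	int usage_count;
	bool in_d3cold;		/* stands in for current_state == PCI_D3cold */
};

/*
 * Hypothetical helper: take a usage-count reference unless the device is
 * in a transitional state. Returns 1 with a reference held, or 0 if the
 * caller should defer this device to a later polling pass.
 */
int sketch_get_if_not_transitioning(struct sketch_dev *dev)
{
	if (dev->status == SKETCH_RPM_SUSPENDING ||
	    dev->status == SKETCH_RPM_RESUMING)
		return 0;
	dev->usage_count++;
	return 1;
}

void sketch_put(struct sketch_dev *dev)
{
	assert(dev->usage_count > 0);
	dev->usage_count--;
}

/* One pass over a single PME list entry; true if it would be polled. */
bool sketch_poll_one(struct sketch_dev *dev)
{
	bool polled;

	if (!sketch_get_if_not_transitioning(dev))
		return false;	/* deferred: retried on the next scan */

	polled = !dev->in_d3cold;	/* D3cold devices are still not polled */
	sketch_put(dev);
	return polled;
}
```

Under this model a suspended device in D3hot is polled again, a device in D3cold is still skipped, and a device that is suspending or resuming is simply left for the next scan.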