On Fri, Aug 25, 2023 at 11:57:00AM +0800, Feiyang Chen wrote:
> On Fri, Aug 25, 2023 at 5:59 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > On Thu, Aug 24, 2023 at 09:37:38AM +0800, Feiyang Chen wrote:
> > > When the current state is already PCI_D0, pci_power_up() will return
> > > 0 even though dev->pm_cap is not set. In that case, we should not
> > > read the PCI_PM_CTRL register in pci_set_full_power_state().
> > >
> > > There is nothing more needs to be done below in that case.
> > > Additionally, pci_power_up() has two callers only and the other one
> > > ignores the return value, so we can safely move the current state
> > > check from pci_power_up() to pci_set_full_power_state().
> >
> > Does this fix a bug? I guess it does, because previously
> > pci_set_full_power_state() did a config read at 0 + PCI_PM_CTRL, i.e.,
> > offset 4, which is actually PCI_COMMAND, and set dev->current_state
> > based on that. So dev->current_state is now junk, right?
>
> Yes.
>
> > This might account for some "Refused to change power state from %s to D0"
> > messages.
> >
> > How did you find this? It's nice if we can mention a symptom so
> > people can connect the problem with this fix.
>
> We are attempting to add MSI support for our stmmac driver, but the
> pci_alloc_irq_vectors() function always fails.
> After looking into it more, we came across the message "Refused to
> change power state from D3hot to D0" :)

So I guess this device doesn't have a PM Capability at all? Can you
collect the "sudo lspci -vv" output? The PM Capability is required
for all PCIe devices, so maybe this is a conventional PCI device?

> > This sounds like something that probably should have a stable tag?
>
> Do I need to include the symptom and Cc in the commit message and
> then send v4?
>
> > > Fixes: e200904b275c ("PCI/PM: Split pci_power_up()")
> > > Signed-off-by: Feiyang Chen <chenfeiyang@xxxxxxxxxxx>
> > > Reviewed-by: Rafael J. Wysocki <rafael@xxxxxxxxxx>
> > > ---
> > >  drivers/pci/pci.c | 9 +++++----
> > >  1 file changed, 5 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index 60230da957e0..7e90ab7b47a1 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -1242,9 +1242,6 @@ int pci_power_up(struct pci_dev *dev)
> > >  		else
> > >  			dev->current_state = state;
> > >
> > > -		if (state == PCI_D0)
> > > -			return 0;
> > > -
> > >  		return -EIO;
> > >  	}
> > >
> > > @@ -1302,8 +1299,12 @@ static int pci_set_full_power_state(struct pci_dev *dev)
> > >  	int ret;
> > >
> > >  	ret = pci_power_up(dev);
> > > -	if (ret < 0)
> > > +	if (ret < 0) {
> > > +		if (dev->current_state == PCI_D0)
> > > +			return 0;
> > > +
> > >  		return ret;
> > > +	}
> > >
> > >  	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> > >  	dev->current_state = pmcsr & PCI_PM_CTRL_STATE_MASK;

One thing that makes me hesitate a little bit is that we rely on the
failure return from pci_power_up() to guard the dev->pm_cap usage.
That's slightly obscure, and I liked the way the v1 patch made it
explicit. And it seems slightly weird that when there's no PM cap,
pci_power_up() always returns failure even if the platform was able to
put the device in D0.

Anyway, here's a proposal for commit log and updated comment for
pci_power_up():

commit 5694ba13b004 ("PCI/PM: Only read PCI_PM_CTRL register when available")
Author: Feiyang Chen <chenfeiyang@xxxxxxxxxxx>
Date:   Thu Aug 24 09:37:38 2023 +0800

    PCI/PM: Only read PCI_PM_CTRL register when available

    For a device with no Power Management Capability, pci_power_up()
    previously returned 0 (success) if the platform was able to put the
    device in D0, which led to pci_set_full_power_state() trying to read
    PCI_PM_CTRL, even though it doesn't exist.

    Since dev->pm_cap == 0 in this case, pci_set_full_power_state()
    actually read the wrong register, interpreted it as PCI_PM_CTRL, and
    corrupted dev->current_state. This led to messages like this in some
    cases:

      pci 0000:01:00.0: Refused to change power state from D3hot to D0

    To prevent this, make pci_power_up() always return a negative failure
    code if the device lacks a Power Management Capability, even if
    non-PCI platform power management has been able to put the device in
    D0. The failure will prevent pci_set_full_power_state() from trying
    to access PCI_PM_CTRL.

    Fixes: e200904b275c ("PCI/PM: Split pci_power_up()")
    Link: https://lore.kernel.org/r/20230824013738.1894965-1-chenfeiyang@xxxxxxxxxxx
    Signed-off-by: Feiyang Chen <chenfeiyang@xxxxxxxxxxx>
    Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
    Reviewed-by: "Rafael J. Wysocki" <rafael@xxxxxxxxxx>
    Cc: stable@xxxxxxxxxxxxxxx	# v5.19+

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 60230da957e0..39728196e295 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1226,6 +1226,10 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
  *
  * On success, return 0 or 1, depending on whether or not it is necessary to
  * restore the device's BARs subsequently (1 is returned in that case).
+ *
+ * On failure, return a negative error code. Always return failure if @dev
+ * lacks a Power Management Capability, even if the platform was able to
+ * put the device in D0 via non-PCI means.
  */
 int pci_power_up(struct pci_dev *dev)
 {
@@ -1242,9 +1246,6 @@ int pci_power_up(struct pci_dev *dev)
 		else
 			dev->current_state = state;
 
-		if (state == PCI_D0)
-			return 0;
-
 		return -EIO;
 	}
 
@@ -1302,8 +1303,12 @@ static int pci_set_full_power_state(struct pci_dev *dev)
 	int ret;
 
 	ret = pci_power_up(dev);
-	if (ret < 0)
+	if (ret < 0) {
+		if (dev->current_state == PCI_D0)
+			return 0;
+
 		return ret;
+	}
 
 	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
 	dev->current_state = pmcsr & PCI_PM_CTRL_STATE_MASK;
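
For anyone who wants to see the wrong-register read outside the kernel,
here is a rough user-space sketch of the control flow discussed above.
It is not the kernel code: struct model_dev and every model_*() function
are hypothetical stand-ins, the platform is simply assumed to have put
the device in D0, and the Command register is assumed to have I/O and
memory decoding enabled, which is one plausible way to end up with
"D3hot". The register constants use the same values as
include/uapi/linux/pci_regs.h and include/linux/pci.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Same values as the kernel headers. */
#define PCI_COMMAND		0x04
#define PCI_COMMAND_IO		0x1
#define PCI_COMMAND_MEMORY	0x2
#define PCI_PM_CTRL		4
#define PCI_PM_CTRL_STATE_MASK	0x0003
#define PCI_D0			0
#define PCI_D3hot		3

/* Hypothetical stand-in for struct pci_dev; only what the sketch needs. */
struct model_dev {
	uint8_t pm_cap;		/* 0 means no PM Capability */
	int current_state;
	uint16_t config[32];	/* first 64 bytes of config space */
};

uint16_t model_read_config_word(const struct model_dev *dev, unsigned int off)
{
	assert(off < 64 && (off & 1) == 0);
	return dev->config[off / 2];
}

/* Pre-fix behavior: no PM cap, platform reports D0, so return success. */
int model_power_up_old(struct model_dev *dev)
{
	if (!dev->pm_cap)
		dev->current_state = PCI_D0;	/* assumed platform result */
	return 0;
}

/* Post-fix behavior: no PM cap always yields a negative error code. */
int model_power_up_new(struct model_dev *dev)
{
	if (!dev->pm_cap) {
		dev->current_state = PCI_D0;	/* assumed platform result */
		return -EIO;
	}
	return 0;
}

/*
 * Simplified pci_set_full_power_state(): the real function also logs the
 * "Refused to change power state" message and may restore BARs.
 */
int model_set_full_power_state(struct model_dev *dev,
			       int (*power_up)(struct model_dev *))
{
	uint16_t pmcsr;
	int ret = power_up(dev);

	if (ret < 0) {
		if (dev->current_state == PCI_D0)
			return 0;
		return ret;
	}

	/* With pm_cap == 0 this reads offset 4, i.e. PCI_COMMAND. */
	pmcsr = model_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL);
	dev->current_state = pmcsr & PCI_PM_CTRL_STATE_MASK;
	return 0;
}

/* Device with no PM Capability and I/O + memory decoding enabled. */
struct model_dev model_dev_without_pm_cap(void)
{
	struct model_dev dev = { .pm_cap = 0 };

	dev.config[PCI_COMMAND / 2] = PCI_COMMAND_IO | PCI_COMMAND_MEMORY;
	return dev;
}
```

With the old pci_power_up() behavior, the low two bits of the Command
register get interpreted as the power state, so the model ends up in
PCI_D3hot. With the new behavior, the failure return skips the read and
current_state stays at PCI_D0.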