On Friday, May 13, 2016 01:15:31 PM Lukas Wunner wrote: > The PM control register is not accessible in D3cold so we shouldn't try > writing to it in pci_raw_set_power_state() and return an error instead. > > The current behaviour is fatal for devices which are not power-managed > by the platform, yet can be runtime suspended to D3cold with some other > mechanism by the driver: > > - When the device is runtime resumed, pci_pm_runtime_resume() first > calls pci_restore_standard_config() which calls pci_set_power_state() > which calls pci_raw_set_power_state() to put the device into D0. > This fails since the device is still in D3cold. It will be powered up > later on when pci_pm_runtime_resume() calls the driver's > ->runtime_resume callback. > > - pci_raw_set_power_state() prints a message that the device refused to > change power state and returns 0. Further up in the call stack, > pci_restore_standard_config() calls pci_restore_state(), which fails > since the device is in D3cold, but nevertheless invalidates the > saved_state. > > - When pci_pm_runtime_resume() finally calls the driver ->runtime_resume > callback to power up the device, the saved_state is gone. > > Return an error from pci_raw_set_power_state() to avoid this. > > An example for devices affected by this are Thunderbolt controllers > built into Macs which can be put into D3cold with nonstandard ACPI > methods. > > Unfortunately we rely on pci_raw_set_power_state()'s current behaviour > in several places: When platform_pci_set_power_state() is called to wake > a device from D3cold, its current_state is not updated even though it is > no longer in D3cold. Instead, pci_raw_set_power_state() is assumed to > clean up the resulting incongruence. Fix by setting current_state to > PCI_UNKNOWN after platform_pci_set_power_state(). > > Also, when a bridge is put into D3cold, its children's current_state is > changed to D3cold in __pci_complete_power_transition(). (Introduced by > commit 448bd857d48e ("PCI/PM: add PCIe runtime D3cold support").) This > doesn't necessarily reflect the children's actual power state: They may > still be powered on, they're just no longer accessible. However this > shouldn't be a concern because if the children are accessed, their > parent needs to resume anyway and the PM core takes care of this. > Again, pci_raw_set_power_state() is relied upon to clean up the > current_state when the children are resumed the next time. We cannot > reliably reconstruct the children's current_state when resuming their > parent. We also shouldn't blindly set it to PCI_UNKNOWN since some > children may actually be turned off and D3cold is their correct > current_state. Therefore fix by no longer touching the children's > current_state at all. > > Cc: Huang Ying <ying.huang@xxxxxxxxx> > Cc: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx> > Signed-off-by: Lukas Wunner <lukas@xxxxxxxxx> > --- > drivers/pci/pci.c | 43 ++++++++++--------------------------------- > 1 file changed, 10 insertions(+), 33 deletions(-) > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index 95727b4..791dfe7 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -612,6 +612,9 @@ static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state) > if (!dev->pm_cap) > return -EIO; > > + if (dev->current_state == PCI_D3cold) > + return -EIO; > + I can agree with this. > if (state < PCI_D0 || state > PCI_D3hot) > return -EINVAL; > > @@ -728,8 +731,10 @@ void pci_update_current_state(struct pci_dev *dev, pci_power_t state) > */ > void pci_power_up(struct pci_dev *dev) > { > - if (platform_pci_power_manageable(dev)) > + if (platform_pci_power_manageable(dev)) { > platform_pci_set_power_state(dev, PCI_D0); > + dev->current_state = PCI_UNKNOWN; Why is this necessary? > + } > > pci_raw_set_power_state(dev, PCI_D0); > pci_update_current_state(dev, PCI_D0); > @@ -746,8 +751,10 @@ static int pci_platform_power_transition(struct pci_dev *dev, pci_power_t state) > > if (platform_pci_power_manageable(dev)) { > error = platform_pci_set_power_state(dev, state); > - if (!error) > + if (!error) { > + dev->current_state = PCI_UNKNOWN; Again, why is this necessary? > pci_update_current_state(dev, state); > + } > } else > error = -ENODEV; > > @@ -809,30 +816,6 @@ static void __pci_start_power_transition(struct pci_dev *dev, pci_power_t state) > } > > /** > - * __pci_dev_set_current_state - Set current state of a PCI device > - * @dev: Device to handle > - * @data: pointer to state to be set > - */ > -static int __pci_dev_set_current_state(struct pci_dev *dev, void *data) > -{ > - pci_power_t state = *(pci_power_t *)data; > - > - dev->current_state = state; > - return 0; > -} > - > -/** > - * __pci_bus_set_current_state - Walk given bus and set current state of devices > - * @bus: Top bus of the subtree to walk. > - * @state: state to be set > - */ > -static void __pci_bus_set_current_state(struct pci_bus *bus, pci_power_t state) > -{ > - if (bus) > - pci_walk_bus(bus, __pci_dev_set_current_state, &state); > -} > - > -/** > * __pci_complete_power_transition - Complete power transition of a PCI device > * @dev: PCI device to handle. > * @state: State to put the device into. > @@ -841,15 +824,9 @@ static void __pci_bus_set_current_state(struct pci_bus *bus, pci_power_t state) > */ > int __pci_complete_power_transition(struct pci_dev *dev, pci_power_t state) > { > - int ret; > - > if (state <= PCI_D0) > return -EINVAL; > - ret = pci_platform_power_transition(dev, state); > - /* Power off the bridge may power off the whole hierarchy */ > - if (!ret && state == PCI_D3cold) > - __pci_bus_set_current_state(dev->subordinate, PCI_D3cold); > - return ret; > + return pci_platform_power_transition(dev, state); > } > EXPORT_SYMBOL_GPL(__pci_complete_power_transition); What about if powering off the bridge does remove power from the hierarchy below? Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html