On Tue, Jun 25, 2019 at 12:30 PM Mika Westerberg
<mika.westerberg@xxxxxxxxxxxxxxx> wrote:
>
> The ACPI power state returned by acpi_device_get_power() may depend on
> the configuration of ACPI power resources in the system which may change
> any time after acpi_device_get_power() has returned, unless the
> reference counters of the ACPI power resources in question are set to
> prevent that from happening. Thus it is invalid to use acpi_device_get_power()
> in acpi_pci_get_power_state() the way it is done now and the value of
> the ->power.state field in the corresponding struct acpi_device objects
> (which reflects the ACPI power resources reference counting, among other
> things) should be used instead.
>
> An example where this becomes an issue is Intel Ice Lake where the
> Thunderbolt controller (NHI), two PCIe root ports (RP0 and RP1) and xHCI
> all share the same power resources. The following picture with power
> resources marked with [] shows the topology:
>
>   Host bridge
>     |
>     +- RP0 ---\
>     +- RP1 ---|--+--> [TBT]
>     +- NHI --/   |
>     |            |
>     |            v
>     +- xHCI --> [D3C]
>
> Here TBT and D3C are the shared ACPI power resources. ACPI _PR3() method
> of the devices in question returns either TBT or D3C or both.
>
> Say we runtime suspend first the root ports RP0 and RP1, then NHI. Now
> since the TBT power resource is still on when the root ports are runtime
> suspended their dev->current_state is set to D3hot. When NHI is runtime
> suspended TBT is finally turned off but the state of the root ports
> remains D3hot. Now when the xHCI is runtime suspended D3C gets also turned
> off. PCI core thus has power states of these devices cached in their
> dev->current_state as follows:
>
>   RP0 -> D3hot
>   RP1 -> D3hot
>   NHI -> D3cold
>   xHCI -> D3cold
>
> If the user now runs lspci for instance, the result is all 1's like in
> the below output (00:07.0 is the first root port, RP0):
>
>   00:07.0 PCI bridge: Intel Corporation Device 8a1d (rev ff) (prog-if ff)
>       !!! Unknown header type 7f
>       Kernel driver in use: pcieport
>
> In short the hardware state is not in sync with the software state
> anymore. The exact same thing happens with the PME polling thread which
> ends up bringing the root ports back into D0 after they are runtime
> suspended.
>
> For this reason, modify acpi_pci_get_power_state() so that it uses the
> ACPI device power state that was cached by the ACPI core. This makes the
> PCI device power state match the ACPI device power state regardless of
> state of the shared power resources which may still be on at this point.
>
> Link: https://lore.kernel.org/r/20190618161858.77834-2-mika.westerberg@xxxxxxxxxxxxxxx
> Signed-off-by: Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx>
> ---
>  drivers/pci/pci-acpi.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> index 1897847ceb0c..b782acac26c5 100644
> --- a/drivers/pci/pci-acpi.c
> +++ b/drivers/pci/pci-acpi.c
> @@ -685,7 +685,8 @@ static pci_power_t acpi_pci_get_power_state(struct pci_dev *dev)
>  	if (!adev || !acpi_device_power_manageable(adev))
>  		return PCI_UNKNOWN;
>
> -	if (acpi_device_get_power(adev, &state) || state == ACPI_STATE_UNKNOWN)
> +	state = adev->power.state;
> +	if (state == ACPI_STATE_UNKNOWN)
>  		return PCI_UNKNOWN;
>
>  	return state_conv[state];
> --

Note that there are two additional issues related to the one fixed by
this patch that need to be addressed differently. For details, see

https://patchwork.kernel.org/patch/11015379/
https://patchwork.kernel.org/patch/11015391/