On Tue, Jun 18, 2019 at 6:19 PM Mika Westerberg
<mika.westerberg@xxxxxxxxxxxxxxx> wrote:
>

Actually, to start with, you can say that the ACPI power state returned
by acpi_device_get_power() may depend on the configuration of ACPI power
resources in the system, which may change at any time after
acpi_device_get_power() has returned, unless the reference counters of
the ACPI power resources in question are set to prevent that from
happening.  Thus it is invalid to use acpi_device_get_power() in
acpi_pci_get_power_state() the way it is done now and the value of the
power.state field in the corresponding struct acpi_device object (which
reflects the ACPI power resources reference counting, among other
things) should be used instead.

Then you can describe the particular issue below as an example.

IMO that would explain the rationale better here.

> Intel Ice Lake has an integrated Thunderbolt controller which means that
> the PCIe topology is extended directly from the two root ports (RP0 and
> RP1). Power management is handled by ACPI power resources that are
> shared between the root ports, Thunderbolt controller (NHI) and xHCI
> controller.
>
> The topology with the power resources (marked with []) looks like:
>
>   Host bridge
>     |
>     +- RP0 ---\
>     +- RP1 ---|--+--> [TBT]
>     +- NHI --/   |
>     |            |
>     |            v
>     +- xHCI --> [D3C]
>
> Here TBT and D3C are the shared ACPI power resources. ACPI _PR3() method
> returns either TBT or D3C or both.
>
> Say we runtime suspend first the root ports RP0 and RP1, then NHI. Now
> since the TBT power resource is still on when the root ports are runtime
> suspended their dev->current_state is set to D3hot. When NHI is runtime
> suspended TBT is finally turned off but the state of the root ports
> remains D3hot.
>
> If the user now runs lspci for instance, the result is all 1's like in
> the below output (07.0 is the first root port, RP0):
>
>   00:07.0 PCI bridge: Intel Corporation Device 8a1d (rev ff) (prog-if ff)
>       !!! Unknown header type 7f
>       Kernel driver in use: pcieport
>
> In short the hardware state is not in sync with the software state
> anymore. The exact same thing happens with the PME polling thread which
> ends up bringing the root ports back into D0 after they are runtime
> suspended.
>
> ACPI core already sets the device state to be D3cold when it drops its
> references to the power resources returned by _PR3 even if these power
> resources are still physically on (other devices still reference them).
> However, in PCI core we call acpi_device_get_power() to figure out the
> power state and that returns the "real" power state based on the state
> of its power resources.
>
> To make it work with the shared power resources modify
> acpi_pci_get_power_state() so that it reads the ACPI device power state
> that was cached by the ACPI core. This makes the PCI device power state
> match the ACPI device power state regardless of state of the shared
> power resources that may still be on at this point.
>
> Signed-off-by: Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx>
> ---
>  drivers/pci/pci-acpi.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> index 1897847ceb0c..b782acac26c5 100644
> --- a/drivers/pci/pci-acpi.c
> +++ b/drivers/pci/pci-acpi.c
> @@ -685,7 +685,8 @@ static pci_power_t acpi_pci_get_power_state(struct pci_dev *dev)
>  	if (!adev || !acpi_device_power_manageable(adev))
>  		return PCI_UNKNOWN;
>
> -	if (acpi_device_get_power(adev, &state) || state == ACPI_STATE_UNKNOWN)
> +	state = adev->power.state;
> +	if (state == ACPI_STATE_UNKNOWN)
>  		return PCI_UNKNOWN;
>
>  	return state_conv[state];
> --
> 2.20.1
>
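
To make the rationale concrete, here is a minimal userspace sketch of
the RP0/NHI/TBT case from the changelog. It is not kernel code: every
name in it (struct power_res, struct device, dev_probe_state() and so
on) is hypothetical, and the probe is deliberately simplified so that a
suspended device reads as D3hot while any of its power resources is
still on and as D3cold otherwise. It only illustrates that a state
derived from shared power resources can change behind the caller's back
while the cached state tracks the device's own reference counting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model for illustration only; none of this is kernel API. */
enum pstate { D0, D3HOT, D3COLD };

struct power_res {
	int refcount;			/* devices currently holding it on */
};

struct device {
	struct power_res *res[2];	/* resources the device's _PR3 would list */
	size_t nres;
	enum pstate cached;		/* state recorded at the last transition */
};

static bool res_is_on(const struct power_res *r)
{
	return r->refcount > 0;
}

/* Take a reference on every resource and record D0. */
static void dev_resume(struct device *d)
{
	for (size_t i = 0; i < d->nres; i++)
		d->res[i]->refcount++;
	d->cached = D0;
}

/* Drop the device's references and record D3cold, as the changelog says
 * the ACPI core does, even if other devices keep the resources on. */
static void dev_suspend(struct device *d)
{
	for (size_t i = 0; i < d->nres; i++)
		d->res[i]->refcount--;
	d->cached = D3COLD;
}

/* Simplified "real" state of a suspended device, derived from the
 * resources themselves rather than from the device's own references. */
static enum pstate dev_probe_state(const struct device *d)
{
	for (size_t i = 0; i < d->nres; i++)
		if (res_is_on(d->res[i]))
			return D3HOT;
	return D3COLD;
}

struct scenario_result {
	enum pstate probed_early, cached_early;	/* RP0 after RP0 suspends */
	enum pstate probed_late, cached_late;	/* RP0 after NHI suspends */
};

static struct scenario_result run_scenario(void)
{
	struct power_res tbt = { 0 };
	struct device rp0 = { .res = { &tbt }, .nres = 1, .cached = D3COLD };
	struct device nhi = { .res = { &tbt }, .nres = 1, .cached = D3COLD };
	struct scenario_result r;

	dev_resume(&rp0);
	dev_resume(&nhi);

	dev_suspend(&rp0);			/* TBT is still held by NHI */
	r.probed_early = dev_probe_state(&rp0);
	r.cached_early = rp0.cached;

	dev_suspend(&nhi);			/* last reference gone, TBT off */
	r.probed_late = dev_probe_state(&rp0);
	r.cached_late = rp0.cached;

	return r;
}
```

Calling run_scenario() from a small main() shows the mismatch: right
after RP0 suspends, the probed value is D3hot while the cached one is
already D3cold, and once NHI suspends the probed value silently becomes
D3cold too, so anything that stored the earlier probe result is stale.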