On some devices and platforms, the initial platform (e.g. ACPI) power state is not in sync with the power state of the PCI device. This seems like it is, for all intents and purposes, an issue with the device firmware (e.g. ACPI). On some devices, specifically Microsoft Surface Books 2 and 3, we encounter ACPI code akin to the following power resource, corresponding to a PCI device: PowerResource (PRP5, 0x00, 0x0000) { // Initialized to zero, i.e. off. There is no logic for checking // the actual state dynamically. Name (_STA, Zero) Method (_ON, 0, Serialized) { // ... code omitted ... _STA = One } Method (_OFF, 0, Serialized) { // ... code omitted ... _STA = Zero } } This resource is initialized to 'off' and does not have any logic for checking its actual state, i.e. the state of the corresponding PCI device. The stored state of this resource can only be changed by running the (platform/ACPI) power transition functions (i.e. _ON and _OFF). This means that, at boot, the PCI device power state is out of sync with the power state of the corresponding ACPI resource. During initial bring-up of a PCI device, pci_enable_device_flags() updates its PCI core state (from initially 'unknown') by reading from its PCI_PM_CTRL register. It does, however, not check if the platform (here ACPI) state is in sync with/valid for the actual device state and needs updating. The later call to pci_set_power_state(..., PCI_D0), which would normally perform such platform transitions, will evaluate to a no-op if the stored state has been updated to D0 via this. Thus any such power resource, required for D0, will never get "officially" turned on. One could also make the case that this could cause PCI devices requiring some secondary power resources to not work properly, as no attempt is ever made at checking that they are in fact synchronized (i.e. turned on/off as required e.g. by ACPI) for the updated state. Ultimately, this breaks the first D3cold entry for the discrete GPU on the aforementioned Surface Books. On transition of the PCI device to D3cold, the power resource is not properly turned off as it is already considered off: $ echo auto > /sys/bus/pci/devices/0000:02:00.0/power/control [ 104.430304] ACPI: \_SB_.PCI0.RP05: ACPI: PM: GPE69 enabled for wakeup [ 104.430310] pcieport 0000:00:1c.4: Wakeup enabled by ACPI [ 104.444144] ACPI: \_SB_.PCI0.RP05: ACPI: PM: Power state change: D3cold -> D3cold [ 104.444151] acpi device:3b: Device already in D3cold [ 104.444154] pcieport 0000:00:1c.4: power state changed by ACPI to D3cold However, the device still consumes a non-negligible amount of power and gets warm. A full power-cycle fixes this and results in the device being properly transitioned to D3cold: $ echo on > /sys/bus/pci/devices/0000:02:00.0/power/control [ 134.258039] ACPI: \_SB_.PCI0.RP05: ACPI: PM: Power state change: D3cold -> D0 [ 134.369140] ACPI: PM: Power resource [PRP5] turned on [ 134.369157] acpi device:3b: Power state changed to D0 [ 134.369165] pcieport 0000:00:1c.4: power state changed by ACPI to D0 [ 134.369338] pcieport 0000:00:1c.4: Wakeup disabled by ACPI $ echo auto > /sys/bus/pci/devices/0000:02:00.0/power/control [ 310.338597] ACPI: \_SB_.PCI0.RP05: ACPI: PM: GPE69 enabled for wakeup [ 310.338604] pcieport 0000:00:1c.4: Wakeup enabled by ACPI [ 310.353841] ACPI: \_SB_.PCI0.RP05: ACPI: PM: Power state change: D0 -> D3cold [ 310.403879] ACPI: PM: Power resource [PRP5] turned off [ 310.403894] acpi device:3b: Power state changed to D3cold [ 310.403901] pcieport 0000:00:1c.4: power state changed by ACPI to D3cold By replacing pci_set_power_state(..., PCI_DO) with pci_power_up() in do_pci_enable_device(), we can ensure that the state of power resources is always checked. This essentially drops the initial (first) check on the current state of the PCI device and calls platform specific code, i.e. pci_platform_power_transition() to perform the appropriate platform transition. This can be verified by $ echo auto > /sys/bus/pci/devices/0000:02:00.0/power/control [ 152.790448] ACPI: \_SB_.PCI0.RP05: ACPI: PM: GPE69 enabled for wakeup [ 152.790454] pcieport 0000:00:1c.4: Wakeup enabled by ACPI [ 152.804252] ACPI: \_SB_.PCI0.RP05: ACPI: PM: Power state change: D0 -> D3cold [ 152.857548] ACPI: PM: Power resource [PRP5] turned off [ 152.857563] acpi device:3b: Power state changed to D3cold [ 152.857570] pcieport 0000:00:1c.4: power state changed by ACPI to D3cold which yields the expected behavior. Note that a) pci_set_power_state() would call pci_power_up() if the check on the current state failed. This means that in case of the updated state not being D0, there is no functional change. b) The core and platform transitions, i.e. pci_raw_set_power_state() and pci_platform_power_transition() (should) both have checks on the current state. So in case either state is up to date, the corresponding transition should still evaluate to a no-op. The only notable difference made by the suggested replacement is pci_resume_bus() being called for devices where runtime D3cold is enabled. Signed-off-by: Maximilian Luz <luzmaximilian@xxxxxxxxx> --- Changes in v2: - No code changes - Improve commit message, add details - Add Rafael and linux-pm to CCs --- drivers/pci/pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 16a17215f633..cc42315210e4 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -1800,7 +1800,7 @@ static int do_pci_enable_device(struct pci_dev *dev, int bars) u16 cmd; u8 pin; - err = pci_set_power_state(dev, PCI_D0); + err = pci_power_up(dev); if (err < 0 && err != -EIO) return err; -- 2.30.2