On Wed, Jun 02, 2021 at 08:02:38PM +0200, Rafael J. Wysocki wrote: > On Tuesday, June 1, 2021 9:46:20 AM CEST Mika Westerberg wrote: > > Hi Rafael, > > > > On Mon, May 31, 2021 at 07:37:45PM +0200, Rafael J. Wysocki wrote: > > > On Monday, May 31, 2021 3:34:35 PM CEST Mika Westerberg wrote: > > > > Some PCIe devices only support PME (Power Management Event) from D3cold. > > > > One example is ASMedia xHCI controller: > > > > > > > > 11:00.0 USB controller: ASMedia Technology Inc. ASM1042A USB 3.0 Host Controller (prog-if 30 [XHCI]) > > > > ... > > > > Capabilities: [78] Power Management version 3 > > > > Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0-,D1-,D2-,D3hot-,D3cold+) > > > > Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- > > > > > > > > With such devices, if it has wake enabled, the kernel selects lowest > > > > possible power state to be D0 in pci_target_state(). This is problematic > > > > because it prevents the root port it is connected to enter low power > > > > state too which makes the system consume more energy than necessary. > > > > > > But this is not the only case affected by the patch AFAICS. > > > > Right. > > > > > > The problem in pci_target_state() is that it only accounts the "current" > > > > device state, so when the bridge above it (a root port for instance) is > > > > transitioned into D3hot the device transitions into D3cold. > > > > > > Well, as designed, pci_target_state() is about states the the device can > > > be programmed into, which cannot be D3cold if the device has not platform PM. > > > > Fair enough but the device itself does not need to have platform PM. It > > is enough that the root port far up in the hierarchy has it. > > > > > > This is because when the root port is first transitioned into D3hot then the > > > > ACPI power resource is turned off which puts the PCIe link to L2/L3 (and > > > > the root port and the device are in D3cold). If the root port is kept in > > > > D3hot it still means that the device below it is still effectively in > > > > D3cold as no configuration messages pass through. Furthermore the > > > > implementation note of PCIe 5.0 sec 5.3.1.4 says that the device should > > > > expect to be transitioned into D3cold soon after its link transitions > > > > into L2/L3 Ready state. > > > > > > That's true, but the prerequisite is to put the endpoint device into D3hot > > > and not to attempt to put it into D3cold. > > > > Okay. > > > > > > Taking the above into consideration, instead of forcing the device stay > > > > in D0 we look at the upstream bridge and whether it is allowed to enter > > > > D3 (hot/cold). If this is the case we conclude that the actual target > > > > state of the device is D3cold. This also follows the logic in > > > > pci_set_power_state() that sets power state of the subordinate devices > > > > to D3cold after the bridge itself is transitioned into D3cold. > > > > > > IMO what needs to be fixed is what happens when the "wakeup" argument is "true" > > > and that simply needs to special-case D3hot. > > > > > > Namely, if wakeup from D3hot is not supported, D3hot should still be returned > > > if wakeup from D3cold is supported and the upstream bridge supports D3cold. > > > > This sounds like a good solution except that we probably need to look > > further up then to see if any of the bridges above support D3cold, > > right? For instance if the device is part of TBT topology there may be > > multiple bridges (PCIe upstream ports, downstream ports) between it and > > the root port that has the platform PM "support". > > Moreover, pci_enable_wake() would need to be modified to also enable PME if the > target state is D3hot and only wakeup from D3cold is supported. > > But if pci_enable_wake() is modified this way, we don't need to worry about the > upstream bridges. Worst-case wakeup will not work (even though enabled) if the > device stays in D3hot. > > So pci_target_state() should return D3hot even if wakeup from D3hot itself is > not supported, but wakeup from D3cold is supported, regardless of what the > upstream bridges can do (with the assumption that power will get removed from > the device via an upstream bridge) and pci_enable_wake() should be modified to > enable PME if the target state is D3hot, but the device can only signal wakeup > from D3cold. OK thanks. Let me try this in v3. > IMO, allowing PME support from D3cold only without providing any means to put > the device into D3cold is not a valid configuration and it need not be taken > into account here. AFAIK you don't need the device to have any other means than that it responds to the L2/3 handshake properly according to the spec (so that the link can be put to L2). I think this is perfectly valid configuration :)