Re: [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, May 19, 2021 at 9:48 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> On Wed, May 19, 2021 at 09:12:26PM +0200, Rafael J. Wysocki wrote:
> > On Wed, May 19, 2021 at 7:28 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > On Fri, May 07, 2021 at 05:08:42PM +0300, Konstantin Kharlamov wrote:
> > > > On Fri, 2021-05-07 at 08:30 -0500, Bjorn Helgaas wrote:
> > > > > On Fri, May 07, 2021 at 12:07:38AM +0200, Lukas Wunner wrote:
> > > > > > On Thu, May 06, 2021 at 04:48:42PM -0500, Bjorn Helgaas wrote:
> > > > > > > On Thu, May 06, 2021 at 08:38:20PM +0300, Konstantin Kharlamov wrote:
> > > > > > > > On Macbook 2013 resuming from s2idle results in external monitor no
> > > > > > > > longer being detected, and dmesg having errors like:
> > > > > > > >
> > > > > > > >     pcieport 0000:06:00.0: can't change power state from D3hot to D0
> > > > > > > > (config space inaccessible)
> > > > > > > >
> > > > > > > > and a stacktrace. The reason turned out that the hw that the quirk
> > > > > > > > powers off does not get powered on back on resume.
> > >
> > > > > For example, "With s2idle, the machine isn't suspended via ACPI, so
> > > > > the AML code which powers the controller off isn't executed."  AFAICT
> > > > > that isn't actually a required, documented property of s2idle, but
> > > > > rather it reaches into the internal implementation.
> > >
> > > > > The code comment "If suspend mode is s2idle, power won't get restored
> > > > > on resume" is similar.  !pm_suspend_via_firmware() tells us that
> > > > > platform firmware won't be invoked.  But the connection between *that*
> > > > > and "power won't get restored" is unexplained.
> > > >
> > > > Sorry, I can't comment anything regarding AML and power management
> > > > in general since I am really new to all of this. However, regarding
> > > > the usage of the `pm_suspend_via_firmware()`: yeah, I also think it
> > > > is unclear what this does, and I was thinking about adding a wrapper
> > > > function something like `is_s2idle()` to the suspend.h, which would
> > > > simply call `pm_suspend_via_firmware` internally.
> > >
> > > No, that's not my point at all.  I don't really care whether the
> > > interface is called pm_suspend_via_firmware() or is_s2idle().
> > >
> > > What I don't like about this is that it's all just unexplained magic,
> > > as was the original quirk.
> > >
> > > IIUC, the quirk is applied by pci_pm_suspend_noirq() *after* it puts
> > > the device in a low-power state.  Here's my uninformed speculation
> > > about what happens:
> > >
> > >   - On suspend, pci_pm_suspend_noirq() puts device in low-power state.
> >
> > Right.
> >
> > >     My *guess* is this means D0 or D3hot for s2idle, and D3cold for
> > >     everything else.
> >
> > This really depends on the ACPI methods involved, but as a rule (and
> > I'm not aware of any exceptions) the target power state of the device
> > for s2idle and "everything else" is the same.  If D3cold is possible,
> > it will be D3cold.  If not, that will be D3hot etc.
> >
> > The only difference (from the individual device perspective) is that
> > in the ACPI S3 (or S4 if this matters here) case the platform firmware
> > (and that's not AML but something like SMM) may cut power off from it
> > later.
> >
> > >  [Do we have sufficient debug to find out what these states are?]
> >
> > I'm not sure what you mean.  What states are supported or something else?
>
> I was curious about what states we actually put this device in for
> s2idle/standby/s2ram/std because I was hoping we could decide based
> on what state s2idle used.  But if they all use the same target state,
> that idea won't work.
>
> > >   - pci_pm_suspend_noirq() does pci_fixup_suspend_late fixups,
> > >     including quirk_apple_poweroff_thunderbolt().
> > >
> > >   - quirk_apple_poweroff_thunderbolt() runs the magic SXIO/SXFP/SXLF
> > >     methods, which apparently turn off more power.
> > >
> > >   - On resume, pci_pm_resume_noirq() brings the device back to D0.
> > >
> > >     If we're resuming from standby, S2RAM, or STD, I speculate the
> > >     device is in D3cold, so this involves running AML that seems to
> > >     undo whatever SXIO/SXFP/SXLF did.
> >
> > The standby, S2RAM and STD cases are actually different in that respect.
> >
> > In the STD case we get the device from the restore kernel and it most
> > likely has been enumerated by it, so it is in whatever power state the
> > restore kernel puts it into.
> >
> > For standby that most likely is the state in which the device was left
> > during suspend.
> >
> > For S2RAM, it most likely will be something like D0-uninitialized,
> > because the platform firmware initiating the resume transition (which
> > is not AML, but again something like SMM) restores power to the
> > platform (including the majority of devices), but doesn't initialize
> > them as a rule.
> >
> > The STD and S2RAM resume will undo the "magic", standby resume may not.
> >
> > >     If we're resuming from s2idle, I speculate the device is in D0 or
> > >     D3hot, and we run different AML (or maybe no AML at all), and we
> >
> > It is in whatever power state it was left in during suspend and we run
> > the same AML as in the other cases, modulo the possible power state
> > difference (with respect to the other cases).
> >
> > >     *don't* undo the effects of SXIO/SXFP/SXLF, so the device doesn't
> > >     work.
> >
> > This is correct.
> >
> > > If the above is anything like what's happening, we should be able to
> > > skip SXIO/SXFP/SXLF based on the current power state of the device.
> >
> > Not really.
> >
> > > E.g., if the device is in D0 or D3hot, we should not use
> > > SXIO/SXFP/SXLF to yank power.
> > >
> > > That would seem more connected to the observable state of the device
> > > than using pm_suspend_via_firmware(), which relies on the connection
> > > between s2idle and PM_SUSPEND_FLAG_FW_SUSPEND (which is not at all
> > > obvious) and the power state of the device while in s2idle (also not
> > > obvious).
> >
> > The problem is related to the fact that in s2idle the platform
> > firmware does not finalize the suspend transition and, consequently,
> > it doesn't initiate the resume transition.  Therefore whatever power
> > state the device was left in during suspend must be dealt with during
> > the subsequent resume.  Hence, if whatever is done by SXIO/SXFP/SXLF
> > in the suspend path cannot be reversed in the resume path by the
> > kernel (because there is no known method to do that), they should not
> > be invoked.  And that's exactly because the platform firmware will not
> > finalize the suspend transition which is indicated by
> > PM_SUSPEND_FLAG_FW_SUSPEND being unset.
>
> How can we connect "if (!pm_suspend_via_firmware())" in this patch
> with the fact that firmware doesn't finalize suspend (and consequently
> does not reverse things in resume)?
>
> I don't see any use of pm_suspend_via_firmware() or
> PM_SUSPEND_FLAG_FW_SUSPEND that looks relevant.

First of all, there is a kerneldoc comment next to
pm_suspend_via_firmware() which is relevant.  Especially the last
paragraph of that comment applies directly to the case at hand IMV.

Next, if you see how pm_suspend_global_flags get populated, you'll
notice that pm_suspend_clear_flags() is called early in both the
system-wide suspend and hibernation paths and then the ACPI platform
callbacks invoke pm_set_suspend_via_firmware() when the platform is
going to finalize the suspend transition.  [In particular, notice the
acpi_state > ACPI_STATE_S1 condition in acpi_suspend_begin(), which is
related to the difference between ACPI S3/S4 and standby that I was
talking about before.]  This means that, on a system with ACPI,
PM_SUSPEND_FLAG_FW_SUSPEND is set only if the target system state is
above S1 and in those cases the firmware finalizes the suspend
transition.

Of course, s2idle callbacks don't invoke pm_set_suspend_via_firmware()
at all, because the target ACPI system state for s2idle is S0 (the
working state from the ACPI perspective).

> That makes it really hard to figure out how this patch works and how to avoid breaking it
> in the future.

Well, if you don't trust kerneldoc comments, you get to figure out
things yourself which may not be straightforward ...

> Thanks for your patience in explaining all this!

YW

Cheers!



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux