On Wed, Oct 04, 2023 at 05:23:24PM -0500, Bjorn Helgaas wrote: > On Mon, Oct 02, 2023 at 10:00:44AM +0300, Mika Westerberg wrote: > > Commit a7152be79b62 ("Revert "PCI/ASPM: Save L1 PM Substates Capability > > for suspend/resume"") reverted saving and restoring of ASPM L1 Substates > > due to a regression that caused resume from suspend to fail on certain > > systems. However, we never added this capability back and this is now > > causing systems fail to enter low power CPU states, drawing more power > > from the battery. > > AFAICT, the save (suspend) side is effectively the same in > 4ff116d0d5fd ("PCI/ASPM: Save L1 PM Substates Capability for > suspend/resume") (the change reverted by a7152be79b62) and in this > patch. There are minor ordering differences with respect to DPC and > AER, but I don't think they're relevant. > > > The original revert mentioned that we restore L1 PM substate configuration > > even though ASPM L1 may already be enabled. This is due the fact that > > the pci_restore_aspm_l1ss_state() was called before pci_restore_pcie_state(). > > > > Try to enable this functionality again following PCIe r6.0.1, sec 5.5.4 > > more closely by: > > > > 1) Do not restore ASPM configuration in pci_restore_pcie_state() but > > do that after PCIe capability is restored in pci_restore_aspm_state() > > following PCIe r6.0, sec 5.5.4. > > > > 2) ASPM is first enabled on the upstream component and then downstream > > (this is already forced by the parent-child ordering of Linux > > Device Power Management framework). > > > > 3) Program ASPM L1 PM substate configuration before L1 enables. > > > > 4) Program ASPM L1 PM substate enables last after rest of the fields > > in the capability are programmed. > > This patch changes the restore (resume) side. 4ff116d0d5fd restored > L1SS state followed by LNKCTL. > > This patch instead restores LNKCTL (with ASPM *disabled*) before the > L1SS state, and then restores the full LNKCTL (including ASPM config). Right. > > 5) Add denylist that skips restoring on the ASUS and TUXEDO systems > > where these regressions happened, just in case. For the TUXEDO case > > we only skip restore if the BIOS is involved in system suspend > > (that's forcing "mem_sleep=deep" in the command line). This is to > > avoid possible power regression when the default suspend to idle is > > used, and at the same time make sure the devices continue working > > after resume when the BIOS is involved. > > I looked through the v1, v2, and v3 threads, and I see testing failure > reports from Thomas (TUXEDO) for v1 and "v1.5" [1]. v1.5 looks > functionally identical to this v4 except it lacks the TUXEDO denylist > entry. > > I don't see any actual success reports for either ASUS or TUXEDO. > > Do we have testing reports for these ASUS and TUXEDO systems showing > that we need this denylist? I think this patch fixes a real problem > with L1SS save/restore, and unless proven otherwise, I would assume > that it fixes ASUS and TUXEDO as well. The thing with TUXEDO is that on Thomas's system with mem_sleep=deep this patch (without denylist) breaks the resume as he describes here: https://bugzilla.kernel.org/show_bug.cgi?id=216877 However, on exact same TUXEDO system with the same firmwares Werner is not able to reproduce the issue with or without the patch. So I'm not sure what to do and that's why I added denylist that should take effect on Thomas' system when mem_sleep=deep is set but also work on the same system without it. And since we have the denylist, I decided to add the ASUS there to avoid accidentally breaking those too. Let me know if you have any preference what we should do with the TUXEDO one. I really don't know any better way to keep everyone happy. > [1] https://lore.kernel.org/linux-pci/20230630104154.GS14638@xxxxxxxxxxxxxxxxxx/ > > > Reported-by: Koba Ko <koba.ko@xxxxxxxxxxxxx> > > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217321 > > IIUC, Koba's report is on a Dell Inspiron 15 3530. Right. > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216782 > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216877 > > These are the original issues reported with 4ff116d0d5fd: 216782 is > the ASUS UX305FA problem reported by Tasev, and 216877 is the TUXEDO > problem reported by Thomas. > > So ... I think this patch definitely fixes a problem. 4ff116d0d5fd > restored L1SS state before LNKCTL on the assumption that ASPM was > disabled at the point, but I don't think we really know that. > > This patch explicitly disables ASPM before restoring L1SS, which seems > safer. > > But we just punt on the ASUS and TUXEDO systems, when there's no > reason we shouldn't be able to restore ASPM config there as well. And > unless I missed them, we don't actually have testing reports from > ASUS, TUXEDO, or Koba's Dell. Just for the record, the Dell and probably others to the real issue is that if we don't restore these the CPU can't enter lower power C-states and that makes suspend to empty the battery much faster than user would expect. So some solution to this is definitely needed sooner rather than later. @Koba, can you provide your tested-by if this solves the issue on the Dell system? > I think there's still something we're missing. > > We restore the LTR config before restoring DEVCTL2 (including the LTR > enable bit) and L1SS state. I don't think we know the state of ASPM > and L1SS at that point, do we? Do you think there could be an issue > there, too? AFAICT LTR does not affect until it is explicitly enabled and I don't think many drivers actually program it (although we have some sort of API for it at least for Intel LPSS devices). If you have suggestions, I'm all open. If I understand you would like this to be done like: - Drop the denylist - Add back the suspend/restore of L1SS - Ask everyone in this thread to try it out I can do that no problem but I guess that for the TUXEDO one (Thomas') it probably is going to fail still.