Re: [PATCH v4] PCI/ASPM: Add back L1 PM Substate save and restore

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Oct 09, 2023 at 11:34:21AM -0500, Bjorn Helgaas wrote:
> On Mon, Oct 09, 2023 at 11:34:34AM +0300, Mika Westerberg wrote:
> > On Thu, Oct 05, 2023 at 12:22:26PM -0500, Bjorn Helgaas wrote:
> > > On Thu, Oct 05, 2023 at 05:57:52PM +0200, Thomas Witt wrote:
> > > > On 05/10/2023 17:30, Bjorn Helgaas wrote:
> > > > ...
> > > 
> > > > > Right, without the denylist, I expect Thomas' TUXEDO to fail,
> > > > > but I still hope we can figure out why.  If we just keep it on
> > > > > the denylist, that system will suffer from more power
> > > > > consumption than necessary, but only after suspend/resume.
> > > > > 
> > > > > A denylist seems like the absolute last resort.  In this case
> > > > > we don't know about anything *wrong* with those platforms; all
> > > > > we know is that our resume path doesn't work.  It's likely
> > > > > that it fails on other platforms we haven't heard about, too.
> > > > 
> > > > The best guess from Mika and David was a firmware issue, but I
> > > > run the same Firmware revision as Werner. I even reflashed the
> > > > Firmware, but that did not change anything:
> > > 
> > > Possibly a BIOS settings difference?
> > > 
> > > > Quoting David Box:
> > > > > I agree that we should pursue an exception for your system.
> > > > > This is looking like a firmware bug. One thing we did notice
> > > > > in the turbostat results is your IRTL (Interrupt Response Time
> > > > > Limit) values are bogus:
> > > > >
> > > > > cpu6: MSR_PKGC3_IRTL: 0x0000884e (valid, 79872 ns)
> > > > > cpu6: MSR_PKGC6_IRTL: 0x00008000 (valid, 0 ns)
> > > > > cpu6: MSR_PKGC7_IRTL: 0x00008000 (valid, 0 ns)
> > > > > cpu6: MSR_PKGC8_IRTL: 0x00008000 (valid, 0 ns)
> > > > > cpu6: MSR_PKGC9_IRTL: 0x00008000 (valid, 0 ns)
> > > > > cpu6: MSR_PKGC10_IRTL: 0x00008000 (valid, 0 ns)
> > > > >
> > > > > This is despite the PKGC configuration register showing that all
> > > > > states are enabled:
> > > > >
> > > > > cpu6: MSR_PKG_CST_CONFIG_CONTROL: 0x1e008008 (UNdemote-C3, UNdemote-C1,
> > > > demote-
> > > > C3, demote-C1, locked, pkg-cstate-limit=8 (unlimited))
> > > > >
> > > > > Firmware sets this.
> > > 
> > > I can't find this discussion, but if there's a firmware issue related
> > > to IRTL MSRs, I would want the workaround in intel-idle.c or whatever
> > > code deals with the MSRs, not in the ASPM code.
> > 
> > Unfortunately that discussion never ended up on the mailing list. But in
> > summary that particular system seems to run pretty hot (if I understand
> > correctly what David concluded). This is the reason Thomas has the
> > pm_sleep=deep in the command line and this is why the L1 SS restore then
> > causes the failure on resume. Without this it works fine but consumes
> > lot of energy in s2idle.
> > 
> > Can you suggest what we should do with this now?
> >
> > We got report from Tasev Nikola on
> > https://bugzilla.kernel.org/show_bug.cgi?id=216782 that even if the Asus
> > system is removed from the denylist it works so that we can do. However,
> > with the Thomas' system I'm not sure. If we leave it like this:
> > 
> > static int aspm_l1ss_suspend_via_firmware(const struct dmi_system_id *not_used)
> > {
> >         return pm_suspend_via_firmware();
> > }
> > 
> > static const struct dmi_system_id aspm_l1ss_denylist[] = {
> >         {
> >                 /*
> >                  * https://bugzilla.kernel.org/show_bug.cgi?id=216877
> >                  *
> >                  * This system needs to use suspend to mem instead of its 
> >                  * default (suspend to idle) to avoid draining the battery. 
> >                  * However, the BIOS gets confused if we try to restore the
> >                  * L1SS registers so avoid doing that if the user forced
> >                  * suspend to mem. The default suspend to idle on the other
> >                  * hand needs restoring L1SS to allow the CPU to enter low
> >                  * power states. This entry should handle both.
> >                  */
> >                 .callback = aspm_l1ss_suspend_via_firmware,
> >                 .ident = "TUXEDO InfinityBook S 14 v5",
> >                 .matches = {
> >                         DMI_MATCH(DMI_BOARD_VENDOR, "TUXEDO"),
> >                         DMI_MATCH(DMI_BOARD_NAME, "L140CU"),
> >                 },
> >         },
> >         { }
> > };
> > 
> > Then it should work with Thomas' system (defaults to deep), and TUXEDO
> > with default settings (defaults to s2idle), and with the rest of the
> > world (I hope at least, fingers crossed). Or you want to pursue a
> > solution without the denylist still? I'm out of ideas what could be
> > wrong except that when the pm_sleep=deep it means the BIOS is involved
> > in the suspend/restore of the devices and it may not expect the OS to
> > touch these registers or something along those lines.
> 
> [I think this refers to "mem_sleep_default=deep" (not pm_sleep)]

Yes.

> I assume everything was fine before suspend, and it only runs hot
> after resume.  And the platform granted control of the PCIe Capability
> and the LTR Capability to OSPM via _OSC?

As David commented already it runs hot even before suspend. The platform
did give LTR capability to the OS:

Jan 02 14:17:52 localhost kernel: acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
Jan 02 14:17:52 localhost kernel: acpi PNP0A08:00: _OSC: OS now controls [PME PCIeCapability LTR]



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux