Re: My AlderLake Dell (XPS-9320) needs these patches to get full standby/low-power modes

Hi Mika, Bjorn,

On Tue, 2023-11-07 at 13:15 +0200, Mika Westerberg wrote:
> Hi,
> 
> On Mon, Nov 06, 2023 at 12:11:07PM -0600, Bjorn Helgaas wrote:
> > [+cc Mika, Sathy, Rafael, David, Ilpo, Ricky, Mario, linux-pci]
> > 
> > On Sat, Nov 04, 2023 at 10:13:24AM -0700, Kenneth R. Crudup wrote:
> > > 
> > > I have a Dell XPS-9320 with an Alderlake chipset, and the NVMe behind a
> > > VMD device:
> > > 
> > > ----
> > > [    0.127342] smpboot: CPU0: 12th Gen Intel(R) Core(TM) i7-1280P (family: 0x6, model: 0x9a, stepping: 0x3)
> > > ----
> > > 0000:00:0e.0 0104: 8086:467f
> > >         Subsystem: 1028:0af3
> > >         Flags: bus master, fast devsel, latency 0, IOMMU group 9
> > >         Memory at 603c000000 (64-bit, non-prefetchable) [size=32M]
> > >         Memory at 72000000 (32-bit, non-prefetchable) [size=32M]
> > >         Memory at 6040100000 (64-bit, non-prefetchable) [size=1M]
> > >         Capabilities: <access denied>
> > >         Kernel driver in use: vmd
> > > ----
> > > 
> > > The only release kernel that was able to get this laptop to fully get into
> > > low-power (unfortunately only s0ix) was the Ubuntu-6.2.0- ... series from
> > > Ubuntu (remote git://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/lunar).
> > > 
> > > I'd bisected it to the following commits (in this order):
> > > 
> > > 4ff116d0d5fd PCI/ASPM: Save L1 PM Substates Capability for suspend/resume
> > > 5e85eba6f50d PCI/ASPM: Refactor L1 PM Substates Control Register programming
> > > 1a0102a08f20 UBUNTU: SAUCE: PCI/ASPM: Enable ASPM for links under VMD domain
> > > 47c7bfd31514 UBUNTU: SAUCE: PCI/ASPM: Enable LTR for endpoints behind VMD
> > > 154d48da2c57 UBUNTU: SAUCE: vmd: fixup bridge ASPM by driver name instead
> > 
> > Thanks for these.  You don't happen to have URLs for those Ubuntu
> > commits, do you?  E.g., https://git.kernel.org/linus/4ff116d0d5fd
> > (which was reverted by a7152be79b62 ("Revert "PCI/ASPM: Save L1 PM
> > Substates Capability for suspend/resume"")).
> > 
> > > Without the patches I never see Pkg%PC8 or higher(? lower?), nor i915
> > > states DC5/6, all necessary for SYS%LPI/CPU%LPI. I've attached a little
> > > script I use alongside turbostat for verifying low-power operation (and
> > > also for seeing what chipset subsystem may be preventing it).
> > > 
> > > The first two are in Linus' trees, but were reverted (4ff116d0d5fd in
> > > a7152be79b6, 5e85eba6f50d in ff209ecc376a). The last three come from
> > > Ubuntu's Linux trees (see remote spec above). The first two remain
> > > reverted in the Ubuntu trees, but if I put them back, I get increased
> > > power savings during suspend/resume cycles.
> > > 
> > > Considering the power draw is really significant without these patches
> > > (10s of %s per hour) and I'd think Dell would have sold some decent
> > > number of these laptops, I'd been patiently waiting for these patches,
> > > or some variant, to show up in the stable trees, but so far I'm up to
> > > the 6.6 stable kernel and still having to manually cherry-pick these, so
> > > I thought maybe I could bring this to the PM maintainers' attention to
> > > at least start a discussion about this issue.
> > 
> > Thank you very much for raising this again.  We really need to make
> > some progress, and Mika recently posted a patch to add the
> > 4ff116d0d5fd functionality again:
> > https://lore.kernel.org/r/20231002070044.2299644-1-mika.westerberg@xxxxxxxxxxxxxxx
> > 
> > The big problem is that it works on *most* systems, but it still seems
> > to break a few.  So Mika's current patch relies on a denylist of
> > systems where we *don't* restore the substates.
> 
> According to the latest reports, it is just that one system where this is
> still an issue. The latest patch works on the Asus UX305FA even if it is not
> in the denylist. That would leave only that one system in the denylist, at
> least among the ones we are aware of.

I've been working with Thomas, whose system is the last known to have problems
with Mika's patch. It turns out that his config sets aspm_policy to 'powersave'.
If he sets it to any other policy, Mika's patch works [1]. It's possible that
others may see the same issue if they use 'powersave' as well.
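For anyone wanting to check whether their system is configured like Thomas's:
the pcie_aspm "policy" parameter lists every policy and brackets the active
one, e.g. "default performance [powersave] powersupersave". Below is a minimal
sketch that extracts the bracketed name. The helper names are hypothetical,
and the sysfs path assumes a kernel built with CONFIG_PCIEASPM; the policy can
also be overridden at boot with pcie_aspm.policy=.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy the bracketed (active) policy name out of a pcie_aspm policy line,
 * e.g. "default performance [powersave] powersupersave". Returns 0 on
 * success, -1 if no bracketed name is found or it does not fit in out. */
int aspm_active_policy(const char *line, char *out, size_t outlen)
{
	const char *lb, *rb;
	size_t n;

	assert(line != NULL && out != NULL && outlen > 0);

	lb = strchr(line, '[');
	if (lb == NULL)
		return -1;
	rb = strchr(lb + 1, ']');
	if (rb == NULL)
		return -1;

	n = (size_t)(rb - lb - 1);
	if (n == 0 || n >= outlen)
		return -1;

	memcpy(out, lb + 1, n);
	out[n] = '\0';
	return 0;
}

/* Read the active policy from sysfs (path assumes CONFIG_PCIEASPM=y). */
int read_aspm_policy(char *out, size_t outlen)
{
	char line[128];
	FILE *f = fopen("/sys/module/pcie_aspm/parameters/policy", "r");

	if (f == NULL)
		return -1;
	if (fgets(line, sizeof(line), f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return aspm_active_policy(line, out, outlen);
}
```

A check such as read_aspm_policy(buf, sizeof buf) == 0 && strcmp(buf,
"powersave") == 0 would flag a system in the same configuration.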

The theory right now is that enabling L1SS in pci_restore_state() is too early.
During boot, if the ASPM policy is 'powersave' or 'powersupersave', ASPM
enabling is deferred. The comment in pcie_aspm_init_link_state() above the code
that skips it states:

        /*
         * At this stage drivers haven't had an opportunity to change the
         * link policy setting. Enabling ASPM on broken hardware can cripple
         * it even before the driver has had a chance to disable ASPM, so
         * default to a safe level right now. If we're enabling ASPM beyond
         * the BIOS's expectation, we'll do so once pci_enable_device() is
         * called.
         */

While pci_enable_device() is called by the PCI core before pci_restore_state()
on resume, it is called again later by the nvme driver in nvme_pci_enable().
That later call appears to be the intercept point the comment refers to: it ends
up calling pcie_aspm_powersave_config_link() to configure ASPM at that time.
During boot we see that ASPM is indeed enabled for powersave during
nvme_pci_enable(). With the save/restore patch, however, the state is restored
before nvme_pci_enable(). I've asked Thomas not to apply Mika's patch, but
instead to use a different patch [2] that waits until
pcie_aspm_powersave_config_link() is called to configure ASPM. The need for this
is explained below. Hopefully it will fix the hang observed on his system.
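To make the boot-time deferral concrete, here is a minimal user-space model of
the check that follows the comment quoted above, as I read
pcie_aspm_init_link_state(). It is not kernel code: the enum and function names
are hypothetical (the policy names are borrowed from aspm.c), and it only
captures where ASPM for a link gets programmed under each policy.

```c
#include <assert.h>

enum aspm_policy {
	POLICY_DEFAULT,
	POLICY_PERFORMANCE,
	POLICY_POWERSAVE,
	POLICY_POWER_SUPERSAVE,
};

/* Where ASPM for a link is programmed during boot. */
enum aspm_config_point {
	AT_INIT_LINK_STATE, /* pcie_aspm_init_link_state() at enumeration */
	AT_ENABLE_DEVICE,   /* pcie_aspm_powersave_config_link(), reached via
	                     * pci_enable_device(), e.g. from nvme_pci_enable() */
};

/* Only the two powersave policies are deferred until a driver enables
 * the device; everything else is programmed at enumeration. */
enum aspm_config_point aspm_boot_config_point(enum aspm_policy policy)
{
	assert(policy <= POLICY_POWER_SUPERSAVE);

	if (policy != POLICY_POWERSAVE && policy != POLICY_POWER_SUPERSAVE)
		return AT_INIT_LINK_STATE;
	return AT_ENABLE_DEVICE;
}
```

Restoring saved L1SS state in pci_restore_state() bypasses the AT_ENABLE_DEVICE
step for the powersave policies, which is the ordering problem described above.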

Whether or not that patch works for him, we can address his problem with the
current L1SS save/restore patch by removing the denylist and instead only doing
the save/restore if the ASPM policy is 'default', which doesn't hang his system.
This makes sense because the BIOS configuration is the only one we need to
preserve, since it can be lost during suspend, particularly during s2idle. All
other policies are OS controlled, if allowed. Instead of saving/restoring for
those, we can let ASPM be configured later when
pcie_aspm_powersave_config_link() is called.
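A minimal model of that gating, again a user-space sketch and not a patch.
Assumptions: the link's L1SS control value reads back as 0 after suspend
(standing in for the BIOS state being lost across s2idle), 0xF is just a
stand-in for "L1SS enable bits set", should_restore_l1ss() and
simulate_resume() are hypothetical names, and the two steps run in resume
order: pci_restore_state() first, then the driver's pci_enable_device()
reaching pcie_aspm_powersave_config_link().

```c
#include <assert.h>
#include <stdbool.h>

enum aspm_policy {
	POLICY_DEFAULT,
	POLICY_PERFORMANCE,
	POLICY_POWERSAVE,
	POLICY_POWER_SUPERSAVE,
};

/* Proposed gate: only the BIOS-configured ('default') state is saved and
 * restored; the other policies are left to the OS. */
bool should_restore_l1ss(enum aspm_policy policy)
{
	return policy == POLICY_DEFAULT;
}

/* Returns the L1SS control value the link ends up with after resume. */
unsigned int simulate_resume(enum aspm_policy policy,
			     unsigned int saved_bios_ctl,
			     unsigned int policy_ctl)
{
	unsigned int hw = 0; /* assumption: state lost during suspend */

	assert(policy <= POLICY_POWER_SUPERSAVE);

	/* Step 1: pci_restore_state(), before the driver is involved. */
	if (should_restore_l1ss(policy))
		hw = saved_bios_ctl;

	/* Step 2: driver's pci_enable_device() ->
	 * pcie_aspm_powersave_config_link(), powersave policies only. */
	if (policy == POLICY_POWERSAVE || policy == POLICY_POWER_SUPERSAVE)
		hw = policy_ctl;

	return hw;
}
```

With 'default' the saved BIOS value comes back in step 1; with 'powersave' or
'powersupersave' nothing is written until the driver enables the device.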

The only issue with this is that pcie_aspm_powersave_config_link() will not
configure ASPM if aspm_policy has not changed. This is a problem because we
observed that after resume from S3, the BIOS has re-enabled L1SS. So we can boot
with powersave (which disables L1SS) but resume with L1SS enabled and the policy
still set to powersave. This is a preexisting bug; I've observed this behavior
on Thomas's system and with mainline on our desktop systems. It is the reason
for patch [2], which forces ASPM to be configured in
pcie_aspm_powersave_config_link() even if the policy is unchanged. It works on
my system, and I'm hoping it will let his system resume successfully with the
correct policy enabled.
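The stale-state problem can be sketched as a link that remembers the last value
the OS wrote. This is a hypothetical user-space model, not the kernel's data
structures: struct link_model and config_link() are made-up names, and the
force flag stands in for what patch [2] does, configuring ASPM even when the
requested state looks unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What the control bits actually hold versus what the OS believes it
 * last programmed. */
struct link_model {
	unsigned int hw;     /* current register contents */
	unsigned int cached; /* last state written by the OS */
};

/* Program 'state' into the link. Without 'force', the write is skipped
 * when the cached state already matches, so a change the BIOS made behind
 * the OS's back (e.g. on S3 resume) goes unnoticed. With 'force', the
 * registers are rewritten unconditionally. */
void config_link(struct link_model *link, unsigned int state, bool force)
{
	assert(link != NULL);

	if (!force && link->cached == state)
		return;

	link->hw = state;
	link->cached = state;
}
```

For example, boot with powersave leaves hw == cached == 0; if the BIOS then
sets hw to 0xF on S3 resume, config_link(&link, 0, false) leaves L1SS enabled,
while config_link(&link, 0, true) clears it again.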

[1] https://bugzilla.kernel.org/show_bug.cgi?id=216877#c33
[2] https://bugzilla.kernel.org/attachment.cgi?id=305395&action=diff

David



