> On Oct 6, 2020, at 03:19, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote: > > [+cc Ian, who's also working on an ASPM issue] > > On Tue, Oct 06, 2020 at 02:40:32AM +0800, Kai-Heng Feng wrote: >>> On Oct 3, 2020, at 06:18, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote: >>> On Wed, Sep 30, 2020 at 04:24:54PM +0800, Kai-Heng Feng wrote: >>>> BIOS may not be able to program ASPM for links behind VMD, prevent Intel >>>> SoC from entering deeper power saving state. >>> >>> It's not a question of BIOS not being *able* to configure ASPM. I >>> think BIOS could do it, at least in principle, if it had a driver for >>> VMD. Actually, it probably *does* include some sort of VMD code >>> because it sounds like BIOS can assign some Root Ports to appear >>> either as regular Root Ports or behind the VMD. >>> >>> Since this issue is directly related to the unusual VMD topology, I >>> think it would be worth a quick recap here. Maybe something like: >>> >>> VMD is a Root Complex Integrated Endpoint that acts as a host bridge >>> to a secondary PCIe domain. BIOS can reassign one or more Root >>> Ports to appear within a VMD domain instead of the primary domain. >>> >>> However, BIOS may not enable ASPM for the hierarchies behind a VMD, >>> ... >>> >>> (This is based on the commit log from 185a383ada2e ("x86/PCI: Add >>> driver for Intel Volume Management Device (VMD)")). >> >> Ok, will just copy the portion as-is if there's patch v2 :) >> >>> But we still have the problem that CONFIG_PCIEASPM_DEFAULT=y means >>> "use the BIOS defaults", and this patch would make it so we use the >>> BIOS defaults *except* for things behind VMD. >>> >>> - Why should VMD be a special case? >> >> Because BIOS doesn't handle ASPM for it so it's up to software to do >> the job. In the meantime we want other devices still use the BIOS >> defaults to not introduce any regression. >> >>> - How would we document such a special case? >> >> I wonder whether other devices that add PCIe domain have the same >> behavior? Maybe it's not a special case at all... > > What other devices are these? Controllers which add PCIe domain. > >> I understand the end goal is to keep consistency for the entire ASPM >> logic. However I can't think of any possible solution right now. >> >>> - If we built with CONFIG_PCIEASPM_POWERSAVE=y, would that solve the >>> SoC power state problem? >> >> Yes. >> >>> - What issues would CONFIG_PCIEASPM_POWERSAVE=y introduce? >> >> This will break many systems, at least for the 1st Gen Ryzen >> desktops and laptops. >> >> All PCIe ASPM are not enabled by BIOS, and those systems immediately >> freeze once ASPM is enabled. > > That indicates a defect in the Linux ASPM code. We should fix that. > It should be safe to use CONFIG_PCIEASPM_POWERSAVE=y on every system. On those systems ASPM are also not enabled on Windows. So I think ASPM are disabled for a reason. > > Are there bug reports for these? The info we would need to start with > includes "lspci -vv" and dmesg log (with CONFIG_PCIEASPM_DEFAULT=y). > If a console log with CONFIG_PCIEASPM_POWERSAVE=y is available, that > might be interesting, too. We'll likely need to add some > instrumentation and do some experimentation, but in principle, this > should be fixable. Doing this is asking users to use hardware settings that ODM/OEM never tested, and I think the risk is really high. Kai-Heng > > Bjorn