> On Oct 7, 2020, at 21:30, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote: > > On Wed, Oct 07, 2020 at 12:26:19PM +0800, Kai-Heng Feng wrote: >>> On Oct 6, 2020, at 03:19, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote: >>> On Tue, Oct 06, 2020 at 02:40:32AM +0800, Kai-Heng Feng wrote: >>>>> On Oct 3, 2020, at 06:18, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote: >>>>> On Wed, Sep 30, 2020 at 04:24:54PM +0800, Kai-Heng Feng wrote: > > ... >>>> I wonder whether other devices that add PCIe domain have the same >>>> behavior? Maybe it's not a special case at all... >>> >>> What other devices are these? >> >> Controllers which add PCIe domain. > > I was looking for specific examples, not just a restatement of what > you said before. I'm just curious because there are a lot of > controllers I'm not familiar with, and I can't think of an example. > >>>> I understand the end goal is to keep consistency for the entire ASPM >>>> logic. However I can't think of any possible solution right now. >>>> >>>>> - If we built with CONFIG_PCIEASPM_POWERSAVE=y, would that solve the >>>>> SoC power state problem? >>>> >>>> Yes. >>>> >>>>> - What issues would CONFIG_PCIEASPM_POWERSAVE=y introduce? >>>> >>>> This will break many systems, at least for the 1st Gen Ryzen >>>> desktops and laptops. >>>> >>>> All PCIe ASPM are not enabled by BIOS, and those systems immediately >>>> freeze once ASPM is enabled. >>> >>> That indicates a defect in the Linux ASPM code. We should fix that. >>> It should be safe to use CONFIG_PCIEASPM_POWERSAVE=y on every system. >> >> On those systems ASPM are also not enabled on Windows. So I think >> ASPM are disabled for a reason. > > If the platform knows ASPM needs to be disabled, it should be using > ACPI_FADT_NO_ASPM or _OSC to prevent the OS from using it. And if > CONFIG_PCIEASPM_POWERSAVE=y means Linux enables ASPM when it > shouldn't, that's a Linux bug that we need to fix. Yes that's a bug which fixed by Ian's new patch. > >>> Are there bug reports for these? The info we would need to start with >>> includes "lspci -vv" and dmesg log (with CONFIG_PCIEASPM_DEFAULT=y). >>> If a console log with CONFIG_PCIEASPM_POWERSAVE=y is available, that >>> might be interesting, too. We'll likely need to add some >>> instrumentation and do some experimentation, but in principle, this >>> should be fixable. >> >> Doing this is asking users to use hardware settings that ODM/OEM >> never tested, and I think the risk is really high. > > What? That's not what I said at all. I'm asking for information > about these hangs so we can fix them. I'm not suggesting that you > should switch to CONFIG_PCIEASPM_POWERSAVE=y for the distro. Ah, I thought your suggestion is switching to CONFIG_PCIEASPM_POWERSAVE=y, because I sense you want to use that to cover the VMD ASPM this patch tries to solve. Do we have a conclusion how to enable VMD ASPM with CONFIG_PCIEASPM_DEFAULT=y? > > Let's back up. You said: > > CONFIG_PCIEASPM_POWERSAVE=y ... will break many systems, at least > for the 1st Gen Ryzen desktops and laptops. > > All PCIe ASPM are not enabled by BIOS, and those systems immediately > freeze once ASPM is enabled. > > These system hangs might be caused by (1) some hardware issue that > causes a hang when ASPM is enabled even if it is configured correctly > or (2) Linux configuring ASPM incorrectly. It's (2) here. > > For case (1), the platform should be using ACPI_FADT_NO_ASPM or _OSC > to prevent the OS from enabling ASPM. Linux should pay attention to > that even when CONFIG_PCIEASPM_POWERSAVE=y. > > If the platform *should* use these mechanisms but doesn't, the > solution is a quirk, not the folklore that "we can't use > CONFIG_PCIEASPM_POWERSAVE=y because it breaks some systems." The platform in question doesn't prevent OS from enabling ASPM. > > For case (2), we should fix Linux so it configures ASPM correctly. > > We cannot use the build-time CONFIG_PCIEASPM settings to avoid these > hangs. We need to fix the Linux run-time code so the system operates > correctly no matter what CONFIG_PCIEASPM setting is used. > > We have sysfs knobs to control ASPM (see 72ea91afbfb0 ("PCI/ASPM: Add > sysfs attributes for controlling ASPM link states")). They can do the > same thing at run-time as CONFIG_PCIEASPM_POWERSAVE=y does at > build-time. If those knobs cause hangs on 1st Gen Ryzen systems, we > need to fix that. Ian's patch solves the issue, at least for the systems I have. Kai-Heng > > Bjorn