On Fri, 2024-02-02 at 18:05 -0600, Bjorn Helgaas wrote: > On Fri, Feb 02, 2024 at 03:11:12PM +0800, Jian-Hong Pan wrote: > > The remapped PCIe Root Port and NVMe have PCI PM L1 substates > > capability, but they are disabled originally: > > > > Here is an example on ASUS B1400CEAE: > > > > Capabilities: [900 v1] L1 PM Substates > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > L1_PM_Substates+ > > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > T_CommonMode=0us LTR1.2_Threshold=0ns > > L1SubCtl2: T_PwrOn=10us > > > > Power on all of the VMD remapped PCI devices and quirk max snoop LTR > > before enable PCI-PM L1 PM Substates by following "Section 5.5.4 of PCIe > > Base Spec Revision 6.0". Then, PCI PM's L1 substates control are > > initialized & enabled accordingly. > > > Also, update the comments of > > pci_enable_link_state() and pci_enable_link_state_locked() for > > kernel-doc, too. > > The aspm.c changes should be in a separate patch. Presumably the > aspm.c code change fixes a defect, and that defect can be described in > that patch. That fix may be needed for non-VMD situations, and having > it in this vmd patch means it won't be as easy to find and backport. > > Nit: rewrap commit log to fill 75 columns. > > > @@ -751,11 +751,9 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev, > > void *userdata) > > if (!(features & VMD_FEAT_BIOS_PM_QUIRK)) > > return 0; > > > > - pci_enable_link_state_locked(pdev, PCIE_LINK_STATE_ALL); > > - > > pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_LTR); > > if (!pos) > > - return 0; > > + goto out_enable_link_state; > > > > /* > > * Skip if the max snoop LTR is non-zero, indicating BIOS has set > > it > > @@ -763,7 +761,7 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev, > > void *userdata) > > */ > > pci_read_config_dword(pdev, pos + PCI_LTR_MAX_SNOOP_LAT, <r_reg); > > if (!!(ltr_reg & (PCI_LTR_VALUE_MASK | PCI_LTR_SCALE_MASK))) > > - return 0; > > + goto out_enable_link_state; > > > > /* > > * Set the default values to the maximum required by the platform > > to > > @@ -775,6 +773,14 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev, > > void *userdata) > > pci_write_config_dword(pdev, pos + PCI_LTR_MAX_SNOOP_LAT, ltr_reg); > > pci_info(pdev, "VMD: Default LTR value set by driver\n"); > > You're not changing this part, and I don't understand exactly how LTR > works, but it makes me a little bit queasy to read "set the LTR value > to the maximum required to allow the deepest power management > savings" and then we set the max snoop values to a fixed constant. > > I don't think the goal is to "allow the deepest power savings"; I > think it's to enable L1.2 *when the device has enough buffering to > absorb L1.2 entry/exit latencies*. > > The spec (PCIe r6.0, sec 7.8.2.2) says "Software should set this to > the platform's maximum supported latency or less," so it seems like > that value must be platform-dependent, not fixed. > > And I assume the "_DSM for Latency Tolerance Reporting" is part of the > way to get those platform-dependent values, but Linux doesn't actually > use that yet. This may indeed be the best way but we need to double check with our BIOS folks. AFAIK BIOS writes the LTR values directly so there hasn't been a need to use this _DSM. But under VMD the ports are hidden from BIOS which is why we added it here. I've brought up the question internally to find out how Windows handles the DSM and to get a recommendation from our firmware leads. > > I assume that setting the max values incorrectly may lead to either > being too conservative, so we don't use L1.2 when we could, or too > aggressive, so we use L1.2 when we shouldn't, and the device loses > data because it doesn't have enough internal buffering to absorb the > entry/exit delays. > > This paper has a lot of background and might help answer some of my > questions: > https://www.intel.co.za/content/dam/doc/white-paper/energy-efficient-platforms-white-paper.pdf > > > +out_enable_link_state: > > + /* > > + * Make PCI devices at D0 when enable PCI-PM L1 PM Substates from > > + * Section 5.5.4 of PCIe Base Spec Revision 6.0 > > + */ > > + pci_set_power_state_locked(pdev, PCI_D0); > > + pci_enable_link_state_locked(pdev, PCIE_LINK_STATE_ALL); > > Hmmm. PCIE_LINK_STATE_ALL includes ASPM L1.2. We may do this even if > the device doesn't have an LTR Capability. ASPM L1.2 cannot work > without LTR. > > I only took a quick look but was not convinced that > pci_enable_link_state() does the right thing when we enable > PCIE_LINK_STATE_ALL (including ASPM L1.2) on a device that doesn't > have LTR. I think it *should* decline to set PCI_L1SS_CTL1_ASPM_L1_2, > but I'm not sure it does. Can you double check that path? Maybe > that's another defect in aspm.c. It doesn't currently decline. The same scenario happens when the user selects powersupersave. If a device advertises L1.2 with the capable registers set, it should also have the LTR register present. But it doesn't hurt to check. David > > > @@ -1164,6 +1164,8 @@ static int __pci_enable_link_state(struct pci_dev > > *pdev, int state, bool locked) > > link->aspm_default |= ASPM_STATE_L1_1_PCIPM | > > ASPM_STATE_L1; > > if (state & PCIE_LINK_STATE_L1_2_PCIPM) > > link->aspm_default |= ASPM_STATE_L1_2_PCIPM | > > ASPM_STATE_L1; > > + if (state & ASPM_STATE_L1_2_MASK) > > + aspm_l1ss_init(link); > > pcie_config_aspm_link(link, policy_to_aspm_state(link)); > > > > link->clkpm_default = (state & PCIE_LINK_STATE_CLKPM) ? 1 : 0;