Bjorn Helgaas <helgaas@xxxxxxxxxx> 於 2024年5月4日 週六 上午6:28寫道: > > [+cc Francisco] > > On Fri, May 03, 2024 at 12:15:49PM -0700, David E. Box wrote: > > On Fri, 2024-05-03 at 17:45 +0800, Jian-Hong Pan wrote: > > > David E. Box <david.e.box@xxxxxxxxxxxxxxx> 於 2024年5月1日 週三 上午2:26寫道: > > > > On Tue, 2024-04-30 at 15:46 +0800, Jian-Hong Pan wrote: > > > > > David E. Box <david.e.box@xxxxxxxxxxxxxxx> 於 2024年4月27日 週六 上午8:03寫道: > > > > > > On Wed, 2024-04-24 at 19:02 +0800, Jian-Hong Pan wrote: > > > > > > > Currently, when enable link's L1.2 features with > > > > > > > __pci_enable_link_state(), > > > > > > > it configs the link directly without ensuring related L1.2 parameters, > > > > > > > such > > > > > > > as T_POWER_ON, Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD have > > > > > > > been > > > > > > > programmed. > > > > > > > > > > > > > > This leads the link's L1.2 between PCIe Root Port and child device > > > > > > > gets > > > > > > > wrong configs when a caller tries to enabled it. > > > > > > > > > > > > > > Here is a failed example on ASUS B1400CEAE with enabled VMD: > > > > > > > > > > > > > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor > > > > > > > PCIe > > > > > > > Controller (rev 01) (prog-if 00 [Normal decode]) > > > > > > > ... > > > > > > > Capabilities: [200 v1] L1 PM Substates > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ > > > > > > > L1_PM_Substates+ > > > > > > > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us > > > > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > > > > > T_CommonMode=45us LTR1.2_Threshold=101376ns > > > > > > > L1SubCtl2: T_PwrOn=50us > > > > > > > > > > > > > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue > > > > > > > SN550 > > > > > > > NVMe > > > > > > > SSD (rev 01) (prog-if 02 [NVM Express]) > > > > > > > ... > > > > > > > Capabilities: [900 v1] L1 PM Substates > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > > > > > L1_PM_Substates+ > > > > > > > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us > > > > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > > > > > T_CommonMode=0us LTR1.2_Threshold=0ns > > > > > > > L1SubCtl2: T_PwrOn=10us > > > > > > > > > > > > > > According to "PCIe r6.0, sec 5.5.4", before enabling ASPM L1.2 on the > > > > > > > PCIe > > > > > > > Root Port and the child NVMe, they should be programmed with the same > > > > > > > LTR1.2_Threshold value. However, they have different values in this > > > > > > > case. > > > > > > > > > > > > > > Invoke aspm_calc_l12_info() to program the L1.2 parameters properly > > > > > > > before > > > > > > > enable L1.2 bits of L1 PM Substates Control Register in > > > > > > > __pci_enable_link_state(). > > > > > > > > > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394 > > > > > > > Signed-off-by: Jian-Hong Pan <jhp@xxxxxxxxxxxxx> > > > > > > > --- > > > > > > > v2: > > > > > > > - Prepare the PCIe LTR parameters before enable L1 Substates > > > > > > > > > > > > > > v3: > > > > > > > - Only enable supported features for the L1 Substates part > > > > > > > > > > > > > > v4: > > > > > > > - Focus on fixing L1.2 parameters, instead of re-initializing whole > > > > > > > L1SS > > > > > > > > > > > > > > v5: > > > > > > > - Fix typo and commit message > > > > > > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: Introduce > > > > > > > aspm_get_l1ss_cap()" > > > > > > > > > > > > > > drivers/pci/pcie/aspm.c | 12 ++++++++++++ > > > > > > > 1 file changed, 12 insertions(+) > > > > > > > > > > > > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c > > > > > > > index c55ac11faa73..553327dee991 100644 > > > > > > > --- a/drivers/pci/pcie/aspm.c > > > > > > > +++ b/drivers/pci/pcie/aspm.c > > > > > > > @@ -1402,6 +1402,8 @@ EXPORT_SYMBOL(pci_disable_link_state); > > > > > > > static int __pci_enable_link_state(struct pci_dev *pdev, int state, > > > > > > > bool > > > > > > > locked) > > > > > > > { > > > > > > > struct pcie_link_state *link = pcie_aspm_get_link(pdev); > > > > > > > + struct pci_dev *child = link->downstream, *parent = link- > > > > > > > >pdev; > > > > > > > + u32 parent_l1ss_cap, child_l1ss_cap; > > > > > > > > > > > > > > if (!link) > > > > > > > return -EINVAL; > > > > > > > @@ -1433,6 +1435,16 @@ static int __pci_enable_link_state(struct > > > > > > > pci_dev > > > > > > > *pdev, int state, bool locked) > > > > > > > link->aspm_default |= ASPM_STATE_L1_1_PCIPM | > > > > > > > ASPM_STATE_L1; > > > > > > > if (state & PCIE_LINK_STATE_L1_2_PCIPM) > > > > > > > link->aspm_default |= ASPM_STATE_L1_2_PCIPM | > > > > > > > ASPM_STATE_L1; > > > > > > > + /* > > > > > > > + * Ensure L1.2 parameters: Common_Mode_Restore_Times, > > > > > > > T_POWER_ON > > > > > > > and > > > > > > > + * LTR_L1.2_THRESHOLD are programmed properly before enable > > > > > > > bits > > > > > > > for > > > > > > > + * L1.2, per PCIe r6.0, sec 5.5.4. > > > > > > > + */ > > > > > > > + if (state & link->aspm_capable & ASPM_STATE_L1_2_MASK) { > > > > > > > > > > > > This is still mixing PCIE_LINK_STATE flags with ASPM_STATE flags. > > FWIW, Ilpo has removed the ASPM_STATE flags, so eventually this would > have to be updated to apply on the current pci/aspm branch. We're at > rc6 already, so likely this will end up being v6.11 material so you'll > be able to rebase on v6.10-rc1 when it comes out. > > > > > > Thanks for your review, but I notice some description in PCIe spec, > > > > > 5.5.4 L1 PM Substates Configuration: > > > > > "Prior to setting either or both of the enable bits for L1.2, the > > > > > values for TPOWER_ON, Common_Mode_Restore_Time, and, if the ASPM L1.2 > > > > > Enable bit is to be Set, the LTR_L1.2_THRESHOLD (both Value and Scale > > > > > fields) must be programmed." => I think this includes both "ASPM L1.2 > > > > > Enable" and "PCI-PM L1.2 Enable" bits. > > > > > > > > That's fine. While the spec clearly calls out the ASPM L1.2 Enable bit here, > > > > I > > > > see no harm in including PCI-PM L1.2 in that check. This is what the code > > > > already does in aspm_l1ss_init(). > > > > > > > > The issue is the mixed used of two different types of flags that don't have > > > > the > > > > same meaning. 'state' contains PCIE_LINK_STATE flags which are part of the > > > > caller API to the pci_<enabled/disable>_link_state() functions. The > > > > ASPM_STATE > > > > flags are used internally to aspm.c to track all states and their meaningful > > > > combinations such as ASPM_STATE_L1_2_MASK which includes ASPM L1.2 and PCI- > > > > PM > > > > L1.2. You should not do bit operations between them. > > > > > > > > Also, you should not require that the timings be calculated only if L1_2 is > > > > enabled. You should calculate the timings as long as it's capable. This is > > > > also > > > > what aspm_l1ss_init() does. > > > > > > > > The confusion might be over the fact that you are having > > > > __pci_enable_link_state() call aspm_calc_l12_info(). This should have been > > > > handled during initialization of the link in aspm_l1ss_init() and I'm not > > > > sure > > > > why it didn't. Maybe it's because, for VMD, ASPM default state would have > > > > started out all disabled and this somehow led to aspm_l1ss_init() not > > > > getting > > > > called. But looking through the code I don't see it. It would be great if > > > > you > > > > can confirm why they weren't calculated before. > > > > > > I debug it again. If I delete the pci_reset_bus() in vmd controller like: > > > > > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c > > > index 87b7856f375a..39bfda4350bf 100644 > > > --- a/drivers/pci/controller/vmd.c > > > +++ b/drivers/pci/controller/vmd.c > > > @@ -930,25 +930,6 @@ static int vmd_enable_domain(struct vmd_dev *vmd, > > > unsigned long features) > > > pci_scan_child_bus(vmd->bus); > > > vmd_domain_reset(vmd); > > > > > > - /* When Intel VMD is enabled, the OS does not discover the Root Ports > > > - * owned by Intel VMD within the MMCFG space. pci_reset_bus() applies > > > - * a reset to the parent of the PCI device supplied as argument. This > > > - * is why we pass a child device, so the reset can be triggered at > > > - * the Intel bridge level and propagated to all the children in the > > > - * hierarchy. > > > - */ > > > - list_for_each_entry(child, &vmd->bus->children, node) { > > > - if (!list_empty(&child->devices)) { > > > - dev = list_first_entry(&child->devices, > > > - struct pci_dev, bus_list); > > > - ret = pci_reset_bus(dev); > > > > Hi Nirmal. It's not clear to me from the comment why there's a need to do a bus > > reset. It looks like it is causing misconfiguration of the ASPM L1.2 timings > > which would have been done above in pci_scan_child_bus(). Jian-Hong discovered > > that without the above reset code, the timings are correct. > > I don't understand that comment either. If we don't enumerate the > Root Ports below VMD, it sounds like something is wrong, and reset > doesn't seem like the right fix. > > The reset was added by 0a584655ef89 ("PCI: vmd: Fix secondary bus > reset for Intel bridges") for guest reboots. Maybe Francisco can shed > more light on it. > > > This patch recalculates the timings during the call to pci_enable_link_state() > > which is called during pci_bus_walk() below. Originally I thought the > > recalculation might have been needed by all callers to pci_enabled_link_state() > > since it changes the default BIOS configuration. But it looks like the reset is > > the cause and only the VMD driver would need the recalculation as a result. I > > don't see qcom doing a reset. > > > > Jian-Hong, given this (and assuming the reset is needed) I would not call > > aspm_calc_l12_info() from pci_enable_link_state() but instead try redoing the > > whole ASPM initialization right after the resets are done, maybe by calling > > pci_scan_child_bus() again. What do you think Bjorn? > > I would expect pci_reset_bus() to save and restore config space, but > if we don't enumerate all the devices correctly, I suppose we wouldn't > do that for devices we don't know about. According to the discussion above, the misconfiguration of ASPM L1.2 timings comes from pci_reset_bus() which is added by another issue related to VT-d pass-through: * 6aab5622296b ("PCI: vmd: Clean up domain before enumeration") * 0a584655ef89 ("PCI: vmd: Fix secondary bus reset for Intel bridges") However, I do not quite understand the scenario. Should I drop patches "PCI/ASPM: Introduce aspm_get_l1ss_cap()" and "PCI/ASPM: Fix L1.2 parameters when enable link state", then only send "PCI: vmd: Set PCI devices to D0 before enable PCI PM's L1 substates" and "PCI/ASPM: Add notes about enabling PCI-PM L1SS to pci_enable_link_state(_locked)" first? Jian-Hong Pan > > > - if (ret) > > > - pci_warn(dev, "can't reset device: %d\n", > > > ret); > > > - > > > - break; > > > - } > > > - } > > > - > > > pci_assign_unassigned_bus_resources(vmd->bus); > > > > > > pci_walk_bus(vmd->bus, vmd_pm_enable_quirk, &features); > > > > > > Although PCI-PM_L1.2 is disabled, both PCI bridge and the NVMe's > > > LTR1.2_Threshold are configured as 101376ns: > > > > > > 10000:e0:06.0 PCI bridge [0604]: Intel Corporation 11th Gen Core > > > Processor PCIe Controller [8086:9a09] (rev 01) (prog-if 00 [Normal > > > decode]) > > > ... > > > Capabilities: [200 v1] L1 PM Substates > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+ > > > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > T_CommonMode=45us LTR1.2_Threshold=101376ns > > > L1SubCtl2: T_PwrOn=50us > > > > > > 10000:e1:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD > > > Blue SN550 NVMe SSD [15b7:5009] (rev 01) (prog-if 02 [NVM Express]) > > > ... > > > Capabilities: [900 v1] L1 PM Substates > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+ > > > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > T_CommonMode=0us LTR1.2_Threshold=101376ns > > > L1SubCtl2: T_PwrOn=50us > > > > > > Then, I apply the patch "PCI: vmd: Set PCI devices to D0 before enable > > > PCI PM's L1 substates". Both PCI bridge and the NVMe's PCI-PM_L1.2 is > > > enabled and LTR1.2_Threshold is configured as 101376ns. > > > > > > 10000:e0:06.0 PCI bridge [0604]: Intel Corporation 11th Gen Core > > > Processor PCIe Controller [8086:9a09] (rev 01) (prog-if 00 [Normal > > > decode]) > > > ... > > > Capabilities: [200 v1] L1 PM Substates > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+ > > > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us > > > L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > T_CommonMode=45us LTR1.2_Threshold=101376ns > > > L1SubCtl2: T_PwrOn=50us > > > > > > 10000:e1:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD > > > Blue SN550 NVMe SSD [15b7:5009] (rev 01) (prog-if 02 [NVM Express]) > > > ... > > > Capabilities: [900 v1] L1 PM Substates > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+ > > > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us > > > L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > T_CommonMode=0us LTR1.2_Threshold=101376ns > > > L1SubCtl2: T_PwrOn=50us > > > > > > I do not know VMD very much. However, from the test result, it looks > > > like LTR1.2_Threshold has been configured properly originally. But, > > > LTR1.2_Threshold is misconfigured by 'pci_reset_bus()'. > > > ... > > > > > > 'state' should not even matter. > > > > > > The timings should always be calculated and programmed as long > > > > > > as L1_2 is capable. That way the timings are ready even if L1_2 isn't > > > > > > being > > > > > > enabled now (in case the user enables it later). > > > > > > > > > > > > > + parent_l1ss_cap = aspm_get_l1ss_cap(parent); > > > > > > > + child_l1ss_cap = aspm_get_l1ss_cap(child); > > > > > > > + aspm_calc_l12_info(link, parent_l1ss_cap, > > > > > > > child_l1ss_cap); > > > > > > > + } > > > > > > > pcie_config_aspm_link(link, policy_to_aspm_state(link)); > > > > > > > > > > > > > > link->clkpm_default = (state & PCIE_LINK_STATE_CLKPM) ? 1 : 0; > > > > > > > > > > > >