Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx> wrote on Mon, 2 Sep 2024, 11:44 PM:
>
> On Mon, 12 Aug 2024, Jian-Hong Pan wrote:
>
> > Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx> wrote on Thu, 8 Aug 2024, 5:49 PM:
> > > On Wed, 7 Aug 2024, David E. Box wrote:
> > > > On Wed, 2024-08-07 at 14:18 +0300, Ilpo Järvinen wrote:
> > > > > On Wed, 7 Aug 2024, Jian-Hong Pan wrote:
> > > > >
> > > > > > David E. Box <david.e.box@xxxxxxxxxxxxxxx> wrote on Tue, 6 Aug 2024, 4:26 AM:
> > > > > > >
> > > > > > > Hi Jian-Hong,
> > > > > > >
> > > > > > > On Fri, 2024-08-02 at 16:24 +0800, Jian-Hong Pan wrote:
> > > > > > > > Jian-Hong Pan <jhp@xxxxxxxxxxxxx> wrote on Fri, 19 Jul 2024, 4:04 PM:
> > > > > > > > >
> > > > > > > > > Currently, when enable link's L1.2 features with __pci_enable_link_state(),
> > > > > > > > > it configs the link directly without ensuring related L1.2 parameters, such
> > > > > > > > > as T_POWER_ON, Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD have been
> > > > > > > > > programmed.
> > > > > > > > >
> > > > > > > > > This leads the link's L1.2 between PCIe Root Port and child device gets
> > > > > > > > > wrong configs when a caller tries to enabled it.
> > > > > > > > >
> > > > > > > > > Here is a failed example on ASUS B1400CEAE with enabled VMD:
> > > > > > > > >
> > > > > > > > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor PCIe Controller (rev 01) (prog-if 00 [Normal decode])
> > > > > > > > > 	...
> > > > > > > > > 	Capabilities: [200 v1] L1 PM Substates
> > > > > > > > > 		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> > > > > > > > > 			  PortCommonModeRestoreTime=45us PortTPowerOnTime=50us
> > > > > > > > > 		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > > > > > > > 			   T_CommonMode=45us LTR1.2_Threshold=101376ns
> > > > > > > > > 		L1SubCtl2: T_PwrOn=50us
> > > > > > > > >
> > > > > > > > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue SN550 NVMe SSD (rev 01) (prog-if 02 [NVM Express])
> > > > > > > > > 	...
> > > > > > > > > 	Capabilities: [900 v1] L1 PM Substates
> > > > > > > > > 		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
> > > > > > > > > 			  PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
> > > > > > > > > 		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > > > > > > > > 			   T_CommonMode=0us LTR1.2_Threshold=0ns
> > > > > > > > > 		L1SubCtl2: T_PwrOn=10us
> > > > > > > > >
> > > > > > > > > According to "PCIe r6.0, sec 5.5.4", before enabling ASPM L1.2 on the PCIe
> > > > > > > > > Root Port and the child NVMe, they should be programmed with the same
> > > > > > > > > LTR1.2_Threshold value. However, they have different values in this case.
> > > > > > > > >
> > > > > > > > > Invoke aspm_calc_l12_info() to program the L1.2 parameters properly before
> > > > > > > > > enable L1.2 bits of L1 PM Substates Control Register in
> > > > > > > > > __pci_enable_link_state().
> > > > > > > > >
> > > > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394
> > > > > > > > > Signed-off-by: Jian-Hong Pan <jhp@xxxxxxxxxxxxx>
> > > > > > > > > ---
> > > > > > > > > v2:
> > > > > > > > > - Prepare the PCIe LTR parameters before enable L1 Substates
> > > > > > > > >
> > > > > > > > > v3:
> > > > > > > > > - Only enable supported features for the L1 Substates part
> > > > > > > > >
> > > > > > > > > v4:
> > > > > > > > > - Focus on fixing L1.2 parameters, instead of re-initializing whole L1SS
> > > > > > > > >
> > > > > > > > > v5:
> > > > > > > > > - Fix typo and commit message
> > > > > > > > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: Introduce
> > > > > > > > >   aspm_get_l1ss_cap()"
> > > > > > > > >
> > > > > > > > > v6:
> > > > > > > > > - Skipped
> > > > > > > > >
> > > > > > > > > v7:
> > > > > > > > > - Pick back and rebase on the new version kernel
> > > > > > > > > - Drop the link state flag check. And, always config link state's timing
> > > > > > > > >   parameters
> > > > > > > > >
> > > > > > > > > v8:
> > > > > > > > > - Because pcie_aspm_get_link() might return the link as NULL, move
> > > > > > > > >   getting the link's parent and child devices after check the link is
> > > > > > > > >   not NULL. This avoids NULL memory access.
> > > > > > > > >
> > > > > > > > >  drivers/pci/pcie/aspm.c | 15 +++++++++++++++
> > > > > > > > >  1 file changed, 15 insertions(+)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > > > > > > > > index 5db1044c9895..55ff1d26fcea 100644
> > > > > > > > > --- a/drivers/pci/pcie/aspm.c
> > > > > > > > > +++ b/drivers/pci/pcie/aspm.c
> > > > > > > > > @@ -1411,9 +1411,15 @@ EXPORT_SYMBOL(pci_disable_link_state);
> > > > > > > > >  static int __pci_enable_link_state(struct pci_dev *pdev, int state, bool locked)
> > > > > > > > >  {
> > > > > > > > >  	struct pcie_link_state *link = pcie_aspm_get_link(pdev);
> > > > > > > > > +	u32 parent_l1ss_cap, child_l1ss_cap;
> > > > > > > > > +	struct pci_dev *parent, *child;
> > > > > > > > >
> > > > > > > > >  	if (!link)
> > > > > > > > >  		return -EINVAL;
> > > > > > > > > +
> > > > > > > > > +	parent = link->pdev;
> > > > > > > > > +	child = link->downstream;
> > > > > > > > > +
> > > > > > > > >  	/*
> > > > > > > > >  	 * A driver requested that ASPM be enabled on this device, but
> > > > > > > > >  	 * if we don't have permission to manage ASPM (e.g., on ACPI
> > > > > > > > > @@ -1428,6 +1434,15 @@ static int __pci_enable_link_state(struct pci_dev *pdev, int state, bool locked)
> > > > > > > > >  	if (!locked)
> > > > > > > > >  		down_read(&pci_bus_sem);
> > > > > > > > >  	mutex_lock(&aspm_lock);
> > > > > > > > > +	/*
> > > > > > > > > +	 * Ensure L1.2 parameters: Common_Mode_Restore_Times, T_POWER_ON and
> > > > > > > > > +	 * LTR_L1.2_THRESHOLD are programmed properly before enable bits for
> > > > > > > > > +	 * L1.2, per PCIe r6.0, sec 5.5.4.
> > > > > > > > > +	 */
> > > > > > > > > +	parent_l1ss_cap = aspm_get_l1ss_cap(parent);
> > > > > > > > > +	child_l1ss_cap = aspm_get_l1ss_cap(child);
> > > > > > > > > +	aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap);
> > > > > > >
> > > > > > > I still don't think this is the place to recalculate the L1.2 parameters
> > > > > > > especially when know the calculation was done but was cleared by
> > > > > > > pci_bus_reset(). Can't we just do a pci_save/restore_state() before/after
> > > > > > > pci_bus_reset() in vmd.c?
> > > > > >
> > > > > > I have not thought pci_save/restore_state() around pci_bus_reset()
> > > > > > before. It is an interesting direction.
> > > > > >
> > > > > > So, I prepare modification below for test. Include "[PATCH v8 1/4]
> > > > > > PCI: vmd: Set PCI devices to D0 before enable PCI PM's L1 substates",
> > > > > > too. Then, both the PCIe bridge and the PCIe device have the same
> > > > > > LTR_L1.2_THRESHOLD 101376ns as expected.
> > > > > >
> > > > > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> > > > > > index bbf4a47e7b31..6b8dd4f30127 100644
> > > > > > --- a/drivers/pci/controller/vmd.c
> > > > > > +++ b/drivers/pci/controller/vmd.c
> > > > > > @@ -727,6 +727,18 @@ static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
> > > > > >  	vmd_bridge->native_dpc = root_bridge->native_dpc;
> > > > > >  }
> > > > > >
> > > > > > +static int vmd_pci_save_state(struct pci_dev *pdev, void *userdata)
> > > > > > +{
> > > > > > +	pci_save_state(pdev);
> > > > > > +	return 0;
> > > > > > +}
> > > > > > +
> > > > > > +static int vmd_pci_restore_state(struct pci_dev *pdev, void *userdata)
> > > > > > +{
> > > > > > +	pci_restore_state(pdev);
> > > > > > +	return 0;
> > > > > > +}
> > > > > > +
> > > > > >  /*
> > > > > >   * Enable ASPM and LTR settings on devices that aren't configured by BIOS.
> > > > > >   */
> > > > > > @@ -927,6 +939,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
> > > > > >  	pci_scan_child_bus(vmd->bus);
> > > > > >  	vmd_domain_reset(vmd);
> > > > > >
> > > > > > +	pci_walk_bus(vmd->bus, vmd_pci_save_state, NULL);
> > > > > >  	/* When Intel VMD is enabled, the OS does not discover the Root Ports
> > > > > >  	 * owned by Intel VMD within the MMCFG space. pci_reset_bus() applies
> > > > > >  	 * a reset to the parent of the PCI device supplied as argument. This
> > > > > > @@ -945,6 +958,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
> > > > > >  			break;
> > > > > >  		}
> > > > > >  	}
> > > > > > +	pci_walk_bus(vmd->bus, vmd_pci_restore_state, NULL);
> > > > > >
> > > > > Why not call pci_reset_bus() (or __pci_reset_bus()) then in
> > > > > vmd_enable_domain() which preserves state unlike pci_reset_bus()?
> > > > >
> > > > > (Don't tell me naming of these functions is a horrible mess. :-/)
> > > >
> > > > Hmm. So this *is* calling pci_reset_bus().
> > >
> > > Yeah, I managed to get confused by the names myself, I somehow
> > > ended up thinking it calls pci_bus_reset() which is not correct...
> > >
> > > > L1.2 configuration has specific
> > > > ordering requirements for changes to parent & child devices. Could be why it's
> > > > not getting restored properly.
> > >
> > > Indeed, it has to be something else since the patch above doesn't even
> > > restore anything because dev->state_saved should get set to false by the
> > > first pci_restore_state() called from
> > > __pci_reset_bus() -> pci_bus_restore_locked() -> pci_dev_restore(), I
> > > think!?
> >
> > Inspired by Ilpo's comment.
> > I add some debug messages based on
> > linux-next's tag 'next-20240809' to understand the code path of
> > pci_reset_bus():
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index ffaaca0978cb..3ee71374f1de 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -5133,8 +5133,10 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
> >  	 * races with ->remove() by the device lock, which must be held by
> >  	 * the caller.
> >  	 */
> > -	if (err_handler && err_handler->reset_prepare)
> > +	if (err_handler && err_handler->reset_prepare) {
> > +		pci_info(dev, "%s: %pF\n", __func__, err_handler->reset_prepare);
> >  		err_handler->reset_prepare(dev);
> > +	}
> >
> >  	/*
> >  	 * Wake-up device prior to save. PM registers default to D0 after
> > @@ -5144,6 +5146,7 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
> >  	pci_set_power_state(dev, PCI_D0);
> >
> >  	pci_save_state(dev);
> > +	pci_info(dev, "%s: PCI state_saved is %s\n", __func__, dev->state_saved ? "true" : "false");
> >  	/*
> >  	 * Disable the device by clearing the Command register, except for
> >  	 * INTx-disable which is set. This not only disables MMIO and I/O port
> > @@ -5655,6 +5658,10 @@ static void pci_bus_save_and_disable_locked(struct pci_bus *bus)
> >  	struct pci_dev *dev;
> >
> >  	list_for_each_entry(dev, &bus->devices, bus_list) {
> > +		pci_info(dev, "%s: PCI state_saved is %s, and %s subordinate\n",
> > +			 __func__,
> > +			 dev->state_saved ? "true" : "false",
> > +			 dev->subordinate ? "has" : "does not have");
> >  		pci_dev_save_and_disable(dev);
> >  		if (dev->subordinate)
> >  			pci_bus_save_and_disable_locked(dev->subordinate);
> > @@ -5671,6 +5678,10 @@ static void pci_bus_restore_locked(struct pci_bus *bus)
> >  	struct pci_dev *dev;
> >
> >  	list_for_each_entry(dev, &bus->devices, bus_list) {
> > +		pci_info(dev, "%s: PCI state_saved is %s, and %s subordinate\n",
> > +			 __func__,
> > +			 dev->state_saved ? "true" : "false",
> > +			 dev->subordinate ?
> > +			 "has" : "does not have");
> >  		pci_dev_restore(dev);
> >  		if (dev->subordinate)
> >  			pci_bus_restore_locked(dev->subordinate);
> > @@ -5786,8 +5797,10 @@ static int pci_bus_reset(struct pci_bus *bus, bool probe)
> >  	if (!bus->self || !pci_bus_resettable(bus))
> >  		return -ENOTTY;
> >
> > -	if (probe)
> > +	if (probe) {
> > +		pci_info(bus->self, "%s: probe is true. So return 0 directly", __func__);
> >  		return 0;
> > +	}
> >
> >  	pci_bus_lock(bus);
> >
> > @@ -5858,10 +5871,12 @@ static int __pci_reset_bus(struct pci_bus *bus)
> >  	int rc;
> >
> >  	rc = pci_bus_reset(bus, PCI_RESET_PROBE);
> > +	pci_info(bus->self, "%s: pci_bus_reset() returns %d\n", __func__, rc);
> >  	if (rc)
> >  		return rc;
> >
> >  	if (pci_bus_trylock(bus)) {
> > +		pci_info(bus->self, "%s: pci_bus_trylock() returns true\n", __func__);
> >  		pci_bus_save_and_disable_locked(bus);
> >  		might_sleep();
> >  		rc = pci_bridge_secondary_bus_reset(bus->self);
> > @@ -5881,6 +5896,7 @@ static int __pci_reset_bus(struct pci_bus *bus)
> >   */
> >  int pci_reset_bus(struct pci_dev *pdev)
> >  {
> > +	pci_info(pdev, "%s: %s", __func__, !pci_probe_reset_slot(pdev->slot) ? "true" : "false");
> >  	return (!pci_probe_reset_slot(pdev->slot)) ?
> >  		__pci_reset_slot(pdev->slot) : __pci_reset_bus(pdev->bus);
> >  }
> >
> > And, have the information of VMD PCIe devices with the built kernel:
> >
> > 10000:e0:06.0 PCI bridge [0604]: Intel Corporation 11th Gen Core Processor PCIe Controller [8086:9a09] (rev 01) (prog-if 00 [Normal decode])
> > 	...
> > 	Capabilities: [200 v1] L1 PM Substates
> > 		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> > 			  PortCommonModeRestoreTime=45us PortTPowerOnTime=50us
> > 		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > 			   T_CommonMode=0us LTR1.2_Threshold=0ns
> > 		L1SubCtl2: T_PwrOn=0us
> >
> > 10000:e1:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD Blue SN550 NVMe SSD [15b7:5009] (rev 01) (prog-if 02 [NVM Express])
> > 	...
> > 	Capabilities: [900 v1] L1 PM Substates
> > 		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
> > 			  PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
> > 		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> > 			   T_CommonMode=0us LTR1.2_Threshold=101376ns
> > 		L1SubCtl2: T_PwrOn=50us
> >
> > We can see the NVMe has expected LTR1.2_Threshold=101376ns, but the
> > PCIe bridge has LTR1.2_Threshold=0ns.
>
> This is now the other way around as in the original posting that had
> 0ns for 10000:e1:00.0 ??
>
> Is this behavior even consistent or did you e.g. mess up some copy
> pasting somewhere?

The original posting was produced with an older kernel, 6.5. It shows:

10000:e0:06.0 PCI bridge [0604]: Intel Corporation 11th Gen Core Processor PCIe Controller [8086:9a09] (rev 01) (prog-if 00 [Normal decode])
	...
	Capabilities: [200 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=45us PortTPowerOnTime=50us
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
			   T_CommonMode=45us LTR1.2_Threshold=101376ns
		L1SubCtl2: T_PwrOn=50us
	...

10000:e1:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD Blue SN550 NVMe SSD [15b7:5009] (rev 01) (prog-if 02 [NVM Express])
	...
	Capabilities: [900 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
			  PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
			   T_CommonMode=0us LTR1.2_Threshold=0ns
		L1SubCtl2: T_PwrOn=10us
	...

Full information: https://gist.github.com/starnight/e19487a44efefff477f9ac9ed641c183

But newer kernels, for example linux-next next-20240809 and next-20240820,
which I have tried, show:

10000:e0:06.0 PCI bridge [0604]: Intel Corporation 11th Gen Core Processor PCIe Controller [8086:9a09] (rev 01) (prog-if 00 [Normal decode])
	...
	Capabilities: [200 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=45us PortTPowerOnTime=50us
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
			   T_CommonMode=0us LTR1.2_Threshold=0ns
		L1SubCtl2: T_PwrOn=0us
	...

10000:e1:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD Blue SN550 NVMe SSD [15b7:5009] (rev 01) (prog-if 02 [NVM Express])
	...
	Capabilities: [900 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
			  PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
			   T_CommonMode=0us LTR1.2_Threshold=101376ns
		L1SubCtl2: T_PwrOn=50us
	...

Full information: https://gist.github.com/starnight/081ea4adbce40a27faf234e5e135b49a

So, according to the information above, different kernel versions show
different L1 substate settings: with kernel 6.5 the NVMe device is the one
reporting LTR1.2_Threshold=0ns, while with linux-next it is the PCIe bridge.

Jian-Hong Pan
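
For anyone comparing the dumps above by hand, here is a minimal userspace
sketch of how the LTR1.2_Threshold that lspci prints relates to the raw
L1 PM Substates Control 1 register, and of the "both ends must match" check
this thread is about. The two masks follow the PCI_L1SS_CTL1_LTR_L12_TH_*
definitions in include/uapi/linux/pci_regs.h; the helper names
l12_threshold_ns() and l12_thresholds_match() are hypothetical and are not
part of the kernel or of any patch in this series.

```c
#include <stdint.h>

/* LTR_L1.2_THRESHOLD fields of L1 PM Substates Control 1 (see pci_regs.h) */
#define L1SS_CTL1_LTR_L12_TH_VALUE	0x03ff0000u	/* bits 25:16 */
#define L1SS_CTL1_LTR_L12_TH_SCALE	0xe0000000u	/* bits 31:29 */

/*
 * Hypothetical helper: decode LTR_L1.2_THRESHOLD into nanoseconds.
 * The scale field selects a multiplier of 32^scale ns (1, 32, 1024, ...);
 * scale values above 5 are reserved and are reported as 0 here.
 */
static uint64_t l12_threshold_ns(uint32_t ctl1)
{
	uint32_t value = (ctl1 & L1SS_CTL1_LTR_L12_TH_VALUE) >> 16;
	uint32_t scale = (ctl1 & L1SS_CTL1_LTR_L12_TH_SCALE) >> 29;

	if (scale > 5)
		return 0;

	return (uint64_t)value << (5 * scale);	/* value * 32^scale */
}

/*
 * Hypothetical helper: non-zero when the parent and child ends of a link
 * carry the same LTR_L1.2_THRESHOLD value and scale.
 */
static int l12_thresholds_match(uint32_t parent_ctl1, uint32_t child_ctl1)
{
	const uint32_t mask = L1SS_CTL1_LTR_L12_TH_VALUE |
			      L1SS_CTL1_LTR_L12_TH_SCALE;

	return (parent_ctl1 & mask) == (child_ctl1 & mask);
}
```

With scale 2 and value 99 the first helper gives 99 * 1024 = 101376 ns,
which is the LTR1.2_Threshold shown in the dumps; comparing that register
value against one with both fields zero makes the second helper report a
mismatch, which is the situation seen on both kernels.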