On Thu, 26 Sep 2024, Jian-Hong Pan wrote: > David E. Box <david.e.box@xxxxxxxxxxxxxxx> 於 2024年9月26日 週四 上午10:51寫道: > > > > Hi Jian-Hong, > > > > On Tue, 2024-09-24 at 15:29 +0800, Jian-Hong Pan wrote: > > > PCI devices' parameters on the VMD bus have been programmed properly > > > originally. But, cleared after pci_reset_bus() and have not been restored > > > correctly. This leads the link's L1.2 between PCIe Root Port and child > > > device gets wrong configs. > > > > > > Here is a failed example on ASUS B1400CEAE with enabled VMD. Both PCIe > > > bridge and NVMe device should have the same LTR1.2_Threshold value. > > > However, they are configured as different values in this case: > > > > > > 10000:e0:06.0 PCI bridge [0604]: Intel Corporation 11th Gen Core Processor > > > PCIe Controller [8086:9a09] (rev 01) (prog-if 00 [Normal decode]) > > > ... > > > Capabilities: [200 v1] L1 PM Substates > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+ > > > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > T_CommonMode=0us LTR1.2_Threshold=0ns > > > L1SubCtl2: T_PwrOn=0us > > > > > > 10000:e1:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD Blue > > > SN550 NVMe SSD [15b7:5009] (rev 01) (prog-if 02 [NVM Express]) > > > ... > > > Capabilities: [900 v1] L1 PM Substates > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+ > > > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > T_CommonMode=0us LTR1.2_Threshold=101376ns > > > L1SubCtl2: T_PwrOn=50us > > > > > > Here is VMD mapped PCI device tree: > > > > > > -+-[0000:00]-+-00.0 Intel Corporation Device 9a04 > > > | ... > > > \-[10000:e0]-+-06.0-[e1]----00.0 Sandisk Corp WD Blue SN550 NVMe SSD > > > \-17.0 Intel Corporation Tiger Lake-LP SATA Controller > > > > > > When pci_reset_bus() resets the bus [e1] of the NVMe, it only saves and > > > restores NVMe's state before and after reset. Because bus [e1] has only one > > > device: 10000:e1:00.0 NVMe. The PCIe bridge is missed. However, when it > > > restores the NVMe's state, it also restores the ASPM L1SS between the PCIe > > > bridge and the NVMe by pci_restore_aspm_l1ss_state(). The NVMe's L1SS is > > > restored correctly. But, the PCIe bridge's L1SS is restored with the wrong > > > value 0x0. Becuase, the PCIe bridge's state is not saved before reset. > > > That is why pci_restore_aspm_l1ss_state() gets the parent device (PCIe > > > bridge)'s saved state capability data and restores L1SS with value 0. > > > > > > So, save the PCIe bridge's state before pci_reset_bus(). Then, restore the > > > state after pci_reset_bus(). The restoring state also consumes the saving > > > state. > > > > > > Link: https://www.spinics.net/lists/linux-pci/msg159267.html > > > Signed-off-by: Jian-Hong Pan <jhp@xxxxxxxxxxxxx> > > > --- > > > v9: > > > - Drop the v8 fix about drivers/pci/pcie/aspm.c. Use this in VMD instead. > > > > > > drivers/pci/controller/vmd.c | 7 +++++++ > > > 1 file changed, 7 insertions(+) > > > > > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c > > > index 11870d1fc818..2820005165b4 100644 > > > --- a/drivers/pci/controller/vmd.c > > > +++ b/drivers/pci/controller/vmd.c > > > @@ -938,9 +938,16 @@ static int vmd_enable_domain(struct vmd_dev *vmd, > > > unsigned long features) > > > if (!list_empty(&child->devices)) { > > > dev = list_first_entry(&child->devices, > > > struct pci_dev, bus_list); > > > > Newline here > > > + /* To avoid pci_reset_bus restore wrong ASPM L1SS > > > + * configuration due to missed saving parent device's > > > + * states, save & restore the parent device's states > > > + * as well. > > > + */ > > > > No text on first line of comment. > > Oops! Thanks > > > /* > > * To avoid pci_reset_bus restore wrong ASPM L1SS > > * ... > > */ > > > > > + pci_save_state(dev->bus->self); > > > ret = pci_reset_bus(dev); > > > if (ret) > > > pci_warn(dev, "can't reset device: %d\n", > > > ret); > > > + pci_restore_state(dev->bus->self); > > > > I think Ilpo's point was that pci_save_aspm_l1ss_state() and > > pci_restore_aspm_l1ss_state() should be more symmetric. If > > pci_save_aspm_l1ss_state() is changed to also handle the state for the device > > and its parent in the same call, then no change is needed here. But that would > > only handle L1SS settings and not anything else that might need to be restored > > after the bus reset. So I'm okay with it either way. Why would something else need to be restored? The spec explicitly says the bus reset should not cause config changes on the parent other than to status bits. Based on what is seen so far, the problem here is not the reset itself breaking parent's config but ASPM code restoring values from state beyond what it had saved and thus it programs pseudogarbage into the L1SS settings. > Yes, that made me think the whole day before I sent out this patch. I > do not know what is the best reset bus procedure, especially how other > drivers reset the bus. > > If trace the code path: > pci_reset_bus() > __pci_reset_bus() > pci_bus_reset() > pci_bus_save_and_disable_locked() > > pci_bus_save_and_disable_locked()'s comment shows "Save and disable > devices from the top of the tree down while holding the @dev mutex > lock for the entire tree". I think that means the PCI(e) bridge should > save state first, then the following bus' devices. However, VMD resets > the child device (10000:e1:00.0 NVMe)'s bus first and only saves the > child device (10000:e1:00.0 NVMe)'s state. It does not visit the tree > from the top to down. So, it misses the PCIe bridge. Therefore, I > make it save & restore its parent (10000:e0:06.0 PCIe bridge)'s state > as compensation. The problem with your fix is it only takes care of part of the cases (i.e. VMD). But there are other callers of pci_reset_bus() which would also restore incorrect L1SS settings, no? I'd suggest making the L1SS code symmetric on this, even if it means saving the register value twice when walking the tree downwards (it seems harmless to write the same value twice). -- i.