On Sat, Jul 25, 2020 at 09:47:05PM +0200, Ian Kumlien wrote: > Hi, > > A while ago I realised that I was having all kinds off issues with my > connection, ~933 mbit had become ~40 mbit > > This only applied on links to the internet (via a linux fw running > NAT) however while debugging with the help of Alexander Duyck > we realised that ASPM could be the culprit (at least disabling ASPM on > the nic it self made things work just fine)... > > So while trying to understand PCIe and such things, I found this: > > The calculations of the max delay looked at "that node" + start latency * "hops" > > But one hop might have a larger latency and break the acceptable delay... > > So after a lot playing around with the code, i ended up with this, and > it seems to fix my problem and does > set two pcie bridges to ASPM Disabled that didn't happen before. > > I do however have questions.... Shouldn't the change be applied to > the endpoint? Or should it be applied recursively along the path to > the endpoint? I don't understand this very well, but I think we do need to consider the latencies along the entire path. PCIe r5.0, sec 5.4.1.3, contains this: Power management software, using the latency information reported by all components in the Hierarchy, can enable the appropriate level of ASPM by comparing exit latency for each given path from Root to Endpoint against the acceptable latency that each corresponding Endpoint can withstand. Also this: 5.4.1.3.1 Software Flow for Enabling or Disabling ASPM Following is an example software algorithm that highlights how to enable or disable ASPM in a component. - PCI Express components power up with an appropriate value in their Slot Clock Configuration bit. The method by which they initialize this bit is device-specific. - PCI Express system software scans the Slot Clock Configuration bit in the components on both ends of each Link to determine if both are using the same reference clock source or reference clocks from separate sources. If the Slot Clock Configuration bits in both devices are Set, they are both using the same reference clock source, otherwise they're not. - PCI Express software updates the Common Clock Configuration bits in the components on both ends of each Link to indicate if those devices share the same reference clock and triggers Link retraining by writing 1b to the Retrain Link bit in the Link Control register of the Upstream component. - Devices must reflect the appropriate L0s/L1 exit latency in their L0s /L1 Exit Latency fields, per the setting of the Common Clock Configuration bit. - PCI Express system software then reads and calculates the L0s/L1 exit latency for each Endpoint based on the latencies reported by each Port. Refer to Section 5.4.1.2.2 for an example. - For each component with one or more Endpoint Functions, PCI Express system software examines the Endpoint L0s/L1 Acceptable Latency, as reported by each Endpoint Function in its Link Capabilities register, and enables or disables L0s/L1 entry (via the ASPM Control field in the Link Control register) accordingly in some or all of the intervening device Ports on that hierarchy. > Also, the L0S checks are only done on the local links, is this > correct? ASPM configuration is done on both ends of a link. I'm not sure it makes sense to enable any state (L0s, L1, L1.1, L1.2) unless both ends of the link support it. In particular, sec 5.4.1.3 says: Software must not enable L0s in either direction on a given Link unless components on both sides of the Link each support L0s; otherwise, the result is undefined. But I think we do need to consider the entire path when enabling L0s; from sec 7.5.3.3: Endpoint L0s Acceptable Latency - This field indicates the acceptable total latency that an Endpoint can withstand due to the transition from L0s state to the L0 state. It is essentially an indirect measure of the Endpoint’s internal buffering. Power management software uses the reported L0s Acceptable Latency number to compare against the L0s exit latencies reported by all components comprising the data path from this Endpoint to the Root Complex Root Port to determine whether ASPM L0s entry can be used with no loss of performance. Does any of that help answer your question? Bjorn > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c > index b17e5ffd31b1..bd53fba7f382 100644 > --- a/drivers/pci/pcie/aspm.c > +++ b/drivers/pci/pcie/aspm.c > @@ -434,7 +434,7 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev, > > static void pcie_aspm_check_latency(struct pci_dev *endpoint) > { > - u32 latency, l1_switch_latency = 0; > + u32 latency, l1_max_latency = 0, l1_switch_latency = 0; > struct aspm_latency *acceptable; > struct pcie_link_state *link; > > @@ -470,8 +470,9 @@ static void pcie_aspm_check_latency(struct pci_dev > *endpoint) > * substate latencies (and hence do not do any check). > */ > latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1); > + l1_max_latency = max_t(u32, latency, l1_max_latency); > if ((link->aspm_capable & ASPM_STATE_L1) && > - (latency + l1_switch_latency > acceptable->l1)) > + (l1_max_latency + l1_switch_latency > acceptable->l1)) > link->aspm_capable &= ~ASPM_STATE_L1; > l1_switch_latency += 1000;