On Thu, Jan 28, 2021 at 1:41 PM Ian Kumlien <ian.kumlien@xxxxxxxxx> wrote:
>
> Sorry about the late reply, been trying to figure out what goes wrong
> since this email...
>
> And yes, I think you're right - the fact that it fixed my system was
> basically too good to be true =)

So, finally had some time to look at this again...

I played some with:

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index ac0557a305af..fdf252eee206 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -392,13 +392,13 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)

 	while (link) {
 		/* Check upstream direction L0s latency */
-		if ((link->aspm_capable & ASPM_STATE_L0S_UP) &&
-		    (link->latency_up.l0s > acceptable->l0s))
+		if ((link->aspm_capable & ASPM_STATE_L0S_UP) /* &&
+		    (link->latency_up.l0s > acceptable->l0s)*/)
 			link->aspm_capable &= ~ASPM_STATE_L0S_UP;

 		/* Check downstream direction L0s latency */
-		if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
-		    (link->latency_dw.l0s > acceptable->l0s))
+		if ((link->aspm_capable & ASPM_STATE_L0S_DW) /* &&
+		    (link->latency_dw.l0s > acceptable->l0s)*/)
 			link->aspm_capable &= ~ASPM_STATE_L0S_DW;
 		/*
 		 * Check L1 latency.
---

Which does perform better, but doesn't solve all the issues...

Home machine:
Latency:    3.364 ms
Download: 640.170 Mbit/s
Upload:   918.865 Mbit/s

My test server:
Latency:    4.549 ms
Download: 945.137 Mbit/s
Upload:   957.848 Mbit/s

But iperf3 still gets bogged down...

[  5]   0.00-1.00   sec  4.66 MBytes  39.0 Mbits/sec    0   82.0 KBytes
[  5]   1.00-2.00   sec  4.60 MBytes  38.6 Mbits/sec    0   79.2 KBytes
[  5]   2.00-3.00   sec  4.47 MBytes  37.5 Mbits/sec    0   56.6 KBytes

And with L1 ASPM disabled as usual:

[  5]   0.00-1.00   sec   112 MBytes   938 Mbits/sec  439    911 KBytes
[  5]   1.00-2.00   sec   109 MBytes   912 Mbits/sec  492    888 KBytes
[  5]   2.00-3.00   sec   110 MBytes   923 Mbits/sec  370   1.07 MBytes

And just for reference, bredbandskollen again with L1 ASPM disabled:
Latency:    2.281 ms
Download: 742.136 Mbit/s
Upload:   938.053 Mbit/s

Anyway, we started to look at the PCIe bridges etc, but I think it's the
network card that is at fault - either the advertised latencies (which
should probably be lower) or some bug in the card itself, since other
cards and peripherals connected to the system work just fine...

So, L0s actually seems to have somewhat of an impact - which I found
surprising since both machines are ~6 hops away - however the latency
differs (measured with TCP).

Can we collect L1 ASPM latency numbers for:
Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
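
Something like this, run on an affected machine, should show both what the
card says it can tolerate (DevCap: Latency L0s/L1) and what the link
advertises (LnkCap: Exit Latency) - 8086:1539 is what I believe the I211
uses, otherwise just use -s with the card's address:

sudo lspci -vvv -d 8086:1539 | grep -E 'DevCap:|LnkCap:|LnkCtl:'

The same LnkCap lines from the bridge/switch ports above the card would be
interesting too, since those are the links the L1 check walks through as
well.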

> On Tue, Jan 12, 2021 at 9:42 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > My guess is the real problem is the Switch is advertising incorrect
> > exit latencies. If the Switch advertised "<64us" exit latency for its
> > Upstream Port, we'd compute "64us exit latency + 1us Switch delay =
> > 65us", which is more than either 03:00.0 or 04:00.0 can tolerate, so
> > we would disable L1 on that upstream Link.
> >
> > Working around this would require some sort of quirk to override the
> > values read from the Switch, which is a little messy. Here's what I'm
> > thinking (not asking you to do this; just trying to think of an
> > approach):
>
> The question is if it's the device or if it's the bridge...
>
> Ie, if the device can't quite handle it or if the bridge/switch/port
> advertises the wrong latency
> I have a friend with a similar motherboard and he has the same latency
> values - but his kernel doesn't apply ASPM
>
> I also want to check L0s since it seems to be enabled...
>
> > - Configure common clock earlier, in pci_configure_device(), because
> >   this changes the "read-only" L1 exit latencies in Link
> >   Capabilities.
> >
> > - Cache Link Capabilities in the pci_dev.
> >
> > - Add a quirk to override the cached values for the Switch based on
> >   Vendor/Device ID and probably DMI motherboard/BIOS info.
>
> I can't say and I actually think it depends on the actual culprit
> which we haven't quite identified yet...
>
> > Bjorn
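
And just to make sure we're talking about the same thing, below is roughly
how I pictured the quirk Bjorn describes - a completely untested sketch,
the vendor/device ID and the DMI board name are placeholders, and it
assumes Link Capabilities really do get cached in a new pci_dev field
first (they aren't today) and that the ASPM code then reads the latencies
from that cache:

/*
 * Untested sketch only: assumes a new pci_dev::lnkcap field exists (per
 * the "cache Link Capabilities" idea above), that this fixup runs after
 * the cache has been filled in, and that pcie_aspm_check_latency() uses
 * the cached value.  The PCI ID and DMI board name are placeholders.
 */
#include <linux/dmi.h>
#include <linux/pci.h>

static void quirk_switch_l1_exit_latency(struct pci_dev *dev)
{
	/* Only touch boards known to misreport the latency */
	if (!dmi_match(DMI_BOARD_NAME, "KNOWN-BAD-BOARD"))
		return;

	/*
	 * Report the worst-case L1 exit latency (>64us, encoding 111b)
	 * so L1 stays off on links whose endpoints can't tolerate it.
	 */
	dev->lnkcap |= PCI_EXP_LNKCAP_L1EL;
	pci_info(dev, "overriding advertised L1 exit latency\n");
}
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x57ad, /* placeholder ID */
			quirk_switch_l1_exit_latency);

But whether the override should go on the switch ports or on the I211 end
is exactly the part we haven't figured out yet, so take the match above
with a grain of salt.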