Re: [PATCH 1/3] PCI/ASPM: Use the path max in L1 ASPM latency check

On Thu, Jan 28, 2021 at 1:41 PM Ian Kumlien <ian.kumlien@xxxxxxxxx> wrote:
>
> Sorry about the late reply, been trying to figure out what goes wrong
> since this email...
>
> And yes, I think you're right - the fact that it fixed my system was
> basically too good to be true =)

So, finally had some time to look at this again...

I played some with:
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index ac0557a305af..fdf252eee206 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -392,13 +392,13 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)

        while (link) {
                /* Check upstream direction L0s latency */
-               if ((link->aspm_capable & ASPM_STATE_L0S_UP) &&
-                   (link->latency_up.l0s > acceptable->l0s))
+               if ((link->aspm_capable & ASPM_STATE_L0S_UP) /* &&
+                   (link->latency_up.l0s > acceptable->l0s)*/)
                        link->aspm_capable &= ~ASPM_STATE_L0S_UP;

                /* Check downstream direction L0s latency */
-               if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
-                   (link->latency_dw.l0s > acceptable->l0s))
+               if ((link->aspm_capable & ASPM_STATE_L0S_DW) /* &&
+                   (link->latency_dw.l0s > acceptable->l0s)*/)
                        link->aspm_capable &= ~ASPM_STATE_L0S_DW;
                /*
                 * Check L1 latency.
---

Which does perform better but doesn't solve all the issues...

Home machine:
Latency:       3.364 ms
Download:    640.170 Mbit/s
Upload:      918.865 Mbit/s

My test server:
Latency:       4.549 ms
Download:    945.137 Mbit/s
Upload:      957.848 Mbit/s

But iperf3 still gets bogged down...
[  5]   0.00-1.00   sec  4.66 MBytes  39.0 Mbits/sec    0   82.0 KBytes
[  5]   1.00-2.00   sec  4.60 MBytes  38.6 Mbits/sec    0   79.2 KBytes
[  5]   2.00-3.00   sec  4.47 MBytes  37.5 Mbits/sec    0   56.6 KBytes

And with L1 ASPM disabled as usual:
[  5]   0.00-1.00   sec   112 MBytes   938 Mbits/sec  439    911 KBytes
[  5]   1.00-2.00   sec   109 MBytes   912 Mbits/sec  492    888 KBytes
[  5]   2.00-3.00   sec   110 MBytes   923 Mbits/sec  370   1.07 MBytes

And just for reference, bredbandskollen again with L1 ASPM disabled:
Latency:       2.281 ms
Download:    742.136 Mbit/s
Upload:      938.053 Mbit/s

Anyway, we started to look at the PCIe bridges etc, but I think it's the
network card that is at fault, either with the advertised latencies
(they should be lower) or some bug, since other cards and peripherals
connected to the system work just fine...

So, L0s actually seems to have somewhat of an impact - which I found
surprising since both machines are ~6 hops away - however the latency
differs (measured with TCP).

Can we collect L1 ASPM latency numbers for:
Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
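
In case it helps anyone who wants to report numbers: here's a rough,
untested user-space sketch that reads a device's config space via sysfs
and decodes the advertised L0s/L1 exit latencies (LnkCap) and the
endpoint's acceptable latencies (DevCap) - the same fields lspci -vvv
shows. The device path below is just an example, point it at the I211's
address, and note that reading past the first 64 bytes of config space
needs root.

#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define PCI_CAP_ID_EXP 0x10

static uint32_t rd32(const uint8_t *cfg, unsigned int off)
{
        uint32_t v;

        /* config space is little-endian; assumes a little-endian host */
        memcpy(&v, cfg + off, 4);
        return v;
}

int main(void)
{
        /* example address - replace with the I211's B:D.F */
        const char *path = "/sys/bus/pci/devices/0000:04:00.0/config";
        static const char *exit_l0s[] = { "<64ns", "64-128ns", "128-256ns",
                        "256-512ns", "512ns-1us", "1-2us", "2-4us", ">4us" };
        static const char *exit_l1[] = { "<1us", "1-2us", "2-4us", "4-8us",
                        "8-16us", "16-32us", "32-64us", ">64us" };
        static const char *acc_l0s[] = { "64ns", "128ns", "256ns", "512ns",
                        "1us", "2us", "4us", "no limit" };
        static const char *acc_l1[] = { "1us", "2us", "4us", "8us",
                        "16us", "32us", "64us", "no limit" };
        uint8_t cfg[4096] = { 0 };
        FILE *f = fopen(path, "rb");
        unsigned int pos;
        uint32_t devcap, lnkcap;

        if (!f || fread(cfg, 1, sizeof(cfg), f) < 64)
                return 1;
        fclose(f);

        /* walk the capability list to find the PCI Express capability */
        for (pos = cfg[0x34]; pos && cfg[pos] != PCI_CAP_ID_EXP;
             pos = cfg[pos + 1])
                ;
        if (!pos)
                return 1;

        devcap = rd32(cfg, pos + 0x04); /* Device Capabilities */
        lnkcap = rd32(cfg, pos + 0x0c); /* Link Capabilities */

        printf("LnkCap exit latency: L0s %s, L1 %s\n",
               exit_l0s[(lnkcap >> 12) & 7], exit_l1[(lnkcap >> 15) & 7]);
        printf("DevCap acceptable:   L0s %s, L1 %s\n",
               acc_l0s[(devcap >> 6) & 7], acc_l1[(devcap >> 9) & 7]);
        return 0;
}
---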

> On Tue, Jan 12, 2021 at 9:42 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > My guess is the real problem is the Switch is advertising incorrect
> > exit latencies.  If the Switch advertised "<64us" exit latency for its
> > Upstream Port, we'd compute "64us exit latency + 1us Switch delay =
> > 65us", which is more than either 03:00.0 or 04:00.0 can tolerate, so
> > we would disable L1 on that upstream Link.
> >
> > Working around this would require some sort of quirk to override the
> > values read from the Switch, which is a little messy.  Here's what I'm
> > thinking (not asking you to do this; just trying to think of an
> > approach):
>
> The question is if it's the device or if it's the bridge...
>
> I.e., if the device can't quite handle it, or if the bridge/switch/port
> advertises the wrong latency. I have a friend with a similar motherboard
> and he has the same latency values - but his kernel doesn't apply ASPM.
>
> I also want to check L0s since it seems to be enabled...
>
> >   - Configure common clock earlier, in pci_configure_device(), because
> >     this changes the "read-only" L1 exit latencies in Link
> >     Capabilities.
> >
> >   - Cache Link Capabilities in the pci_dev.
> >
> >   - Add a quirk to override the cached values for the Switch based on
> >     Vendor/Device ID and probably DMI motherboard/BIOS info.
>
> I can't say and I actually think it depends on the actual culprit
> which we haven't quite identified yet...
>
> > Bjorn
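
And just to double-check that I read the math above right, here is a
tiny model (all numbers made up, one switch on the path) of the check
Bjorn describes: take the largest advertised L1 exit latency along the
path, add 1us of wakeup propagation delay per switch, and compare that
against the endpoint's acceptable L1 latency:

#include <stdio.h>

/* advertised L1 exit latencies along the path, in microseconds -
   made-up values, not the real ones from my system */
static const unsigned int path_l1_exit_us[] = {
        8,      /* root port */
        64,     /* switch upstream port (the "<64us" case) */
        32,     /* switch downstream port */
        16,     /* endpoint */
};

int main(void)
{
        unsigned int acceptable_us = 64;   /* endpoint's acceptable L1 */
        unsigned int switches = 1;         /* switches the wakeup crosses */
        unsigned int max_exit = 0, total, i;

        for (i = 0; i < sizeof(path_l1_exit_us) / sizeof(path_l1_exit_us[0]); i++)
                if (path_l1_exit_us[i] > max_exit)
                        max_exit = path_l1_exit_us[i];

        total = max_exit + switches;       /* +1us per switch delay */

        printf("worst-case L1 exit: %uus, acceptable: %uus -> %s L1\n",
               total, acceptable_us,
               total > acceptable_us ? "disable" : "keep");
        return 0;
}
---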


