Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2

Ian Kumlien <ian.kumlien@xxxxxxxxx> · Tue, 29 Sep 2020 14:51:48 +0200

On Tue, Sep 29, 2020 at 1:31 AM Alexander Duyck
<alexander.duyck@xxxxxxxxx> wrote:
>
> On Mon, Sep 28, 2020 at 1:33 PM Ian Kumlien <ian.kumlien@xxxxxxxxx> wrote:
> >
> > On Mon, Sep 28, 2020 at 10:04 PM Ian Kumlien <ian.kumlien@xxxxxxxxx> wrote:
> > >
> > > On Mon, Sep 28, 2020 at 9:53 PM Alexander Duyck
> > > <alexander.duyck@xxxxxxxxx> wrote:
>
> <snip>
>
> > > > You should be able to manually disable L1 on the realtek link
> > > > (4:00.0<->2:04.0) instead of doing it on the upstream link on the
> > > > switch. That may provide a datapoint on the L1 behavior of the setup.
> > > > Basically if you took the realtek out of the equation in terms of the
> > > > L1 exit time you should see the exit time drop to no more than 33us
> > > > like what would be expected with just the i210.
> > >
> > > Yeah, will try it out with echo 0 >
> > > /sys/devices/pci0000:00/0000:00:01.2/0000:01:00.0/0000:02:04.0/0000:04:00.0/link/l1_aspm
> > > (which is the device reported by my patch)
> >
> > So, 04:00.0 is already disabled, the existing code apparently handled
> > that correctly... *but*
> >
> > given the path:
> > 00:01.2/01:00.0/02:04.0/04:00.0 Unassigned class [ff00]: Realtek
> > Semiconductor Co., Ltd. Device 816e (rev 1a)
> >
> > Walking backwards:
> > -- 04:00.0 has l1 disabled
> > -- 02:04.0 doesn't have aspm?!
> >
> > lspci reports:
> > Capabilities: [370 v1] L1 PM Substates
> > L1SubCap: PCI-PM_L1.2- PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1+ L1_PM_Substates+
> > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> > L1SubCtl2:
> > Capabilities: [400 v1] Data Link Feature <?>
> > Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
> > Capabilities: [440 v1] Lane Margining at the Receiver <?>
> >
> > However the link directory is empty.
> >
> > Anything we should know about these unknown capabilities? also aspm
> > L1.1 and .1.2, heh =)
> >
> > -- 01:00.0 has L1, disabling it makes the intel nic work again
>
> I recall that much. However the question is why? If there is already a
> 32us time to bring up the link between the NIC and the switch why
> would the additional 1us to also bring up the upstream port have that
> much of an effect? That is why I am thinking that it may be worthwhile
> to try to isolate things further so that only the upstream port and
> the NIC have L1 enabled. If we are still seeing issues in that state
> then I can only assume there is something off with the
> 00:01.2<->1:00.0 link to where it either isn't advertising the actual
> L1 recovery time. For example the "Width x4 (downgraded)" looks very
> suspicious and could be responsible for something like that if the
> link training is having to go through exception cases to work out the
> x4 link instead of a x8.

It is a x4 link, all links that aren't "fully populated" or "fully
utilized" are listed as downgraded...

So, x16 card in x8 slot or pcie 3 card in pcie 2 slot - all lists as downgraded

> > ASPM L1 enabled:
> > [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> > [  5]   0.00-1.00   sec  5.40 MBytes  45.3 Mbits/sec    0   62.2 KBytes
> > [  5]   1.00-2.00   sec  4.47 MBytes  37.5 Mbits/sec    0   70.7 KBytes
> > [  5]   2.00-3.00   sec  4.10 MBytes  34.4 Mbits/sec    0   42.4 KBytes
> > [  5]   3.00-4.00   sec  4.47 MBytes  37.5 Mbits/sec    0   65.0 KBytes
> > [  5]   4.00-5.00   sec  4.47 MBytes  37.5 Mbits/sec    0    105 KBytes
> > [  5]   5.00-6.00   sec  4.47 MBytes  37.5 Mbits/sec    0   84.8 KBytes
> > [  5]   6.00-7.00   sec  4.47 MBytes  37.5 Mbits/sec    0   65.0 KBytes
> > [  5]   7.00-8.00   sec  4.10 MBytes  34.4 Mbits/sec    0   45.2 KBytes
> > [  5]   8.00-9.00   sec  4.47 MBytes  37.5 Mbits/sec    0   56.6 KBytes
> > [  5]   9.00-10.00  sec  4.47 MBytes  37.5 Mbits/sec    0   48.1 KBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval           Transfer     Bitrate         Retr
> > [  5]   0.00-10.00  sec  44.9 MBytes  37.7 Mbits/sec    0             sender
> > [  5]   0.00-10.01  sec  44.0 MBytes  36.9 Mbits/sec                  receiver
> >
> > ASPM L1 disabled:
> > [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> > [  5]   0.00-1.00   sec   111 MBytes   935 Mbits/sec  733    761 KBytes
> > [  5]   1.00-2.00   sec   110 MBytes   923 Mbits/sec  733    662 KBytes
> > [  5]   2.00-3.00   sec   109 MBytes   912 Mbits/sec  1036   1.20 MBytes
> > [  5]   3.00-4.00   sec   109 MBytes   912 Mbits/sec  647    738 KBytes
> > [  5]   4.00-5.00   sec   110 MBytes   923 Mbits/sec  852    744 KBytes
> > [  5]   5.00-6.00   sec   109 MBytes   912 Mbits/sec  546    908 KBytes
> > [  5]   6.00-7.00   sec   109 MBytes   912 Mbits/sec  303    727 KBytes
> > [  5]   7.00-8.00   sec   109 MBytes   912 Mbits/sec  432    769 KBytes
> > [  5]   8.00-9.00   sec   110 MBytes   923 Mbits/sec  462    652 KBytes
> > [  5]   9.00-10.00  sec   109 MBytes   912 Mbits/sec  576    764 KBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval           Transfer     Bitrate         Retr
> > [  5]   0.00-10.00  sec  1.07 GBytes   918 Mbits/sec  6320             sender
> > [  5]   0.00-10.01  sec  1.06 GBytes   912 Mbits/sec                  receiver
> >
> > (all measurements are over live internet - so thus variance)
>
> I forgot there were 5 total devices that were hanging off of there as
> well. You might try checking to see if disabling L1 on devices 5:00.0,
> 6:00.0 and/or 7:00.0 has any effect while leaving the L1 on 01:00.0
> and the NIC active. The basic idea is to go through and make certain
> we aren't seeing an L1 issue with one of the other downstream links on
> the switch.

I did, and i saw no change, only disabling L1 on 01:00.0 gives any effect.
But i'd say you're right in your thinking - with L0s head-of-queue
stalling can happen
due to retry buffers and so on, was interesting to see it detailed...

> The more I think about it the entire setup for this does seem a bit
> suspicious. I was looking over the lspci tree and the dump from the
> system. From what I can tell the upstream switch link at 01.2 <->
> 1:00.0 is only a Gen4 x4 link. However coming off of that is 5
> devices, two NICs using either Gen1 or 2 at x1, and then a USB
> controller and 2 SATA controller reporting Gen 4 x16. Specifically
> those last 3 devices have me a bit curious as they are all reporting
> L0s and L1 exit latencies that are the absolute minimum which has me
> wondering if they are even reporting actual values.

Heh, I have been trying to google for erratas wrt to:
01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse Switch
Upstream aka 1022:57ad

and the cpu, to see if there is something else I could have missed,
but i haven't found anything relating to this yet...