Added dmesg with pci=earlydump on the bugzilla:
https://bugzilla.kernel.org/attachment.cgi?id=292601

On Wed, Sep 23, 2020 at 11:36 PM Ian Kumlien <ian.kumlien@xxxxxxxxx> wrote:
>
> On Wed, Sep 23, 2020 at 11:23 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> >
> > On Wed, Sep 23, 2020 at 01:29:18AM +0200, Ian Kumlien wrote:
> > > On Wed, Sep 23, 2020 at 1:00 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > > On Tue, Sep 22, 2020 at 11:02:35PM +0200, Ian Kumlien wrote:
> > > > >
> > > > > commit db3d9c4baf4ab177d87b5cd41f624f5901e7390f
> > > > > Author: Ian Kumlien <ian.kumlien@xxxxxxxxx>
> > > > > Date:   Sun Jul 26 16:01:15 2020 +0200
> > > > >
> > > > >     Use maximum latency when determining L1 ASPM
> > > > >
> > > > >     If the path's maximum latency is exceeded, we clear L1 on the
> > > > >     link that pushed it over the limit.
> > > > >
> > > > >     Currently we check the maximum latency of upstream and downstream
> > > > >     per link, not the maximum for the whole path.
> > > > >
> > > > >     This would work if all links had the same latency, but consider:
> > > > >     endpoint -> c -> b -> a -> root (in the order we walk the path)
> > > > >
> > > > >     If c or b has the highest latency, it will not register.
> > > > >
> > > > >     Fix this by maintaining the maximum latency value for the path.
> > > > >
> > > > >     See this bugzilla for more information:
> > > > >     https://bugzilla.kernel.org/show_bug.cgi?id=208741
> > > > >
> > > > >     This fixes an issue for me where my desktop machine's maximum
> > > > >     bandwidth for remote connections dropped from 933 MBit to ~40 MBit.
> > > > >
> > > > >     The bug became obvious once we enabled ASPM on all links:
> > > > >     66ff14e59e8a (PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges)
> > > >
> > > > I can't connect the dots here yet.  I don't see a PCIe-to-PCI/PCI-X
> > > > bridge in your lspci, so I can't figure out why this commit would make
> > > > a difference for you.
> > > >
> > > > IIUC, the problem device is 03:00.0, the Intel I211 NIC.  Here's the
> > > > path to it:
> > > >
> > > >   00:01.2 Root Port              to [bus 01-07]
> > > >   01:00.0 Switch Upstream Port   to [bus 02-07]
> > > >   02:03.0 Switch Downstream Port to [bus 03]
> > > >   03:00.0 Endpoint (Intel I211 NIC)
> > > >
> > > > And I think this is the relevant info:
> > > >
> > > >                                                    LnkCtl    LnkCtl
> > > >            ------DevCap-------  ----LnkCap-------  -Before-  -After--
> > > >   00:01.2                                L1 <32us       L1+       L1-
> > > >   01:00.0                                L1 <32us       L1+       L1-
> > > >   02:03.0                                L1 <32us       L1+       L1+
> > > >   03:00.0  L0s <512ns L1 <64us  L0s <2us L1 <16us  L0s- L1-  L0s- L1-
> > > >
> > > > The NIC says it can tolerate at most 512ns of L0s exit latency and at
> > > > most 64us of L1 exit latency.
> > > >
> > > > 02:03.0 doesn't support L0s, and the NIC itself can't exit L0s that
> > > > fast anyway (it can only do <2us), so L0s should be out of the picture
> > > > completely.
> > > >
> > > > Before your patch, apparently we (or BIOS) enabled L1 on the link from
> > > > 00:01.2 to 01:00.0, and partially enabled it on the link from 02:03.0
> > > > to 03:00.0.
> > >
> > > According to the spec, this is managed by the OS - which was the
> > > change introduced...
> >
> > BIOS frequently enables ASPM and, if CONFIG_PCIEASPM_DEFAULT=y, I
> > don't think Linux touches it unless a user requests it via sysfs.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=66ff14e59e8a
>
> "7d715a6c1ae5 ("PCI: add PCI Express ASPM support") added the ability for
> Linux to enable ASPM, but for some undocumented reason, it didn't enable
> ASPM on links where the downstream component is a PCIe-to-PCI/PCI-X Bridge.
>
> Remove this exclusion so we can enable ASPM on these links."
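A quick aside on reading those LnkCap values: the "<32us" / "<64us" strings
in the table above come from two 3-bit fields in each device's Link
Capabilities register. Here is a minimal sketch of the decoding - the masks
are the ones lspci uses (they're quoted further down in this mail), the
conversion mirrors what I believe aspm.c's calc_l0s_latency() /
calc_l1_latency() do, and the helper names are mine:

  /* Decode the LnkCap exit-latency fields (worst case, in ns).
   * Masks are the ones lspci uses; helper names are illustrative. */
  #include <stdint.h>

  #define PCI_EXP_LNKCAP_L0S  0x07000   /* L0s Exit Latency, bits 14:12 */
  #define PCI_EXP_LNKCAP_L1   0x38000   /* L1 Exit Latency, bits 17:15 */

  static uint32_t l0s_exit_ns(uint32_t lnkcap)
  {
          uint32_t enc = (lnkcap & PCI_EXP_LNKCAP_L0S) >> 12;

          return enc == 7 ? 5000 : (64u << enc);    /* 7 means ">4us" */
  }

  static uint32_t l1_exit_ns(uint32_t lnkcap)
  {
          uint32_t enc = (lnkcap & PCI_EXP_LNKCAP_L1) >> 15;

          return enc == 7 ? 65000 : (1000u << enc); /* 7 means ">64us" */
  }

Each end of a link reports its own LnkCap, so whether a value is "up" or
"down" is just a question of whose config space you read - which is exactly
the confusion that comes up below.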
> > What does "grep ASPM .config" tell you?
>
> CONFIG_PCIEASPM=y
> CONFIG_PCIEASPM_POWERSAVE=y
>
> And all of this worked before the commit above.
>
> > Boot with "pci=earlydump" and the dmesg will tell us what the BIOS
> > did.
> >
> > If you do this in the unmodified kernel:
> >
> >   # echo 1 > /sys/.../0000:03:00.0/l1_aspm
> >
> > it should enable L1 for 03:00.0.  I'd like to know whether it actually
> > does, and whether the NIC behaves any differently when L1 is enabled
> > for the entire path instead of just the first three components.
>
> With an unmodified kernel, I have ~40 MBit of bandwidth.
>
> > If the above doesn't work, you should be able to enable ASPM manually:
> >
> >   # setpci -s03:00.0 CAP_EXP+0x10.w=0x0042
>
> I used something similar to disable ASPM, which made the NIC work again.
>
> > > > It looks like we *should* be able to enable L1 on both links since the
> > > > exit latency should be <33us (first link starts exit at T=0, completes
> > > > by T=32; second link starts exit at T=1, completes by T=33), and
> > > > 03:00.0 can tolerate up to 64us.
> > > >
> > > > I guess the effect of your patch is to disable L1 on the 00:01.2 -
> > > > 01:00.0 link?  And that makes the NIC work better?  I am obviously
> > > > missing something because I don't understand why the patch does that
> > > > or why it works better.
> > >
> > > It makes it work like normal again, like if I disable ASPM on the
> > > NIC itself...
> >
> > I wonder if ASPM is just broken on this device.
> > __e1000e_disable_aspm() mentions hardware errata on a different Intel
> > NIC.
>
> I doubt it. The total time seems to surpass the time it can handle
> with its buffers.
>
> Note that the NIC is running with ASPM now, and it is working =)
>
> > > I don't know which value that reflects, up or down - since we do max
> > > of both values and it actually disables ASPM.
> > >
> > > What we can see is that the first device that passes the threshold
> > > is 01:00.0
> >
> > I don't understand this.  03:00.0 claims to be able to tolerate 64us
> > of L1 exit latency.  The path should only have <33us of latency, so
> > it's not even close to the 64us threshold.
>
> Well, this is why I dumped the raw data - I don't know how lspci reads it.
>
> > > How can I read more data from PCIe without needing to add kprint...
> > >
> > > This is what lspci uses apparently:
> > > #define  PCI_EXP_LNKCAP_L0S     0x07000 /* L0s Exit Latency */
> > > #define  PCI_EXP_LNKCAP_L1      0x38000 /* L1 Exit Latency */
> > >
> > > But which latencies are those? up or down?
> >
> > I think the idea in aspm.c that exit latencies depend on which
> > direction traffic is going is incorrect.  The components at the
> > upstream and downstream ends of the link may have different exit
> > latencies, of course.
>
> Yes, and the max latency is the maximum time the device can buffer before
> the link has to be up, so the maximum time to establish the link must be
> max(up, down), right?
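To make that closing question concrete, here is a sketch of the whole check
as the patch shapes it: take max(up, down) per link, carry the worst value
along the endpoint-to-root walk, and charge roughly 1us for every switch the
wakeup has to cross, as in the T=0/T=1 example above. Structure and names
are illustrative, not the exact aspm.c code; latencies are in ns:

  #include <stdint.h>

  struct plink {                  /* one ASPM-capable link on the path */
          uint32_t l1_up;         /* L1 exit latency, upstream end */
          uint32_t l1_dw;         /* L1 exit latency, downstream end */
  };

  static uint32_t max32(uint32_t a, uint32_t b) { return a > b ? a : b; }

  /* Returns 1 if every link on the path may keep L1 enabled. */
  static int l1_path_ok(const struct plink *path, int n, uint32_t acceptable)
  {
          uint32_t l1_max = 0;            /* worst link latency so far */
          uint32_t switch_lat = 0;        /* wakeup propagation delay */

          for (int i = 0; i < n; i++) {   /* endpoint -> root */
                  l1_max = max32(l1_max, max32(path[i].l1_up, path[i].l1_dw));
                  if (l1_max + switch_lat > acceptable)
                          return 0;       /* this link must drop L1 */
                  switch_lat += 1000;     /* next hop starts its exit 1us later */
          }
          return 1;
  }

Fed with the table above - two links, 32us worst case per link, 64us
acceptable for 03:00.0 - this peaks at 32 + 1 = 33us at the last hop,
Bjorn's <33us figure, well inside the budget. Which is why the raw numbers
alone still don't explain the slowdown.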