Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Sep 28, 2020 at 2:06 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> On Sat, Sep 26, 2020 at 12:26:53AM +0200, Ian Kumlien wrote:
> > On Fri, Sep 25, 2020 at 5:49 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > On Fri, Sep 25, 2020 at 03:54:11PM +0200, Ian Kumlien wrote:
> > > > On Fri, Sep 25, 2020 at 3:39 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Fri, Sep 25, 2020 at 12:18:50PM +0200, Ian Kumlien wrote:
> > > > > > So.......
> > > > > > [    0.815843] pci 0000:04:00.0: L1 latency exceeded - path: 1000 - max: 64000
> > > > > > [    0.815843] pci 0000:00:01.2: Upstream device - 32000
> > > > > > [    0.815844] pci 0000:01:00.0: Downstream device - 32000
> > > > >
> > > > > Wait a minute.  I've been looking at *03:00.0*, not 04:00.0.  Based
> > > > > on your bugzilla, here's the path:
> > > >
> > > > Correct, or you could do it like this:
> > > > 00:01.2/01:00.0/02:03.0/03:00.0 Ethernet controller: Intel Corporation
> > > > I211 Gigabit Network Connection (rev 03)
> > > >
> > > > >   00:01.2 Root Port              to [bus 01-07]
> > > > >   01:00.0 Switch Upstream Port   to [bus 02-07]
> > > > >   02:03.0 Switch Downstream Port to [bus 03]
> > > > >   03:00.0 Endpoint (Intel I211 NIC)
> > > > >
> > > > > Your system does also have an 04:00.0 here:
> > > > >
> > > > >   00:01.2 Root Port              to [bus 01-07]
> > > > >   01:00.0 Switch Upstream Port   to [bus 02-07]
> > > > >   02:04.0 Switch Downstream Port to [bus 04]
> > > > >   04:00.0 Endpoint (Realtek 816e)
> > > > >   04:00.1 Endpoint (Realtek RTL8111/8168/8411 NIC)
> > > > >   04:00.2 Endpoint (Realtek 816a)
> > > > >   04:00.4 Endpoint (Realtek 816d EHCI USB)
> > > > >   04:00.7 Endpoint (Realtek 816c IPMI)
> > > > >
> > > > > Which NIC is the problem?
> > > >
> > > > The intel one - so the side effect of the realtek nic is that it fixes
> > > > the intel nics issues.
> > > >
> > > > It would be that the intel nic actually has a bug with L1 (and it
> > > > would seem that it's to kind with latencies) so it actually has a
> > > > smaller buffer...
> > > >
> > > > And afair, the realtek has a larger buffer, since it behaves better
> > > > with L1 enabled.
> > > >
> > > > Either way, it's a fix that's needed ;)
> > >
> > > OK, what specifically is the fix that's needed?  The L0s change seems
> > > like a "this looks wrong" thing that doesn't actually affect your
> > > situation, so let's skip that for now.
> >
> > L1 latency calculation is not good enough, it assumes that *any*
> > link is the highest latency link - which is incorrect.
> >
> > The latency to bring L1 up is number-of-hops*1000 +
> > maximum-latency-along-the-path
> >
> > currently it's only doing number-of-hops*1000 +
> > arbitrary-latency-of-current-link
> >
> > > And let's support the change you *do* need with the "lspci -vv" for
> > > all the relevant devices (including both 03:00.0 and 04:00.x, I guess,
> > > since they share the 00:01.2 - 01:00.0 link), before and after the
> > > change.
> >
> > They are all included in all lspci output in the bug
>
> No doubt.  But I spent a long time going through those and the
> differences I found are not enough to show a spec violation or a fix.
>
> Here's what I extracted (this is a repeat; I mentioned this before):
>
>                                                     LnkCtl    LnkCtl
>            ------DevCap-------  ----LnkCap-------  -Before-  -After--
>   00:01.2                                L1 <32us       L1+       L1-
>   01:00.0                                L1 <32us       L1+       L1-
>   02:03.0                                L1 <32us       L1+       L1+
>   03:00.0  L0s <512ns L1 <64us  L0s <2us L1 <16us  L0s- L1-  L0s- L1-
>
> I don't see anything wrong here yet.  03:00.0 claims it can handle up
> to 64us of L1 exit latency, and the L1 exit latency of the entire path
> should be 33us.  What am I missing?

it should be 32+3 so 35 us - but the intel nic claims something it
can't live up to.

Since this is triggered by the realtek device
                                                     LnkCtl    LnkCtl
            ------DevCap-------  ----LnkCap-------  -Before-  -After--
   00:01.2                                L1 <32us       L1+       L1-
   01:00.0                                L1 <32us       L1+       L1-
   02:04.0                                L1 <32us       L1+       L1+
   04:00.0  L0s <512 L1 <64us

But exit for 04:00.0 is 64us which means it breaks its own latency
requirements once it's behind anything

> > > I want to identify something in the "before" configuration that is
> > > wrong according to the spec, and a change in the "after" config so it
> > > now conforms to the spec.
> >
> > So there are a few issues here, the current code does not apply to spec.
> >
> > The intel nic gets fixed as a side effect (it should still get a
> > proper fix) of making
> > the code apply to spec.
> >
> > Basically, while hunting for the issue, I found that the L1 and L0s
> > latency calculations used to determine
> > if they should be enabled or not is wrong - that is what I'm currently
> > trying to push - it also seems like the
> > intel nic claims that it can handle 64us but apparently it can't.
> >
> > So, three bugs, two are "fixed" one needs additional fixing.
>
> OK, we just need to split these up as much as possible and support
> them with the relevant lspci output, analysis of what specifically is
> wrong, and the lspci output showing the effect of the fix.

Could i have a copy of the pcie spec? I have found sections of it but
it's really hard to find them again when you
want to refer to something... Which I need to do to show that it
doesn't conform...

Ie I can't show all the errors on my system :)

> Bjorn



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux