On Wed, Oct 14, 2020 at 10:34 AM Kai-Heng Feng <kai.heng.feng@xxxxxxxxxxxxx> wrote:
>
>
>
> > On Oct 12, 2020, at 18:20, Ian Kumlien <ian.kumlien@xxxxxxxxx> wrote:
> >
> > On Thu, Oct 8, 2020 at 6:13 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> >>
> >> On Wed, Oct 07, 2020 at 03:28:08PM +0200, Ian Kumlien wrote:
> >>> Make pcie_aspm_check_latency comply with the PCIe spec, specifically:
> >>> "5.4.1.2.2. Exit from the L1 State"
> >>>
> >>> Which makes it clear that each switch is required to initiate a
> >>> transition within 1μs from receiving it, accumulating this latency and
> >>> then we have to wait for the slowest link along the path before
> >>> entering L0 state from L1.
> >>>
> >>> The current code doesn't take the maximum latency into account.
> >>>
> >>> From the example:
> >>>    +----------------+
> >>>    |                |
> >>>    |  Root complex  |
> >>>    |                |
> >>>    |    +-----+     |
> >>>    |    |32 μs|     |
> >>>    +----------------+
> >>>            |
> >>>            |  Link 1
> >>>            |
> >>>    +----------------+
> >>>    |     |8 μs|     |
> >>>    |     +----+     |
> >>>    |    Switch A    |
> >>>    |     +----+     |
> >>>    |     |8 μs|     |
> >>>    +----------------+
> >>>            |
> >>>            |  Link 2
> >>>            |
> >>>    +----------------+
> >>>    |    |32 μs|     |
> >>>    |    +-----+     |
> >>>    |    Switch B    |
> >>>    |    +-----+     |
> >>>    |    |32 μs|     |
> >>>    +----------------+
> >>>            |
> >>>            |  Link 3
> >>>            |
> >>>    +----------------+
> >>>    |     |8μs|      |
> >>>    |     +---+      |
> >>>    |   Endpoint C   |
> >>>    |                |
> >>>    |                |
> >>>    +----------------+
> >>>
> >>> Links 1, 2 and 3 are all in L1 state - endpoint C initiates the
> >>> transition to L0 at time T. Since switch B takes 32 μs to exit L1 on
> >>> it's ports, Link 3 will transition to L0 at T+32 (longest time
> >>> considering T+8 for endpoint C and T+32 for switch B).
> >>>
> >>> Switch B is required to initiate a transition from the L1 state on it's
> >>> upstream port after no more than 1 μs from the beginning of the
> >>> transition from L1 state on the downstream port.
> >>> Therefore, transition from L1 to L0 will begin on link 2 at T+1,
> >>> this will cascade up the path.
> >>>
> >>> The path will exit L1 at T+34.
> >>>
> >>> On my specific system:
> >>> lspci -PP -s 04:00.0
> >>> 00:01.2/01:00.0/02:04.0/04:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. Device 816e (rev 1a)
> >>>
> >>> lspci -vvv -s 04:00.0
> >>>         DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
> >>> ...
> >>>         LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
> >>> ...
> >>>
> >>> Which means that it can't be followed by any switch that is in L1 state.
> >>>
> >>> This patch fixes it by disabling L1 on 02:04.0, 01:00.0 and 00:01.2.
> >>>
> >>>                                                     LnkCtl    LnkCtl
> >>>            ------DevCap-------  ----LnkCap-------  -Before-  -After--
> >>>   00:01.2                                L1 <32us       L1+       L1-
> >>>   01:00.0                                L1 <32us       L1+       L1-
> >>>   02:04.0                                L1 <32us       L1+       L1-
> >>>   04:00.0  L0s <512 L1 <64us             L1 <64us       L1+       L1-
> >>
> >> OK, now we're getting close.  We just need to flesh out the
> >> justification.  We need:
> >>
> >>   - Tidy subject line.  Use "git log --oneline drivers/pci/pcie/aspm.c"
> >>     and follow the example.
> >
> > Will do
> >
> >>   - Description of the problem.  I think it's poor bandwidth on your
> >>     Intel I211 device, but we don't have the complete picture because
> >>     that NIC is 03:00.0, which doesn't appear above at all.
> >
> > I think we'll use Kai-Hengs issue, since it's actually more related to
> > the change itself...
> >
> > Mine is a side effect while Kai-Heng is actually hitting an issue
> > caused by the bug.
>
> I filed a bug here:
> https://bugzilla.kernel.org/show_bug.cgi?id=209671

Thanks!

I'm actually starting to think that reporting what we do with the
latency bit could be beneficial - i.e. report which links have their
L1 disabled due to which device...
I also think that this could benefit debugging - I have no clue how to
read the lspci output - I mean I do see some differences that might be
the fix, but nothing really specific without a proper message in
dmesg...

Bjorn, what do you think?

> Kai-Heng
>
> >
> >>   - Explanation of what's wrong with the "before" ASPM configuration.
> >>     I want to identify what is wrong on your system.  The generic
> >>     "doesn't match spec" part is good, but step 1 is the specific
> >>     details, step 2 is the generalization to relate it to the spec.
> >>
> >>   - Complete "sudo lspci -vv" information for before and after the
> >>     patch below.  https://bugzilla.kernel.org/show_bug.cgi?id=208741
> >>     has some of this, but some of the lspci output appears to be
> >>     copy/pasted and lost all its formatting, and it's not clear how
> >>     some was collected (what kernel version, with/without patch, etc).
> >>     Since I'm asking for bugzilla attachments, there's no space
> >>     constraint, so just attach the complete unedited output for the
> >>     whole system.
> >>
> >>   - URL to the bugzilla.  Please open a new one with just the relevant
> >>     problem report ("NIC is slow") and attach (1) "before" lspci
> >>     output, (2) proposed patch, (3) "after" lspci output.  The
> >>     existing 208741 report is full of distractions and jumps to the
> >>     conclusion without actually starting with the details of the
> >>     problem.
> >>
> >> Some of this I would normally just do myself, but I can't get the
> >> lspci info.  It would be really nice if Kai-Heng could also add
> >> before/after lspci output from the system he tested.
> >>
> >>> Signed-off-by: Ian Kumlien <ian.kumlien@xxxxxxxxx>
> >>> ---
> >>>  drivers/pci/pcie/aspm.c | 23 +++++++++++++++--------
> >>>  1 file changed, 15 insertions(+), 8 deletions(-)
> >>>
> >>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> >>> index 253c30cc1967..893b37669087 100644
> >>> --- a/drivers/pci/pcie/aspm.c
> >>> +++ b/drivers/pci/pcie/aspm.c
> >>> @@ -434,7 +434,7 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
> >>>
> >>>  static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> >>>  {
> >>> -        u32 latency, l1_switch_latency = 0;
> >>> +        u32 latency, l1_max_latency = 0, l1_switch_latency = 0;
> >>>          struct aspm_latency *acceptable;
> >>>          struct pcie_link_state *link;
> >>>
> >>> @@ -456,10 +456,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> >>>                  if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
> >>>                      (link->latency_dw.l0s > acceptable->l0s))
> >>>                          link->aspm_capable &= ~ASPM_STATE_L0S_DW;
> >>> +
> >>>                  /*
> >>>                   * Check L1 latency.
> >>> -                 * Every switch on the path to root complex need 1
> >>> -                 * more microsecond for L1. Spec doesn't mention L0s.
> >>> +                 *
> >>> +                 * PCIe r5.0, sec 5.4.1.2.2 states:
> >>> +                 * A Switch is required to initiate an L1 exit transition on its
> >>> +                 * Upstream Port Link after no more than 1 μs from the beginning of an
> >>> +                 * L1 exit transition on any of its Downstream Port Links.
> >>>                   *
> >>>                   * The exit latencies for L1 substates are not advertised
> >>>                   * by a device.  Since the spec also doesn't mention a way
> >>> @@ -469,11 +473,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> >>>                   * L1 exit latencies advertised by a device include L1
> >>>                   * substate latencies (and hence do not do any check).
> >>>                   */
> >>> -                latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> >>> -                if ((link->aspm_capable & ASPM_STATE_L1) &&
> >>> -                    (latency + l1_switch_latency > acceptable->l1))
> >>> -                        link->aspm_capable &= ~ASPM_STATE_L1;
> >>> -                l1_switch_latency += 1000;
> >>> +                if (link->aspm_capable & ASPM_STATE_L1) {
> >>> +                        latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> >>> +                        l1_max_latency = max_t(u32, latency, l1_max_latency);
> >>> +                        if (l1_max_latency + l1_switch_latency > acceptable->l1)
> >>> +                                link->aspm_capable &= ~ASPM_STATE_L1;
> >>> +
> >>> +                        l1_switch_latency += 1000;
> >>> +                }
> >>>
> >>>                  link = link->parent;
> >>>          }
> >>> --
> >>> 2.28.0
> >>>
>
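
To make the arithmetic in the quoted commit message easier to follow, and
to show roughly what a "which link lost L1 because of which device"
message could look like, here is a small userspace sketch. It is not
kernel code: struct sketch_link, max_u32() and sketch_check_l1() are
made-up names for illustration only, and the message format is just a
guess at something useful. It mirrors the loop in the patch: track the
slowest link seen so far, add 1000 ns per L1-capable link already passed,
and compare the sum against what the endpoint accepts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for one link on the path (not struct pcie_link_state). */
struct sketch_link {
	const char *name;
	uint32_t l1_up_ns;	/* L1 exit latency of the port below the link */
	uint32_t l1_dw_ns;	/* L1 exit latency of the port above the link */
	int l1_capable;		/* non-zero while L1 may still be enabled */
};

static uint32_t max_u32(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/*
 * Walk the path from the endpoint's link (links[0]) towards the root.
 * Clears l1_capable on every link where the accumulated exit latency
 * exceeds what the endpoint accepts, and says so on stdout.
 * Returns the largest accumulated latency seen, in nanoseconds.
 */
static uint32_t sketch_check_l1(struct sketch_link *links, size_t n,
				uint32_t acceptable_ns, const char *endpoint)
{
	uint32_t l1_max = 0, switch_delay = 0, worst = 0;
	size_t i;

	assert(links != NULL || n == 0);

	for (i = 0; i < n; i++) {
		uint32_t latency, total;

		if (!links[i].l1_capable)
			continue;

		/* Slowest port on this link, then slowest link so far. */
		latency = max_u32(links[i].l1_up_ns, links[i].l1_dw_ns);
		l1_max = max_u32(l1_max, latency);

		total = l1_max + switch_delay;
		worst = max_u32(worst, total);

		if (total > acceptable_ns) {
			links[i].l1_capable = 0;
			printf("%s: L1 disabled, exit latency %u ns > %u ns accepted by %s\n",
			       links[i].name, (unsigned)total,
			       (unsigned)acceptable_ns, endpoint);
		}

		/* Each further hop starts its exit at most 1 us later. */
		switch_delay += 1000;
	}

	return worst;
}
```

Fed with the example topology (Link 3 = 8000/32000 ns, Link 2 =
32000/8000 ns, Link 1 = 8000/32000 ns, nearest link first), this returns
34000, i.e. the T+34 from the commit message. If the endpoint only
accepted 32000 ns, Link 3 would keep L1 while Link 2 and Link 1 would be
reported and disabled.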