Re: Debugging "Completion Timeout" errors as reported by AER

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 2014-04-08 at 20:48 -0700, Rajat Jain wrote:
> Hello,
> 
> I am debugging a problem where the AER driver reports Completion
> Timeouts" for any PCI memory read access to a certain endpoint device:
> 
> pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error received: id=0018
> pcieport 0000:00:03.0: PCIe Bus Error: severity=Uncorrected
> (Non-Fatal), type=Transaction Layer, id=0018(Requester ID)
> pcieport 0000:00:03.0:   device [8086:1f12] error status/mask=00004000/00400000
> pcieport 0000:00:03.0:    [14] Completion Timeout     (First)
> pcieport 0000:00:03.0: AER: Device recovery failed
> pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error received: id=0018
> pcieport 0000:00:03.0: PCIe Bus Error: severity=Uncorrected
> (Non-Fatal), type=Transaction Layer, id=0018(Requester ID)
> pcieport 0000:00:03.0:   device [8086:1f12] error status/mask=00004000/00400000
> pcieport 0000:00:03.0:    [14] Completion Timeout     (First)
> pcieport 0000:00:03.0: AER: Device recovery failed
> 
> The configuration access to the said device is going on fine. The same
> end point and the same hardware is working fine on a different OS.
> 
> This is what my PCIe hierarchy looks like:
> 
> root@localhost:~# lspci -t
> -[0000:00]-+-00.0
>            +-01.0-[01-06]--
>            +-02.0-[07-0c]--
>            +-03.0-[0d-12]----00.0 <--- The device I am trying to access
>            +-04.0-[13-18]--
>            +-0e.0
>            +-0f.0
>            +-13.0
>            +-14.0
>            +-14.1
>            +-14.2
>            +-16.0
>            +-17.0
>            +-18.0
>            +-1f.0
>            \-1f.3
> root@localhost:~#
> 
> The device i am trying to access is 0d:00.0 which is a FPGA device
> connected directly to an Intel root port:
> 
> root@localhost:~# lspci -vvvv -s d:0
> 0d:00.0 System peripheral: Juniper Networks Device 006c (rev 02)
>         Subsystem: Juniper Networks Device 006c
>         Physical Slot: 2
>         Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop-
> ParErr- Stepping- SERR- FastB2B- DisINTx-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Interrupt: pin A routed to IRQ 22
>         Region 0: Memory at 8c000000 (32-bit, non-prefetchable) [size=64M]
>         Capabilities: [50] MSI: Enable- Count=1/4 Maskable- 64bit+
>                 Address: 0000000000000000  Data: 0000
>         Capabilities: [78] Power Management version 3
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> PME(D0-,D1-,D2-,D3hot-,D3cold-)
>                 Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>         Capabilities: [80] Express (v1) Endpoint, MSI 00
>                 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s
> <64ns, L1 <1us
>                         ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal-
> Unsupported-
>                         RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
>                         MaxPayload 256 bytes, MaxReadReq 512 bytes
>                 DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq-
> AuxPwr- TransPend-
>                 LnkCap: Port #1, Speed 2.5GT/s, Width x4, ASPM L0s,
> Latency L0 unlimited, L1 unlimited
>                         ClockPM- Surprise- LLActRep- BwNot-
>                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>                 LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train-
> SlotClk+ DLActive- BWMgmt- ABWMgmt-
>         Capabilities: [100 v1] Virtual Channel
>                 Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
>                 Arb:    Fixed- WRR32- WRR64- WRR128-
>                 Ctrl:   ArbSelect=Fixed
>                 Status: InProgress-
>                 VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>                         Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
>                         Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
>                         Status: NegoPending- InProgress-
>         Kernel driver in use: sam-mfd-core
> 
> This is how the parent root port is setup:
> 
> root@localhost:~#
> root@localhost:~# lspci -vvvv -s 0:3.0
> 00:03.0 PCI bridge: Intel Corporation Device 1f12 (rev 02) (prog-if 00
> [Normal decode])
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr- Stepping- SERR- FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 64 bytes
>         Region 0: Memory at a4040000 (64-bit, non-prefetchable) [size=128K]
>         Bus: primary=00, secondary=0d, subordinate=12, sec-latency=0
>         I/O behind bridge: 00002000-00002fff
>         Memory behind bridge: 8c000000-93ffffff
>         Prefetchable memory behind bridge: 0000000082000000-0000000082ffffff
>         Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort+ <SERR- <PERR-
>         BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
>                 PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
>         Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
>                 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s
> <64ns, L1 <1us
>                         ExtTag+ RBE+ FLReset-
>                 DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+
> Unsupported+
>                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>                         MaxPayload 256 bytes, MaxReadReq 256 bytes
>                 DevSta: CorrErr+ UncorrErr+ FatalErr- UnsuppReq-
> AuxPwr- TransPend-
>                 LnkCap: Port #3, Speed 5GT/s, Width x4, ASPM L1,
> Latency L0 <1us, L1 <4us
>                         ClockPM- Surprise+ LLActRep+ BwNot+
>                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>                 LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train-
> SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
>                 SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
> HotPlug+ Surprise+
>                         Slot #2, PowerLimit 25.000W; Interlock- NoCompl-
>                 SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet+
> CmdCplt+ HPIrq+ LinkChg+
>                         Control: AttnInd Off, PwrInd Off, Power- Interlock-
>                 SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
> PresDet+ Interlock-
>                         Changed: MRL- PresDet- LinkState-
>                 RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
> PMEIntEna- CRSVisible-
>                 RootCap: CRSVisible-
>                 RootSta: PME ReqID 0000, PMEStatus- PMEPending-
>                 DevCap2: Completion Timeout: Range ABC, TimeoutDis+,
> LTR-, OBFF Not Supported ARIFwd+
>                 DevCtl2: Completion Timeout: 50us to 50ms,
> TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
>                 LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
>                          Transmit Margin: Normal Operating Range,
> EnterModifiedCompliance- ComplianceSOS-
>                          Compliance De-emphasis: -6dB
>                 LnkSta2: Current De-emphasis Level: -6dB,
> EqualizationComplete-, EqualizationPhase1-
>                          EqualizationPhase2-, EqualizationPhase3-,
> LinkEqualizationRequest-
>         Capabilities: [80] Power Management version 3
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> PME(D0+,D1-,D2-,D3hot+,D3cold+)
>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>         Capabilities: [88] Subsystem: Intel Corporation Device 7270
>         Capabilities: [90] MSI: Enable+ Count=1/1 Maskable+ 64bit-
>                 Address: fee0300c  Data: 4191
>                 Masking: 00000000  Pending: 00000000
>         Capabilities: [100 v1] Advanced Error Reporting
>                 UESta:  DLP- SDES- TLP- FCP- CmpltTO+ CmpltAbrt-
> UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
> UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt-
> UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>                 CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>                 CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>                 AERCap: First Error Pointer: 0e, GenCap- CGenEn- ChkCap- ChkEn-
>         Kernel driver in use: pcieport
> 
> root@localhost:~#
> root@localhost:~#
> 
> The driver for the end point does little other than the following at
> this point, which is when the AER is received:
> 
>        err = pci_enable_device(pdev);
>         if (err < 0) {
>                 dev_err(dev, "pci_enable_device() failed: %d\n", err);
>                 return err;
>         }
> 
>         err = pci_request_regions(pdev, "sam-mfd-code");
>         if (err < 0) {
>                 dev_err(dev, "pci_request_regions() failed: %d\n",
>                         err);
>                 goto err_disable;
>         }
> 
>         sam->membase = pci_ioremap_bar(pdev, 0);
>         if (!sam->membase) {
>                 dev_err(dev, "pci_ioremap_bar() failed\n");
>                 err = -ENOMEM;
>                 goto err_release;
>         }
> 
>         cpld = (u8 *)sam->membase;
>         for (i = 0; i < 0x40; i++) {
>                         pr_emerg(" %02X", cpld[i]);
>         }
> 
> 
> My questions:

Maybe a silly question, but does the device support byte-width access to
this space?  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux