Hello, I am debugging a problem where the AER driver reports Completion Timeouts" for any PCI memory read access to a certain endpoint device: pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error received: id=0018 pcieport 0000:00:03.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0018(Requester ID) pcieport 0000:00:03.0: device [8086:1f12] error status/mask=00004000/00400000 pcieport 0000:00:03.0: [14] Completion Timeout (First) pcieport 0000:00:03.0: AER: Device recovery failed pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error received: id=0018 pcieport 0000:00:03.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0018(Requester ID) pcieport 0000:00:03.0: device [8086:1f12] error status/mask=00004000/00400000 pcieport 0000:00:03.0: [14] Completion Timeout (First) pcieport 0000:00:03.0: AER: Device recovery failed The configuration access to the said device is going on fine. The same end point and the same hardware is working fine on a different OS. This is what my PCIe hierarchy looks like: root@localhost:~# lspci -t -[0000:00]-+-00.0 +-01.0-[01-06]-- +-02.0-[07-0c]-- +-03.0-[0d-12]----00.0 <--- The device I am trying to access +-04.0-[13-18]-- +-0e.0 +-0f.0 +-13.0 +-14.0 +-14.1 +-14.2 +-16.0 +-17.0 +-18.0 +-1f.0 \-1f.3 root@localhost:~# The device i am trying to access is 0d:00.0 which is a FPGA device connected directly to an Intel root port: root@localhost:~# lspci -vvvv -s d:0 0d:00.0 System peripheral: Juniper Networks Device 006c (rev 02) Subsystem: Juniper Networks Device 006c Physical Slot: 2 Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Interrupt: pin A routed to IRQ 22 Region 0: Memory at 8c000000 (32-bit, non-prefetchable) [size=64M] Capabilities: [50] MSI: Enable- Count=1/4 Maskable- 64bit+ Address: 0000000000000000 Data: 0000 Capabilities: [78] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- Capabilities: [80] Express (v1) Endpoint, MSI 00 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported- RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ MaxPayload 256 bytes, MaxReadReq 512 bytes DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend- LnkCap: Port #1, Speed 2.5GT/s, Width x4, ASPM L0s, Latency L0 unlimited, L1 unlimited ClockPM- Surprise- LLActRep- BwNot- LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- Capabilities: [100 v1] Virtual Channel Caps: LPEVC=0 RefClk=100ns PATEntryBits=1 Arb: Fixed- WRR32- WRR64- WRR128- Ctrl: ArbSelect=Fixed Status: InProgress- VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans- Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256- Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff Status: NegoPending- InProgress- Kernel driver in use: sam-mfd-core This is how the parent root port is setup: root@localhost:~# root@localhost:~# lspci -vvvv -s 0:3.0 00:03.0 PCI bridge: Intel Corporation Device 1f12 (rev 02) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Region 0: Memory at a4040000 (64-bit, non-prefetchable) [size=128K] Bus: primary=00, secondary=0d, subordinate=12, sec-latency=0 I/O behind bridge: 00002000-00002fff Memory behind bridge: 8c000000-93ffffff Prefetchable memory behind bridge: 0000000082000000-0000000082ffffff Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR- BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us ExtTag+ RBE+ FLReset- DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+ RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- MaxPayload 256 bytes, MaxReadReq 256 bytes DevSta: CorrErr+ UncorrErr+ FatalErr- UnsuppReq- AuxPwr- TransPend- LnkCap: Port #3, Speed 5GT/s, Width x4, ASPM L1, Latency L0 <1us, L1 <4us ClockPM- Surprise+ LLActRep+ BwNot+ LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt- SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Slot #2, PowerLimit 25.000W; Interlock- NoCompl- SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet+ CmdCplt+ HPIrq+ LinkChg+ Control: AttnInd Off, PwrInd Off, Power- Interlock- SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock- Changed: MRL- PresDet- LinkState- RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible- RootCap: CRSVisible- RootSta: PME ReqID 0000, PMEStatus- PMEPending- DevCap2: Completion Timeout: Range ABC, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd+ DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd- LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- Capabilities: [80] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- Capabilities: [88] Subsystem: Intel Corporation Device 7270 Capabilities: [90] MSI: Enable+ Count=1/1 Maskable+ 64bit- Address: fee0300c Data: 4191 Masking: 00000000 Pending: 00000000 Capabilities: [100 v1] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO+ CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+ CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+ AERCap: First Error Pointer: 0e, GenCap- CGenEn- ChkCap- ChkEn- Kernel driver in use: pcieport root@localhost:~# root@localhost:~# The driver for the end point does little other than the following at this point, which is when the AER is received: err = pci_enable_device(pdev); if (err < 0) { dev_err(dev, "pci_enable_device() failed: %d\n", err); return err; } err = pci_request_regions(pdev, "sam-mfd-code"); if (err < 0) { dev_err(dev, "pci_request_regions() failed: %d\n", err); goto err_disable; } sam->membase = pci_ioremap_bar(pdev, 0); if (!sam->membase) { dev_err(dev, "pci_ioremap_bar() failed\n"); err = -ENOMEM; goto err_release; } cpld = (u8 *)sam->membase; for (i = 0; i < 0x40; i++) { pr_emerg(" %02X", cpld[i]); } My questions: 1) What can be the possible reasons for such a "completion timeout"? I'd have suspected the end device to not respond, but apparently it is working fine on a different OS. This looks like a configuration issue somewhere. 2) Can some one please take a look at the lspci output above and confirm that the configuration is OK and doesn't have an obvious reason for timeout? The device expects a 64MB non-prefetchable memory window. 3) The only anomaly I had found was this message while booting up: pci 0000:0d:00.0: Max Payload Size 128, but upstream 0000:00:03.0 set to 256; if necessary, use "pci=pcie_bus_safe" and report a bug But after using "pci=pcie_bus_safe", that also went away. However, I'm not sure why currently the Maximum read size is still set to 512 in the device. Do you think that could be the issue? I'll be grateful for any other pointers that you throw at me... Thanks, Rajat -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html