Re: hang on enabling PCI AER for IMX8MP + PI7C9X3G606GP 6-port Gen3 switch

On Thu, Dec 19, 2024 at 1:02 PM Frank Li <Frank.li@xxxxxxx> wrote:
>
> On Thu, Dec 19, 2024 at 09:54:55AM -0800, Tim Harvey wrote:
> > Greetings,
> >
> > I have a board with an NXP IMX8MP SoC Gen3 PCI host controller
> > connected to a Diodes Inc PI7C9X3G606GP (imx8mp-venice-gw82xx-2x.dts)
> > which hangs during pci enumeration if PCIEAER is enabled.
>
> How to reproduce it? Just enable CONFIG_PCIEAER?
>

Hi Frank,

Correct, enabling CONFIG_PCIEAER produces this hang for me.

Disabling CONFIG_PCIEAER, booting with 'pci=noaer' on the kernel
cmdline, or applying the following hack resolves it:
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 13b8586924ea..0ba05120cc2a 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -239,7 +239,11 @@ static int pci_enable_pcie_error_reporting(struct pci_dev *dev)
        if (!pcie_aer_is_native(dev))
                return -EIO;

-       rc = pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_AER_FLAGS);
+       /* do not enable fatal error reporting for PI7C9X3G606GP upstream port */
+       if (dev->devfn == PCI_DEVFN(0, 0) && dev->vendor == 0x12d8)
+               rc = pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_AER_FLAGS & ~PCI_EXP_DEVCTL_FERE);
+       else
+               rc = pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_AER_FLAGS);
        return pcibios_err_to_errno(rc);
 }

I don't expect this to be an issue with the IMX8MP so much as with the
PI7C9X3G606GP switch, as we successfully use several other PCIe
switches with the IMX8MP. I provided the lspci results in case
something is evident from them, for example that the switch doesn't
support fatal error reporting and it shouldn't be enabled (I'm not
sure how to interpret the verbose AER capability output).

I've reproduced this on every kernel I've tested between 6.6 and 6.12.

Best Regards,

Tim

> Frank
>
> >
> > I've tracked this down to the enabling of fatal error reporting
> > (PCI_EXP_DEVCTL_FERE) on the upstream port of the PI7C9X3G606GP. In
> > other words if I mask that bit out of the
> > pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_AER_FLAGS) call
> > for that device (or disable PCI AER via CONFIG_PCIEAER or pci=noaer)
> > all is well.
> >
> > Here is what lspci shows for the root complex and the switch upstream port:
> > # lspci -n
> > 00:00.0 0604: 16c3:abcd (rev 01)
> > 01:00.0 0604: 12d8:c008 (rev 07)
> > 02:01.0 0604: 12d8:c008 (rev 06)
> > 02:02.0 0604: 12d8:c008 (rev 06)
> > 02:03.0 0604: 12d8:c008 (rev 06)
> > 02:04.0 0604: 12d8:c008 (rev 06)
> > 02:05.0 0604: 12d8:c008 (rev 06)
> > 02:06.0 0604: 12d8:c008 (rev 06)
> > 02:07.0 0604: 12d8:c008 (rev 06)
> > 09:00.0 0200: 1055:7430 (rev 11)
> > # lspci -s 00:00.0 -vvv
> > 00:00.0 PCI bridge: Synopsys, Inc. DWC_usb3 / PCIe bridge (rev 01)
> > (prog-if 00 [Normal decode])
> >         Device tree node:
> > /sys/firmware/devicetree/base/soc@0/pcie@33800000/pcie@0,0
> >         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> > ParErr- Stepping- SERR+ FastB2B- DisINTx+
> >         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > <TAbort- <MAbort- >SERR- <PERR- INTx-
> >         Latency: 0
> >         Interrupt: pin A routed to IRQ 219
> >         Region 0: Memory at 18000000 (32-bit, non-prefetchable) [size=1M]
> >         Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
> >         I/O behind bridge: f000-0fff [disabled] [16-bit]
> >         Memory behind bridge: 18100000-182fffff [size=2M] [32-bit]
> >         Prefetchable memory behind bridge: fff00000-000fffff [disabled] [32-bit]
> >         Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > <TAbort- <MAbort- <SERR- <PERR-
> >         Expansion ROM at 18300000 [virtual] [disabled] [size=64K]
> >         BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
> >                 PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> >         Capabilities: [40] Power Management version 3
> >                 Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA
> > PME(D0+,D1+,D2-,D3hot+,D3cold+)
> >                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
> >         Capabilities: [50] MSI: Enable+ Count=1/1 Maskable+ 64bit+
> >                 Address: 0000000040101000  Data: 0000
> >                 Masking: 00000000  Pending: 00000000
> >         Capabilities: [70] Express (v2) Root Port (Slot-), IntMsgNum 0
> >                 DevCap: MaxPayload 128 bytes, PhantFunc 0
> >                         ExtTag- RBE+ TEE-IO-
> >                 DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
> >                         RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
> >                         MaxPayload 128 bytes, MaxReadReq 512 bytes
> >                 DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > AuxPwr+ TransPend-
> >                 LnkCap: Port #0, Speed 8GT/s, Width x1, ASPM L0s L1,
> > Exit Latency L0s <1us, L1 unlimited
> >                         ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
> >                 LnkCtl: ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk+
> >                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> >                 LnkSta: Speed 8GT/s, Width x1
> >                         TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt+
> >                 RootCap: CRSVisible+
> >                 RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
> > PMEIntEna+ CRSVisible+
> >                 RootSta: PME ReqID 0000, PMEStatus- PMEPending-
> >                 DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
> > NROPrPrP+ LTR-
> >                          10BitTagComp- 10BitTagReq- OBFF Not
> > Supported, ExtFmt- EETLPPrefix-
> >                          EmergencyPowerReduction Not Supported,
> > EmergencyPowerReductionInit-
> >                          FRS- LN System CLS Not Supported, TPHComp-
> > ExtTPHComp- ARIFwd-
> >                          AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
> >                 DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- ARIFwd-
> >                          AtomicOpsCtl: ReqEn- EgressBlck-
> >                          IDOReq- IDOCompl- LTR- EmergencyPowerReductionReq-
> >                          10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
> >                 LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink-
> > Retimer- 2Retimers- DRS-
> >                 LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> >                          Transmit Margin: Normal Operating Range,
> > EnterModifiedCompliance- ComplianceSOS-
> >                          Compliance Preset/De-emphasis: -6dB
> > de-emphasis, 0dB preshoot
> >                 LnkSta2: Current De-emphasis Level: -6dB,
> > EqualizationComplete+ EqualizationPhase1+
> >                          EqualizationPhase2+ EqualizationPhase3+
> > LinkEqualizationRequest-
> >                          Retimer- 2Retimers- CrosslinkRes: unsupported
> >         Capabilities: [100 v2] Advanced Error Reporting
> >                 UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
> > UnxCmplt- RxOF- MalfTLP-
> >                         ECRC- UnsupReq- ACSViol- UncorrIntErr-
> > BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
> >                         PoisonTLPBlocked- DMWrReqBlocked- IDECheck-
> > MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
> >                 UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
> > UnxCmplt- RxOF- MalfTLP-
> >                         ECRC- UnsupReq- ACSViol- UncorrIntErr+
> > BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
> >                         PoisonTLPBlocked- DMWrReqBlocked- IDECheck-
> > MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
> >                 UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt-
> > UnxCmplt- RxOF+ MalfTLP+
> >                         ECRC- UnsupReq- ACSViol- UncorrIntErr+
> > BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
> >                         PoisonTLPBlocked- DMWrReqBlocked- IDECheck-
> > MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
> >                 CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout-
> > AdvNonFatalErr- CorrIntErr- HeaderOF-
> >                 CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout-
> > AdvNonFatalErr+ CorrIntErr+ HeaderOF+
> >                 AERCap: First Error Pointer: 00, ECRCGenCap+
> > ECRCGenEn- ECRCChkCap+ ECRCChkEn-
> >                         MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> >                 HeaderLog: 00000000 00000000 00000000 00000000
> >                 RootCmd: CERptEn+ NFERptEn+ FERptEn+
> >                 RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd-
> >                          FirstFatal- NonFatalMsg- FatalMsg- IntMsgNum 0
> >                 ErrorSrc: ERR_COR: 0000 ERR_FATAL/NONFATAL: 0000
> >         Capabilities: [148 v1] Secondary PCI Express
> >                 LnkCtl3: LnkEquIntrruptEn- PerformEqu-
> >                 LaneErrStat: 0
> >         Capabilities: [158 v1] L1 PM Substates
> >                 L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2-
> > ASPM_L1.1+ L1_PM_Substates+
> >                           PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
> >                 L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> >                            T_CommonMode=10us
> >                 L1SubCtl2: T_PwrOn=10us
> >         Kernel driver in use: pcieport
> > # lspci -s 01:00.0 -vvv
> > 01:00.0 PCI bridge: Pericom Semiconductor Device c008 (rev 07)
> > (prog-if 00 [Normal decode])
> >         Subsystem: Pericom Semiconductor Device c008
> >         Device tree node:
> > /sys/firmware/devicetree/base/soc@0/pcie@33800000/pcie@0,0/pcie@0,0
> >         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> > ParErr- Stepping- SERR- FastB2B- DisINTx-
> >         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > <TAbort- <MAbort- >SERR- <PERR- INTx-
> >         Latency: 0
> >         Region 0: Memory at 18200000 (32-bit, non-prefetchable) [size=512K]
> >         Bus: primary=01, secondary=02, subordinate=09, sec-latency=0
> >         I/O behind bridge: 0000f000-00000fff [disabled] [32-bit]
> >         Memory behind bridge: 18100000-181fffff [size=1M] [32-bit]
> >         Prefetchable memory behind bridge:
> > 00000000fff00000-00000000000fffff [disabled] [64-bit]
> >         Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > <TAbort- <MAbort- <SERR- <PERR-
> >         BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
> >                 PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> >         Capabilities: [40] Power Management version 3
> >                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> > PME(D0+,D1-,D2-,D3hot+,D3cold+)
> >                 Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> >         Capabilities: [48] MSI: Enable- Count=1/8 Maskable+ 64bit+
> >                 Address: 0000000000000000  Data: 0000
> >                 Masking: 00000000  Pending: 00000000
> >         Capabilities: [68] Express (v2) Upstream Port, IntMsgNum 0
> >                 DevCap: MaxPayload 512 bytes, PhantFunc 0
> >                         ExtTag- AttnBtn- AttnInd- PwrInd- RBE+
> > SlotPowerLimit 4W TEE-IO-
> >                 DevCtl: CorrErr+ NonFatalErr+ FatalErr- UnsupReq+
> >                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> >                         MaxPayload 128 bytes, MaxReadReq 128 bytes
> >                 DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+
> > AuxPwr- TransPend-
> >                 LnkCap: Port #0, Speed 8GT/s, Width x1, ASPM L1, Exit
> > Latency L1 <1us
> >                         ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> >                 LnkCtl: ASPM Disabled; LnkDisable- CommClk+
> >                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> >                 LnkSta: Speed 8GT/s, Width x1
> >                         TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> >                 DevCap2: Completion Timeout: Not Supported,
> > TimeoutDis- NROPrPrP- LTR-
> >                          10BitTagComp- 10BitTagReq- OBFF Not
> > Supported, ExtFmt- EETLPPrefix-
> >                          EmergencyPowerReduction Not Supported,
> > EmergencyPowerReductionInit-
> >                          FRS-
> >                          AtomicOpsCap: Routing+ 32bit- 64bit- 128bitCAS-
> >                 DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
> >                          AtomicOpsCtl: EgressBlck-
> >                          IDOReq- IDOCompl- LTR- EmergencyPowerReductionReq-
> >                          10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
> >                 LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink-
> > Retimer- 2Retimers- DRS-
> >                 LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> >                          Transmit Margin: Normal Operating Range,
> > EnterModifiedCompliance- ComplianceSOS-
> >                          Compliance Preset/De-emphasis: -6dB
> > de-emphasis, 0dB preshoot
> >                 LnkSta2: Current De-emphasis Level: -6dB,
> > EqualizationComplete+ EqualizationPhase1+
> >                          EqualizationPhase2+ EqualizationPhase3+
> > LinkEqualizationRequest-
> >                          Retimer- 2Retimers- CrosslinkRes: unsupported
> >         Capabilities: [a4] Subsystem: Pericom Semiconductor Device c008
> >         Capabilities: [b0] MSI-X: Enable- Count=6 Masked-
> >                 Vector table: BAR=0 offset=0007f000
> >                 PBA: BAR=0 offset=0007f080
> >         Capabilities: [100 v1] Advanced Error Reporting
> >                 UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
> > UnxCmplt- RxOF- MalfTLP-
> >                         ECRC- UnsupReq- ACSViol- UncorrIntErr-
> > BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
> >                         PoisonTLPBlocked- DMWrReqBlocked- IDECheck-
> > MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
> >                 UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
> > UnxCmplt- RxOF- MalfTLP-
> >                         ECRC- UnsupReq- ACSViol- UncorrIntErr+
> > BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
> >                         PoisonTLPBlocked- DMWrReqBlocked- IDECheck-
> > MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
> >                 UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt-
> > UnxCmplt- RxOF+ MalfTLP+
> >                         ECRC- UnsupReq- ACSViol- UncorrIntErr+
> > BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
> >                         PoisonTLPBlocked- DMWrReqBlocked- IDECheck-
> > MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
> >                 CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout-
> > AdvNonFatalErr+ CorrIntErr- HeaderOF-
> >                 CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout-
> > AdvNonFatalErr+ CorrIntErr+ HeaderOF-
> >                 AERCap: First Error Pointer: 00, ECRCGenCap+
> > ECRCGenEn- ECRCChkCap+ ECRCChkEn-
> >                         MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> >                 HeaderLog: 00000000 00000000 00000000 00000000
> >         Capabilities: [130 v1] Virtual Channel
> >                 Caps:   LPEVC=0 RefClk=100ns PATEntryBits=4
> >                 Arb:    Fixed- WRR32- WRR64- WRR128-
> >                 Ctrl:   ArbSelect=Fixed
> >                 Status: InProgress-
> >                 VC0:    Caps:   PATOffset=05 MaxTimeSlots=64 RejSnoopTrans-
> >                         Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
> >                         Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
> >                         Status: NegoPending- InProgress-
> >                         Port Arbitration Table <?>
> >         Capabilities: [1a0 v1] Device Serial Number 08-16-48-96-00-00-12-d8
> >         Capabilities: [1b0 v1] Power Budgeting <?>
> >         Capabilities: [1d0 v1] Multicast
> >                 McastCap: MaxGroups 64, ECRCRegen-
> >                 McastCtl: NumGroups 1, Enable-
> >                 McastBAR: IndexPos 0, BaseAddr 0000000000000000
> >                 McastReceiveVec:      0000000000000000
> >                 McastBlockAllVec:     0000000000000000
> >                 McastBlockUntransVec: 0000000000000000
> >                 McastOverlayBAR: OverlaySize 0 (disabled), BaseAddr
> > 0000000000000000
> >         Capabilities: [210 v1] Secondary PCI Express
> >                 LnkCtl3: LnkEquIntrruptEn- PerformEqu-
> >                 LaneErrStat: 0
> >         Capabilities: [2b0 v1] L1 PM Substates
> >                 L1SubCap: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2-
> > ASPM_L1.1- L1_PM_Substates+
> >                 L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> >                 L1SubCtl2:
> >         Capabilities: [300 v1] Vendor Specific Information: ID=0000
> > Rev=0 Len=560 <?>
> >         Kernel driver in use: pcieport
> >
> > Is there anything here or any ideas on what could be the issue here?
> >
> > Best Regards,
> >
> > Tim




