On Thu, Aug 5, 2021 at 10:37 AM Manish Raturi <raturi.manish@xxxxxxxxx> wrote:
>
> Hi All,
>
> I am facing an issue where a CPU PCIe root port has x16 lanes and is
> bifurcated as x4, x4, x8. On the x8 bifurcated port, a PEX switch is
> connected downstream in x4 mode. I have also enabled hotplug on the CPU
> PCIe root port so that the PEX switch can be taken out of reset, and the
> link trained, in the kernel. I am observing the following behaviour:
>
> 1) Whenever the kernel enumerates this PEX switch, the link never comes
> up at Gen4; it sometimes comes up at Gen3 and sometimes at Gen2.
>
> 2) If I disable the link between the CPU and the PEX switch and retrain
> it, the link still comes up at Gen3 or Gen2.
>
> 3) The one case where I do see Gen4 is when the PEX switch has been taken
> out of reset in the kernel and we then reboot: since the switch is
> already out of reset, the BIOS enumerates it and the link comes up at
> Gen4 in the BIOS.
>
> 4) Whenever the PEX switch is enumerated in the kernel, the link does not
> come up at Gen4.
>
> 5) The kernel version we are using is 5.2.
>
> Queries:
>
> 1) Which parameters can be checked when a Gen4 link does not come up?
> 2) Does ASPM play any role in bringing the link down to a lower speed?
> 3) What else can I check in software?
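For query 1, a minimal sketch, assuming the root port's lspci -vvv output has been saved as text with the quote markers removed. link_speeds and is_downgraded are hypothetical helper names, not part of pciutils. They read only the three fields that show whether Gen4 was reached (the supported speed in LnkCap, the negotiated speed in LnkSta, and the Target Link Speed in LnkCtl2) and map GT/s to a generation (8GT/s is Gen3, 16GT/s is Gen4).

```python
import re

# Per-lane signalling rate as printed by lspci (GT/s) -> PCIe generation.
GT_TO_GEN = {"2.5": 1, "5": 2, "8": 3, "16": 4, "32": 5}

# Each pattern captures the GT/s number of one field (hypothetical helper table).
PATTERNS = {
    "cap": r"LnkCap:.*?Speed (\d+(?:\.\d+)?)GT/s",  # fastest speed the port supports
    "sta": r"LnkSta:\s*Speed (\d+(?:\.\d+)?)GT/s",  # speed actually negotiated
    "target": r"LnkCtl2:\s*Target Link Speed: (\d+(?:\.\d+)?)GT/s",  # speed requested
}


def link_speeds(lspci_text):
    """Map each field to a PCIe generation number, or None if it is missing."""
    speeds = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, lspci_text)
        speeds[key] = GT_TO_GEN.get(match.group(1)) if match else None
    return speeds


def is_downgraded(lspci_text):
    """True if LnkSta is below both the supported (LnkCap) and target (LnkCtl2) speed."""
    speeds = link_speeds(lspci_text)
    limits = [g for g in (speeds["cap"], speeds["target"]) if g is not None]
    if speeds["sta"] is None or not limits:
        return False
    return speeds["sta"] < min(limits)


if __name__ == "__main__":
    failing = (
        "LnkCap:\tPort #3, Speed 16GT/s, Width x8, ASPM L1, Exit Latency L1 <64us\n"
        "LnkSta:\tSpeed 8GT/s (downgraded), Width x4 (downgraded)\n"
        "LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-\n"
    )
    print(link_speeds(failing), is_downgraded(failing))
```

On the failing dump below this prints {'cap': 4, 'sta': 3, 'target': 4} True: a Gen4-capable port with a Gen4 target that trained at Gen3. Comparing against the lower of LnkCap and LnkCtl2 avoids flagging a link that software deliberately capped through Target Link Speed.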
>
> Logs when the PEX switch comes up at Gen4 in the BIOS:
> ======================================================
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> Latency: 0, Cache Line Size: 64 bytes
> Interrupt: pin A routed to IRQ 36
> NUMA node: 0
> Region 0: Memory at 21fc0000000 (64-bit, non-prefetchable) [size=128K]
> Bus: primary=14, secondary=30, subordinate=c5, sec-latency=0
> I/O behind bridge: 0000f000-00000fff [empty]
> Memory behind bridge: c0000000-cfffffff [size=256M]
> Prefetchable memory behind bridge: 0000021f80000000-0000021fbfffffff [size=1G]
> Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
> BridgeCtl: Parity+ SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
>     PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
>     DevCap: MaxPayload 512 bytes, PhantFunc 0
>         ExtTag+ RBE+
>     DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
>         RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop-
>         MaxPayload 512 bytes, MaxReadReq 4096 bytes
>     DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
>     LnkCap: Port #3, Speed 16GT/s, Width x8, ASPM L1, Exit Latency L1 <16us
>         ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+
>     LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk-
>         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>     LnkSta: Speed 16GT/s (ok), Width x4 (downgraded)
>         TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
>     SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+
>         Slot #83, PowerLimit 75.000W; Interlock- NoCompl-
>     SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet+ CmdCplt- HPIrq+ LinkChg+
>         Control: AttnInd Off, PwrInd Off, Power- Interlock-
>     SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
>         Changed: MRL- PresDet- LinkState-
>     RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible+
>     RootCap: CRSVisible+
>     RootSta: PME ReqID 0000, PMEStatus- PMEPending-
>     DevCap2: Completion Timeout: Range ABC, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd+
>         AtomicOpsCap: Routing+ 32bit+ 64bit+ 128bitCAS+
>     DevCtl2: Completion Timeout: 260ms to 900ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
>         AtomicOpsCtl: ReqEn+ EgressBlck-
>     LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
>         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>         Compliance De-emphasis: -6dB
>     LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
>         EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
> Capabilities: [80] Power Management version 3
>     Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>     Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [88] Subsystem: Intel Corporation Device 347c
> Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
>     Address: fee00058 Data: 0000
> Capabilities: [100 v1] Advanced Error Reporting
>     UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>     UEMsk: DLP- SDES+ TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
>     UESvrt: DLP+ SDES- TLP+ FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>     CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
>     CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
>     AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
>         MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap+
>     HeaderLog: 4a000001 33000004 fd000000 00000000
>     RootCmd: CERptEn+ NFERptEn+ FERptEn+
>     RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd-
>         FirstFatal- NonFatalMsg- FatalMsg- IntMsg 0
>     ErrorSrc: ERR_COR: 0000 ERR_FATAL/NONFATAL: 0000
> Capabilities: [148 v1] Access Control Services
>     ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
>     ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
> Capabilities: [180 v1] Vendor Specific Information: ID=0003 Rev=0 Len=00a <?>
> Capabilities: [190 v1] Downstream Port Containment
>     DpcCap: INT Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 4, DL_ActiveErr+
>     DpcCtl: Trigger:1 Cmpl- INT+ ErrCor- PoisonedTLP- SwTrigger- DL_ActiveErr-
>     DpcSta: Trigger- Reason:00 INT- RPBusy- TriggerExt:00 RP PIO ErrPtr:1f
>     Source: 0000
> Capabilities: [1e0 v2] Precision Time Measurement
>     PTMCap: Requester:- Responder:+ Root:+
>     PTMClockGranularity: 2ns
>     PTMControl: Enabled:+ RootSelected:+
>     PTMEffectiveGranularity: 2ns
> Capabilities: [200 v1] Secondary PCI Express <?>
> Capabilities: [400 v1] Data Link Feature <?>
> Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
> Capabilities: [450 v1] Lane Margining at the Receiver <?>
> Kernel driver in use: pcieport
>
> Failing logs:
> =============
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> Latency: 0, Cache Line Size: 64 bytes
> Interrupt: pin A routed to IRQ 36
> NUMA node: 0
> Region 0: Memory at 21fc0000000 (64-bit, non-prefetchable) [size=128K]
> Bus: primary=14, secondary=30, subordinate=c5, sec-latency=0
> I/O behind bridge: 0000f000-00000fff [empty]
> Memory behind bridge: fff00000-000fffff [empty]
> Prefetchable memory behind bridge: 0000021f80000000-0000021fbfffffff [size=1G]
> Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
> BridgeCtl: Parity+ SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
>     PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
>     DevCap: MaxPayload 512 bytes, PhantFunc 0
>         ExtTag+ RBE+
>     DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
>         RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop-
>         MaxPayload 128 bytes, MaxReadReq 4096 bytes
>     DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
>     LnkCap: Port #3, Speed 16GT/s, Width x8, ASPM L1, Exit Latency L1 <64us
>         ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+
>     LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
>         ExtSynch+ ClockPM- AutWidDis- BWInt- AutBWInt-
>     LnkSta: Speed 8GT/s (downgraded), Width x4 (downgraded)
>         TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+
>     SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+
>         Slot #83, PowerLimit 75.000W; Interlock- NoCompl-
>     SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet+ CmdCplt- HPIrq+ LinkChg+
>         Control: AttnInd Off, PwrInd Off, Power- Interlock-
>     SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
>         Changed: MRL- PresDet- LinkState-
>     RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible+
>     RootCap: CRSVisible+
>     RootSta: PME ReqID 0000, PMEStatus- PMEPending-
>     DevCap2: Completion Timeout: Range ABC, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd+
>         AtomicOpsCap: Routing+ 32bit+ 64bit+ 128bitCAS+
>     DevCtl2: Completion Timeout: 260ms to 900ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
>         AtomicOpsCtl: ReqEn+ EgressBlck-
>     LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
>         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>         Compliance De-emphasis: -6dB
>     LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
>         EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest+
> Capabilities: [80] Power Management version 3
>     Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>     Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [88] Subsystem: Intel Corporation Device 347c
> Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
>     Address: fee00058 Data: 0000
> Capabilities: [100 v1] Advanced Error Reporting
>     UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>     UEMsk: DLP- SDES+ TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
>     UESvrt: DLP+ SDES- TLP+ FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>     CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
>     CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
>     AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
>         MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap+
>     HeaderLog: 4a000001 33000004 fd000000 00000000
>     RootCmd: CERptEn+ NFERptEn+ FERptEn+
>     RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd-
>         FirstFatal- NonFatalMsg- FatalMsg- IntMsg 0
>     ErrorSrc: ERR_COR: 1420 ERR_FATAL/NONFATAL: 0000
> Capabilities: [148 v1] Access Control Services
>     ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
>     ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
> Capabilities: [180 v1] Vendor Specific Information: ID=0003 Rev=0 Len=00a <?>
> Capabilities: [190 v1] Downstream Port Containment
>     DpcCap: INT Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 4, DL_ActiveErr+
>     DpcCtl: Trigger:1 Cmpl- INT+ ErrCor- PoisonedTLP- SwTrigger- DL_ActiveErr-
>     DpcSta: Trigger- Reason:00 INT- RPBusy- TriggerExt:00 RP PIO ErrPtr:1f
>     Source: 0000
> Capabilities: [1e0 v2] Precision Time Measurement
>     PTMCap: Requester:- Responder:+ Root:+
>     PTMClockGranularity: 2ns
>     PTMControl: Enabled:+ RootSelected:+
>     PTMEffectiveGranularity: 2ns
> Capabilities: [200 v1] Secondary PCI Express <?>
> Capabilities: [400 v1] Data Link Feature <?>
> Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
> Capabilities: [450 v1] Lane Margining at the Receiver <?>
> Kernel driver in use: pcieport
>
> Thanks & Regards
> Manish Raturi

Any suggestions?
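To narrow down what differs between the passing and failing dumps, a minimal sketch, assuming each dump has been saved as plain text with the ">" quote markers removed. normalize and changed_fields are hypothetical helper names; the comparison itself is done by Python's standard difflib.ndiff.

```python
import difflib


def normalize(dump):
    """Drop indentation and blank lines so only field contents are compared."""
    return [line.strip() for line in dump.splitlines() if line.strip()]


def changed_fields(good, bad):
    """Lines only in the good dump ("- ...") or only in the bad dump ("+ ...")."""
    return [
        line
        for line in difflib.ndiff(normalize(good), normalize(bad))
        # ndiff also emits "  " (common) and "? " (intraline hint) lines; skip them.
        if line.startswith(("- ", "+ "))
    ]


if __name__ == "__main__":
    good = (
        "LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk-\n"
        "    ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-\n"
        "LnkSta: Speed 16GT/s (ok), Width x4 (downgraded)\n"
    )
    bad = (
        "LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-\n"
        "    ExtSynch+ ClockPM- AutWidDis- BWInt- AutBWInt-\n"
        "LnkSta: Speed 8GT/s (downgraded), Width x4 (downgraded)\n"
    )
    for line in changed_fields(good, bad):
        print(line)
```

Applied to the two full dumps in this thread, the differing lines include Memory behind bridge, the DevCtl MaxPayload (512 vs 128 bytes), the LnkCap L1 exit latency (<16us vs <64us), LnkCtl (ASPM L1 Enabled vs Disabled, ExtSynch- vs ExtSynch+), LnkSta (16GT/s vs 8GT/s, ABWMgmt- vs ABWMgmt+), LinkEqualizationRequest (- vs +) and ErrorSrc ERR_COR (0000 vs 1420). Continuation lines such as the ExtSynch line are reported without their field name, so they need to be read against the full dump.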