Re: [EXT] VPD access Blocked by commit 0d5370d1d85251e5893ab7c90a429464de2e140b

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Bjorn, 

Thank you for responding to my question.

> On May 21, 2019, at 1:11 PM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote:
> 
> External Email
> 
> ----------------------------------------------------------------------
> [fix linux-pci, remove ethan.zhao (bounces)]
> 
> From: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> Date: Tue, May 21, 2019 at 3:02 PM
> To: Himanshu Madhani
> Cc: ethan.zhao@xxxxxxxxxx, Andrew Vasquez, Girish Basrur, Giridhar
> Malavali, Myron Stowe, <linux-pci@xxxxxxxxxx>, Linux Kernel Mailing
> List, Quinn Tran
> 
>> [+cc Myron, Quinn, linux-pci, linux-kernel]
>> 
>> From: Himanshu Madhani <hmadhani@xxxxxxxxxxx>
>> Date: Fri, May 17, 2019 at 5:21 PM
>> To: ethan.zhao@xxxxxxxxxx, bhelgaas@xxxxxxxxxx
>> Cc: Andrew Vasquez, Girish Basrur, Giridhar Malavali
>> 
>>> Hi Ethan,
>>> 
>>> Our OEM partners reported to us that VPD access with latest distros were returning I/O error for them. They indicated this to be issue only with newer kernels.
>>> 
>>> One of the distro vendor pointed out patch posted by you to be reason for IO error trying to VPD. The patch looks like blocks access to VPD by blacklisting ISP.
>>> 
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0d5370d1d85251e5893ab7c90a429464de2e140b;
>>> 
>>> I setup PCIe analyzer to reproduce this in our lab to root cause it and discovered that after reverting the patch.  I am able to get VPD data okay with upstream 5.1.0 and I used RHEL8.
>>> 
>>> I also used  "lspci" and "cat" to dump out VPD data and do not see any issue.
>>> 
>>> # lspci -vvv -s 03:00.0
>>> 03:00.0 Fibre Channel: QLogic Corp. ISP2722-based 16/32Gb Fibre Channel to PCIe Adapter (rev 01)
>>>                Subsystem: QLogic Corp. QLE2742 Dual Port 32Gb Fibre Channel to PCIe Adapter
>>>                Physical Slot: 15
>>>                Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
>>>                Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>                Latency: 0, Cache Line Size: 64 bytes
>>>                Interrupt: pin A routed to IRQ 67
>>>                NUMA node: 0
>>>                Region 0: Memory at fbe05000 (64-bit, prefetchable) [size=4K]
>>>                Region 2: Memory at fbe02000 (64-bit, prefetchable) [size=8K]
>>>                Region 4: Memory at fbd00000 (64-bit, prefetchable) [size=1M]
>>>                Expansion ROM at fb540000 [disabled] [size=256K]
>>>                Capabilities: [44] Power Management version 3
>>>                                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
>>>                                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>>>                Capabilities: [4c] Express (v2) Endpoint, MSI 00
>>>                                DevCap:                MaxPayload 2048 bytes, PhantFunc 0, Latency L0s <4us, L1 <1us
>>>                                                ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
>>>                                DevCtl:  Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
>>>                                                RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
>>>                                                MaxPayload 256 bytes, MaxReadReq 4096 bytes
>>>                                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
>>>                                LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L0s L1, Exit Latency L0s <512ns, L1 <2us
>>>                                                ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
>>>                                LnkCtl:  ASPM Disabled; RCB 64 bytes Disabled- CommClk+
>>>                                                ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>>>                                LnkSta:  Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>>                                DevCap2: Completion Timeout: Range B, TimeoutDis+, LTR-, OBFF Not Supported
>>>                                                AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>>>                                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
>>>                                                AtomicOpsCtl: ReqEn-
>>>                                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
>>>                                                Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>>>                                                Compliance De-emphasis: -6dB
>>>                                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
>>>                                                EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
>>>                Capabilities: [88] Vital Product Data
>>>                                Product Name: QLogic 32Gb 2-port FC to PCIe Gen3 x8 Adapter
>>>                                Read-only fields:
>>>                                                [PN] Part number: QLE2742
>>>                                                [SN] Serial number: RFD1706R22611
>>>                                                [EC] Engineering changes: BK3210408-05 04
>>>                                                [V9] Vendor specific: 010189
>>>                                                [RV] Reserved: checksum good, 0 byte(s) reserved
>>>                                End
>>>                Capabilities: [90] MSI-X: Enable+ Count=16 Masked-
>>>                                Vector table: BAR=2 offset=00000000
>>>                                PBA: BAR=2 offset=00001000
>>>                Capabilities: [9c] Vendor Specific Information: Len=0c <?>
>>>                Capabilities: [100 v1] Advanced Error Reporting
>>>                                UESta:   DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>                                UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>                                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>>>                                CESta:   RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
>>>                                CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>>>                                AERCap:               First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
>>>                                                MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>>>                                HeaderLog: 00000000 00000000 00000000 00000000
>>>                Capabilities: [154 v1] Alternative Routing-ID Interpretation (ARI)
>>>                                ARICap: MFVC- ACS-, Next Function: 1
>>>                                ARICtl:   MFVC- ACS-, Function Group: 0
>>>                Capabilities: [1c0 v1] #19
>>>                Capabilities: [1f4 v1] Vendor Specific Information: ID=0001 Rev=1 Len=014 <?>
>>>                Kernel driver in use: qla2xxx
>>>                Kernel modules: qla2xxx
>>> 
>>> # cat /sys/bus/pci/devices/0000\:03\:00.0/vpd
>>> RFD1706R22611ECBK3210408-05 04V9010189RV�x
>>> 
>>> Can you share some more insight into where you encountered issue? I am in process of reverting this patch from upstream kernel but wanted to reach out and find out if you still have setup to provide more context.
>> 
>> 0d5370d1d852 ("PCI: Prevent VPD access for QLogic ISP2722") prevented
>> a panic while reading VPD, so we can't simply revert it.
>> 
>> Since you don't see a panic while reading VPD from that device, it's
>> possible that a QLogic firmware change fixed the VPD format so Linux
>> no longer reads the area that caused the problem.  Or possibly your
>> system doesn't handle the config read error the same way Ethan's HP
>> DL380 does.  Unfortunately we don't have an actual PCIe analyzer trace
>> from Ethan's system, so we don't know exactly what happened on PCIe.
>> 
>> I suggest that you capture the entire VPD area and hexdump it, e.g.,
>> with "xxd", and look at its structure.  pci_vpd_size() parses it and
>> computes the valid size based on a PCI_VPD_STIN_END tag, and
>> pci_vpd_read() should not read past that size.
>> 
>> And you *do* have an analyzer trace.  If new QLogic firmware fixed the
>> VPD format, the trace should show that Linux read only the valid part
>> of VPD, and there should be no errors in the trace.  Then it might
>> just be a question of tweaking the quirk so it allows VPD reads if the
>> firmware is new enough.
>> 
>> But if the trace does show config reads with errors, then it might be
>> that your system just tolerates the errors while the DL380 did not.
>> Then we'd have to figure out exactly what the error was and how to
>> deal with it so things work on both your system and Ethan's.
>> 
>> Bjorn


I have verified after reverting patch on HP DL380 Gen10 hardware by capturing PCIe trace

Here’s snippet from dmidecode output 

System Information:
Manufacturer: HPE
Product Name: ProLiant DL380 Gen10
Family: ProLiant
        
We are able to successfully read VPD config data using lspci and cat command

# lspci -d1077:
13:00.0 Fibre Channel: QLogic Corp. ISP2722-based 16/32Gb Fibre Channel to PCIe Adapter (rev 01)
13:00.1 Fibre Channel: QLogic Corp. ISP2722-based 16/32Gb Fibre Channel to PCIe Adapter (rev 01)
[root@R8 ~]# lspci -vvv -s 13:00.0
13:00.0 Fibre Channel: QLogic Corp. ISP2722-based 16/32Gb Fibre Channel to PCIe Adapter (rev 01)
	Subsystem: QLogic Corp. QLE2742 Dual Port 32Gb Fibre Channel to PCIe Adapter
	Physical Slot: 3
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 28
	NUMA node: 0
	Region 0: Memory at de605000 (64-bit, prefetchable) [size=4K]
	Region 2: Memory at de602000 (64-bit, prefetchable) [size=8K]
	Region 4: Memory at de500000 (64-bit, prefetchable) [size=1M]
	[virtual] Expansion ROM at de800000 [disabled] [size=256K]
	Capabilities: [44] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [4c] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 2048 bytes, PhantFunc 0, Latency L0s <4us, L1 <1us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25.000W
		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
			MaxPayload 256 bytes, MaxReadReq 4096 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 8GT/s, Width x8, ASPM L0s L1, Exit Latency L0s <512ns, L1 <2us
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range B, TimeoutDis+, LTR-, OBFF Not Supported
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
			 AtomicOpsCtl: ReqEn-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
	Capabilities: [88] Vital Product Data
		Product Name: QLogic 32Gb 2-port FC to PCIe Gen3 x8 Adapter
		Read-only fields:
			[PN] Part number: QLE2742
			[SN] Serial number: AFD1533Y02999
			[EC] Engineering changes: BK3210407-05 03
			[V9] Vendor specific: 010189
			[RV] Reserved: checksum good, 0 byte(s) reserved
		End
	Capabilities: [90] MSI-X: Enable+ Count=16 Masked-
		Vector table: BAR=2 offset=00000000
		PBA: BAR=2 offset=00001000
	Capabilities: [9c] Vendor Specific Information: Len=0c <?>
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP- SDES+ TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [154 v1] Alternative Routing-ID Interpretation (ARI)
		ARICap:	MFVC- ACS-, Next Function: 1
		ARICtl:	MFVC- ACS-, Function Group: 0
	Capabilities: [1c0 v1] #19
	Capabilities: [1f4 v1] Vendor Specific Information: ID=0001 Rev=1 Len=014 <?>
	Kernel driver in use: qla2xxx
	Kernel modules: qla2xxx

We also verified this same configuration on a SuperMicro X10SRA-F server (which i had sent in earlier email)’
and were able to verify that the VPD read was good and there were no errors on PCIe trace. Given this information, Please consider reverting the patch until we further debug the issue and resolve as it is
affecting general availability of our adapter.

Thanks,
Himanshu




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux