Re: Red Hat (Fedora) bug report 1467674 concerning your kernel functional performance enhancements causing PCI Express crashes

On 7/4/2017 1:59 PM, Wim ten Have wrote:
> On Tue, 4 Jul 2017 11:57:37 -0400
> Sinan Kaya <okaya@xxxxxxxxxxxxxx> wrote:
> 
>> Hi,
>>
>> On 7/4/2017 11:32 AM, Bjorn Helgaas wrote:
>>> [+cc linux-pci]
>>>
>>> Thanks very much for the detailed problem report, Wim!  I'm taking the
>>> liberty to forward to the linux-pci list in case others trip over the
>>> same thing.
>>>   
>>
>> So, the spec is lying :) and reality doesn't match theory.
>>
>> "Per the ECN mentioned below, all PCIe Receivers are expected to support
>>  Extended Tags"
>>
>>> The problem is not specific to this piece of h/w.  I did pin-point the
>>> issue to specific kernel code commit  
>> <snip>
>>
>>> 60db3a4d8cc9073cf56264785197ba75ee1caca4
>>>   * <wtenhave@hagen:55> git bisect good
>>>     60db3a4d8cc9073cf56264785197ba75ee1caca4 is the first bad commit
>>>     commit 60db3a4d8cc9073cf56264785197ba75ee1caca4
>>>     Author: Sinan Kaya <okaya@xxxxxxxxxxxxxx>
>>>     Date:   Fri Jan 20 09:16:51 2017 -0500
>>>
>>>       PCI: Enable PCIe Extended Tags if supported
>>>   
>> <snip>
>>
>>> 3. Boot and see it crash as soon it starts to operate on specific PCI
>>> Express Ethernet controller.
>>>   
>>
>> I guess we have an endpoint/system with errata that needs to be blacklisted.
>> Can you please try another endpoint with the same system?
>>
>> You have conflicting information above. I want to understand whether it
>> is the endpoint or the system that needs to be blacklisted.
> 
>   Specific PCI Express ethernet are embedded on the systems mainboard.
>   There's only one PCI Express that requires a riser card.  It is empty.
>
>> Please also provide sudo lspci -vvv output from the system with the patch.
>> Sinan
> 
>   Detail (lspci -vvv) is added to RedHat filed bugzilla entry; BugID 1467674
>   since the info is rather large.
> 
> 	https://bugzilla.redhat.com/show_bug.cgi?id=1467674

I think I understand the issue better now. The ECN appears to have been
introduced against the PCIe 2.0 spec.

The PCI Express bridge you have is a Broadcom HT-2100, which appears to be
compliant with PCI Express 1.0 and 1.0a only:

http://www.hard-net.de/info_wissen/chipsatz/broadcom/HT-2100.pdf

Your lspci output agrees; the bridge reports a v1 Express capability:

00:08.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 19
	NUMA node: 0
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	I/O behind bridge: 0000f000-00000fff [empty]
	Memory behind bridge: efe00000-efefffff [size=1M]
	Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff [empty]
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
	BridgeCtl: Parity+ SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [a0] HyperTransport: MSI Mapping Enable+ Fixed-
		Mapping Address Base: 00000000fee00000
	Capabilities: [b0] Express (v1) Root Port (Slot-), MSI 00
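
For anyone who wants to check other systems for the same condition, here is a
minimal sketch that pulls the Express capability version out of saved
`lspci -vvv` text. The function name `express_versions` and the `SAMPLE` string
are made up for illustration; the sample lines are copied from the output
above. It assumes the line layout shown above (an optional domain prefix on the
device address is also accepted) and has not been tested against other lspci
versions.

```python
import re

# A device block starts with an unindented address such as "00:08.0"
# (optionally prefixed with a 4-digit domain, e.g. "0000:00:08.0").
DEVICE_LINE = re.compile(r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) ")

# Matches lines such as:
#   Capabilities: [b0] Express (v1) Root Port (Slot-), MSI 00
EXPRESS_CAP = re.compile(
    r"Capabilities: \[[0-9a-f]+\] Express \(v(\d+)\) (.+?)(?:,|$)"
)


def express_versions(lspci_text):
    """Return {device address: (capability version, port type)}."""
    result = {}
    current = None
    for line in lspci_text.splitlines():
        m = DEVICE_LINE.match(line)
        if m:
            current = m.group(1)
            continue
        m = EXPRESS_CAP.search(line)
        if m and current is not None:
            result[current] = (int(m.group(1)), m.group(2).strip())
    return result


# Illustrative input: three lines taken from the bridge entry quoted above.
SAMPLE = (
    "00:08.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2) "
    "(prog-if 00 [Normal decode])\n"
    "\tCapabilities: [a0] HyperTransport: MSI Mapping Enable+ Fixed-\n"
    "\tCapabilities: [b0] Express (v1) Root Port (Slot-), MSI 00\n"
)

for addr, (version, port_type) in express_versions(SAMPLE).items():
    print(addr, "Express capability v%d," % version, port_type)
```

Any bridge that shows up with version 1 here is a candidate for the same
problem.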

I'll post a patch that enables extended tags only on systems whose bridges
implement PCI Express v2 or higher.
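
To make the intended rule concrete, here is a rough user-space model of it, not
the patch itself. The `Dev` class and the functions `cap_version`,
`path_allows_ext_tags` and `configure_ext_tags` are hypothetical names; only
the three register masks are real, and they match the definitions in the
kernel's pci_regs.h. It assumes the rule is "enable Extended Tags on a device
only if every bridge above it reports capability version 2 or higher", which is
my reading of the plan and may differ from what finally gets merged.

```python
from dataclasses import dataclass
from typing import Optional

# Register masks as defined in the kernel's include/uapi/linux/pci_regs.h.
PCI_EXP_FLAGS_VERS = 0x000F      # Capability Version (PCI Express Capabilities)
PCI_EXP_DEVCAP_EXT_TAG = 0x0020  # Extended Tag Field Supported (Device Capabilities)
PCI_EXP_DEVCTL_EXT_TAG = 0x0100  # Extended Tag Field Enable (Device Control)


@dataclass
class Dev:
    """Hypothetical stand-in for a PCIe function and its register values."""
    name: str
    exp_flags: int                 # PCI Express Capabilities register
    devcap: int                    # Device Capabilities register
    devctl: int = 0                # Device Control register
    parent: Optional["Dev"] = None  # upstream bridge, None at the root


def cap_version(dev):
    """Extract the capability version (1 for PCIe 1.x parts, 2 for 2.0+)."""
    return dev.exp_flags & PCI_EXP_FLAGS_VERS


def path_allows_ext_tags(dev):
    """True if every bridge above dev reports capability version >= 2."""
    bridge = dev.parent
    while bridge is not None:
        if cap_version(bridge) < 2:
            return False
        bridge = bridge.parent
    return True


def configure_ext_tags(dev):
    """Set or clear the Extended Tag enable bit; return the final state."""
    supported = bool(dev.devcap & PCI_EXP_DEVCAP_EXT_TAG)
    if supported and path_allows_ext_tags(dev):
        dev.devctl |= PCI_EXP_DEVCTL_EXT_TAG
        return True
    dev.devctl &= ~PCI_EXP_DEVCTL_EXT_TAG
    return False


# Illustrative topology: a v1 root port (like the HT2100 above) with an
# endpoint behind it that advertises Extended Tag support.
old_root = Dev("v1 root port", exp_flags=0x0041, devcap=0)
nic = Dev("endpoint", exp_flags=0x0002, devcap=PCI_EXP_DEVCAP_EXT_TAG,
          parent=old_root)
print(nic.name, "extended tags enabled:", configure_ext_tags(nic))
```

With a v1 bridge in the path the endpoint keeps 5-bit tags even though it
advertises support, which is the behaviour the machine in this report needs.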


> 
> Enjoy,
> - Wim.
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.


