Re: [PATCH v3 2/4] PCI: brcmstb: Add ACPI config space quirk

On 10/22/21 10:17 AM, Pali Rohár wrote:
> On Friday 22 October 2021 10:04:36 Florian Fainelli wrote:
>> On 10/5/21 7:07 PM, Florian Fainelli wrote:
>>>
>>>
>>> On 10/5/2021 3:25 PM, Jeremy Linton wrote:
>>>> Hi,
>>>>
>>>> On 10/5/21 2:43 PM, Pali Rohár wrote:
>>>>> Hello!
>>>>>
>>>>> On Tuesday 05 October 2021 10:57:18 Jeremy Linton wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 10/5/21 10:32 AM, Bjorn Helgaas wrote:
>>>>>>> On Thu, Aug 26, 2021 at 02:15:55AM -0500, Jeremy Linton wrote:
>>>>>>>> Additionally, some basic bus/device filtering exists to avoid sending
>>>>>>>> config transactions to invalid devices on the RP's primary or
>>>>>>>> secondary bus. A basic link check is also made to ensure that
>>>>>>>> something is operational on the secondary side before probing the
>>>>>>>> remainder of the config space. If either of these constraints is
>>>>>>>> violated and a config operation is lost in the ether because an EP
>>>>>>>> doesn't respond, an unrecoverable SERROR is raised.
>>>>>>>
>>>>>>> It's not "lost"; I assume the root port raises an error because it
>>>>>>> can't send a transaction over a link that is down.
>>>>>>
>>>>>> AFAIK, the problem is precisely that the root port doesn't do that.
>>>>>
>>>>> Interesting! Does it mean that the PCIe Root Complex / Host Bridge
>>>>> (which I guess also contains the logic for the Root Port) does not
>>>>> signal transaction failure for config requests? Or is it just your
>>>>> opinion? I ask because I'm dealing with similar issues and I'm trying
>>>>> to find a way to detect whether a PCIe IP signals a transaction error
>>>>> via an AXI SLVERR response or simply does not send any response back.
>>>>> So if you know a way to check which one it is, I would like to know
>>>>> it too.
>>>>
>>>> This is my _opinion_ based on what I've heard of some other IP
>>>> integration issues, and what I've seen poking at this one from the
>>>> perspective of a SW guy rather than a HW guy. So, basically worthless.
>>>> But you should consider that most of these cores/interconnects aren't
>>>> aware of PCIe completion semantics, so it's the root port's
>>>> responsibility to, say, gracefully translate a non-posted write that
>>>> doesn't have a completion for the interconnect it's attached to,
>>>> rather than tripping something generic like a SLVERR.
>>>>
>>>> Anyway, for this I would poke around the pile of exception registers,
>>>> with your specific processor's manual handy, because a lot of them are
>>>> implementation defined.
>>>
>>> I should be able to get you an answer in the next few days on whether
>>> configuration space requests also generate an error towards the ARM CPU,
>>> since memory space requests most definitely do.
>>
>> I did not get an answer from the design team, but going through our bug
>> tracker, there was evidence of configuration space accesses also
>> generating external aborts:
>>
>> [    8.988237] Unhandled fault: synchronous external abort (0x96000210) at 0xffffff8009539004
>> [    9.026698] PC is at pci_generic_config_read32+0x30/0xb0
> 
> So this is an error caused by reading from config space.
> 
> Can you check whether writing to config space can also trigger a crash?
> If yes, I would like to know whether the write results in a synchronous
> or rather an asynchronous abort.

Yes, it does, and AFAICT it always shows up as a system error interrupt
(SError). Here is an example:

# setpci -d *:* latency_timer=40
[   25.909644] SError Interrupt on CPU2, code 0xbf000002 -- SError
[   25.909647] CPU: 2 PID: 1676 Comm: setpci Not tainted 5.10.70-0.2pre-ge3872e15011b #2
[   25.909649] Hardware name: BCM972165SV_V10 (DT)
[   25.909651] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
[   25.909652] pc : pci_user_write_config_byte+0x6c/0x78
[   25.909654] lr : pci_user_write_config_byte+0x68/0x78
[   25.909655] sp : ffffffc015853c20
[   25.909656] x29: ffffffc015853c20 x28: ffffff8003053000
[   25.909661] x27: 0000000000000000 x26: 0000000000000000
[   25.909664] x25: 0000000000000001 x24: ffffff8004a23780
[   25.909668] x23: ffffff80049aa000 x22: ffffffc015853d68
[   25.909671] x21: 0000000000000040 x20: 000000000000000d
[   25.909674] x19: 000000000000000e x18: 0000000000000000
[   25.909677] x17: 0000000000000000 x16: 0000000000000000
[   25.909680] x15: 0000000000000000 x14: 0000000000000000
[   25.909684] x13: 0000000000000000 x12: 0000000000000000
[   25.909687] x11: 0000000000000000 x10: 0000000000000000
[   25.909690] x9 : ffffffc010483214 x8 : 0000000000000000
[   25.909693] x7 : ffffff800498df00 x6 : ffffff80049a8380
[   25.909696] x5 : ffffffc015510000 x4 : ffffff80049a9800
[   25.909699] x3 : 0000000000000000 x2 : 000000000000000d
[   25.909702] x1 : 0000000000000000 x0 : 0000000000000000
[   25.909706] Kernel panic - not syncing: Asynchronous SError Interrupt
[   25.909708] CPU: 2 PID: 1676 Comm: setpci Not tainted 5.10.70-0.2pre-ge3872e15011b #2
[   25.909710] Hardware name: BCM972165SV_V10 (DT)
[   25.909711] Call trace:
[   25.909712]  dump_backtrace+0x0/0x1d0
[   25.909713]  show_stack+0x1c/0x24
[   25.909714]  dump_stack+0xd0/0x12c
[   25.909716]  panic+0x128/0x308
[   25.909717]  nmi_panic+0x50/0x70
[   25.909718]  arm64_serror_panic+0x74/0x80
[   25.909720]  do_serror+0x28/0x60
[   25.909721]  el1_error+0x8c/0x10c
[   25.909722]  pci_user_write_config_byte+0x6c/0x78
[   25.909724]  pci_write_config+0x7c/0x1a0
[   25.909725]  sysfs_kf_bin_write+0x64/0x84
[   25.909727]  kernfs_fop_write_iter+0xbc/0x170
[   25.909728]  new_sync_write+0x80/0xcc
[   25.909729]  vfs_write+0xec/0x110
[   25.909730]  ksys_pwrite64+0x50/0x8c
[   25.909732]  __arm64_sys_pwrite64+0x20/0x28
[   25.909733]  el0_svc_common.constprop.4+0x100/0x184
[   25.909735]  do_el0_svc+0x38/0x78
[   25.909736]  el0_svc+0x1c/0x28
[   25.909737]  el0_sync_handler+0x64/0x12c
[   25.909738]  el0_sync+0x148/0x180
[   25.909775] brcm-pcie 8b20000.pcie: Error: CFG Acc, 32bit, Write, Bus=1, Dev=0, Fun=0, Reg=0xc, lanes=01000000
[   26.136082] brcm-pcie 8b20000.pcie:  Type: TO=0 Abt=0 UnsupReq=0 AccTO=0 AccDsbld=1 Acc64bit=0
[   26.144709] SMP: stopping secondary CPUs
[   26.144711] Kernel Offset: disabled
[   26.144712] CPU features: 0x0040002,24002004
[   26.144713] Memory Limit: none
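
For reference, the two syndrome codes in this thread (0x96000210 from the
config read, 0xbf000002 from the config write) can be split into their
fields by hand. Below is a minimal sketch in Python, assuming the generic
ESR_ELx layout from the Arm Architecture Reference Manual (EC in bits
[31:26], IL in bit 25, ISS in bits [24:0]). The names decode_esr and
EC_NAMES are hypothetical helpers made up for this example, not a kernel
or tool API, and only the two exception classes seen here are handled.

```python
# Sketch only: splits an arm64 ESR_ELx value into its top-level fields.
# Assumed layout (Arm ARM): EC = bits [31:26], IL = bit 25, ISS = bits [24:0].
# Only the two exception classes that appear in this thread are named.
EC_NAMES = {
    0x25: "Data Abort taken without a change in Exception level",
    0x2F: "SError interrupt",
}


def decode_esr(esr):
    """Return a dict with the fields of a 32-bit ESR_ELx value."""
    ec = (esr >> 26) & 0x3F
    il = (esr >> 25) & 0x1
    iss = esr & 0x1FFFFFF
    fields = {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "not handled in this sketch"),
        "il": il,
        "iss": iss,
    }
    if ec == 0x25:
        # Data Abort ISS: DFSC = bits [5:0], WnR = bit 6, EA = bit 9.
        fields["dfsc"] = iss & 0x3F
        fields["wnr"] = (iss >> 6) & 0x1
        fields["ea"] = (iss >> 9) & 0x1
    elif ec == 0x2F:
        # SError ISS: IDS = bit 24 (1 means implementation-defined syndrome).
        fields["ids"] = (iss >> 24) & 0x1
    return fields


if __name__ == "__main__":
    for code in (0x96000210, 0xBF000002):
        print(hex(code), decode_esr(code))
```

With these assumptions, 0x96000210 decodes to EC=0x25 with DFSC=0x10
(synchronous external abort, not on a translation table walk) and WnR=0,
i.e. a read, which matches the pci_generic_config_read32 trace. 0xbf000002
decodes to EC=0x2f (SError) with IDS=1, so the remaining syndrome bits are
implementation defined and need the CPU manual to interpret.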

-- 
Florian