On 10/22/21 10:17 AM, Pali Rohár wrote:
> On Friday 22 October 2021 10:04:36 Florian Fainelli wrote:
>> On 10/5/21 7:07 PM, Florian Fainelli wrote:
>>>
>>>
>>> On 10/5/2021 3:25 PM, Jeremy Linton wrote:
>>>> Hi,
>>>>
>>>> On 10/5/21 2:43 PM, Pali Rohár wrote:
>>>>> Hello!
>>>>>
>>>>> On Tuesday 05 October 2021 10:57:18 Jeremy Linton wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 10/5/21 10:32 AM, Bjorn Helgaas wrote:
>>>>>>> On Thu, Aug 26, 2021 at 02:15:55AM -0500, Jeremy Linton wrote:
>>>>>>>> Additionally, some basic bus/device filtering exists to avoid sending
>>>>>>>> config transactions to invalid devices on the RP's primary or
>>>>>>>> secondary bus. A basic link check is also made to assure that
>>>>>>>> something is operational on the secondary side before probing the
>>>>>>>> remainder of the config space. If either of these constraints are
>>>>>>>> violated and a config operation is lost in the ether because an EP
>>>>>>>> doesn't respond an unrecoverable SERROR is raised.
>>>>>>>
>>>>>>> It's not "lost"; I assume the root port raises an error because it
>>>>>>> can't send a transaction over a link that is down.
>>>>>>
>>>>>> The problem is AFAIK because the root port doesn't do that.
>>>>>
>>>>> Interesting! Does it mean that PCIe Root Complex / Host Bridge (which I
>>>>> guess contains also logic for Root Port) does not signal transaction
>>>>> failure for config requests? Or it is just your opinion? Because I'm
>>>>> dealing with similar issues and I'm trying to find a way how to detect
>>>>> if some PCIe IP signal transaction error via AXI SLVERR response OR it
>>>>> just does not send any response back. So if you know some way how to
>>>>> check which one it is, I would like to know it too.
>>>>
>>>> This is my _opinion_ based on what I've heard of some other IP
>>>> integration issues, and what I've seen poking at this one from the
>>>> perspective of a SW guy rather than a HW guy. So, basically worthless.
>>>> But, you should consider that most of these cores/interconnects aren't
>>>> aware of PCIe completion semantics so it's the root port's
>>>> responsibility to say, gracefully translate a non-posted write that
>>>> doesn't have a completion for the interconnects it's attached to,
>>>> rather than tripping something generic like a SLVERR.
>>>>
>>>> Anyway, for this I would poke around the pile of exception registers,
>>>> with your specific processors manual handy because a lot of them are
>>>> implementation defined.
>>>
>>> I should be able to get you an answer in the next few days whether
>>> configuration space requests also generate an error towards the ARM CPU,
>>> since memory space requests most definitely do.
>>
>> Did not get an answer from the design team, but going through our bug
>> tracker, there was evidence of configuration space accesses also
>> generating external aborts:
>>
>> [ 8.988237] Unhandled fault: synchronous external abort (0x96000210) at 0xffffff8009539004
>> [ 9.026698] PC is at pci_generic_config_read32+0x30/0xb0
>
> So this is error caused by reading from config space.
>
> Can you check if also writing to config space can trigger some crash? If
> yes, I would like to know if write would be also synchronous or rather
> asynchronous abort.

Yes it does and AFAICT it always shows up as a system error interrupt,
here is an example:

# setpci -d *:* latency_timer=40
[ 25.909644] SError Interrupt on CPU2, code 0xbf000002 -- SError
[ 25.909647] CPU: 2 PID: 1676 Comm: setpci Not tainted 5.10.70-0.2pre-ge3872e15011b #2
[ 25.909649] Hardware name: BCM972165SV_V10 (DT)
[ 25.909651] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
[ 25.909652] pc : pci_user_write_config_byte+0x6c/0x78
[ 25.909654] lr : pci_user_write_config_byte+0x68/0x78
[ 25.909655] sp : ffffffc015853c20
[ 25.909656] x29: ffffffc015853c20 x28: ffffff8003053000
[ 25.909661] x27: 0000000000000000 x26: 0000000000000000
[ 25.909664] x25: 0000000000000001 x24: ffffff8004a23780
[ 25.909668] x23: ffffff80049aa000 x22: ffffffc015853d68
[ 25.909671] x21: 0000000000000040 x20: 000000000000000d
[ 25.909674] x19: 000000000000000e x18: 0000000000000000
[ 25.909677] x17: 0000000000000000 x16: 0000000000000000
[ 25.909680] x15: 0000000000000000 x14: 0000000000000000
[ 25.909684] x13: 0000000000000000 x12: 0000000000000000
[ 25.909687] x11: 0000000000000000 x10: 0000000000000000
[ 25.909690] x9 : ffffffc010483214 x8 : 0000000000000000
[ 25.909693] x7 : ffffff800498df00 x6 : ffffff80049a8380
[ 25.909696] x5 : ffffffc015510000 x4 : ffffff80049a9800
[ 25.909699] x3 : 0000000000000000 x2 : 000000000000000d
[ 25.909702] x1 : 0000000000000000 x0 : 0000000000000000
[ 25.909706] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 25.909708] CPU: 2 PID: 1676 Comm: setpci Not tainted 5.10.70-0.2pre-ge3872e15011b #2
[ 25.909710] Hardware name: BCM972165SV_V10 (DT)
[ 25.909711] Call trace:
[ 25.909712] dump_backtrace+0x0/0x1d0
[ 25.909713] show_stack+0x1c/0x24
[ 25.909714] dump_stack+0xd0/0x12c
[ 25.909716] panic+0x128/0x308
[ 25.909717] nmi_panic+0x50/0x70
[ 25.909718] arm64_serror_panic+0x74/0x80
[ 25.909720] do_serror+0x28/0x60
[ 25.909721] el1_error+0x8c/0x10c
[ 25.909722] pci_user_write_config_byte+0x6c/0x78
[ 25.909724] pci_write_config+0x7c/0x1a0
[ 25.909725] sysfs_kf_bin_write+0x64/0x84
[ 25.909727] kernfs_fop_write_iter+0xbc/0x170
[ 25.909728] new_sync_write+0x80/0xcc
[ 25.909729] vfs_write+0xec/0x110
[ 25.909730] ksys_pwrite64+0x50/0x8c
[ 25.909732] __arm64_sys_pwrite64+0x20/0x28
[ 25.909733] el0_svc_common.constprop.4+0x100/0x184
[ 25.909735] do_el0_svc+0x38/0x78
[ 25.909736] el0_svc+0x1c/0x28
[ 25.909737] el0_sync_handler+0x64/0x12c
[ 25.909738] el0_sync+0x148/0x180
[ 25.909775] brcm-pcie 8b20000.pcie: Error: CFG Acc, 32bit, Write, Bus=1, Dev=0, Fun=0, Reg=0xc, lanes=01000000
[ 26.136082] brcm-pcie 8b20000.pcie: Type: TO=0 Abt=0 UnsupReq=0 AccTO=0 AccDsbld=1 Acc64bit=0
[ 26.144709] SMP: stopping secondary CPUs
[ 26.144711] Kernel Offset: disabled
[ 26.144712] CPU features: 0x0040002,24002004
[ 26.144713] Memory Limit: none
--
Florian
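
To make the synchronous/asynchronous distinction in the two logs concrete, here is a minimal sketch that splits the reported syndrome codes (0x96000210 for the config read, 0xbf000002 for the config write) into their fields. It assumes the Armv8-A ESR_ELx layout (EC in bits [31:26], IL in bit 25, ISS in bits [24:0]); `decode_esr` and `EC_NAMES` are hypothetical names used only for illustration, and only the two exception classes seen in this thread are handled.

```python
# Sketch only: field positions assume the Armv8-A ESR_ELx layout.
# decode_esr and EC_NAMES are illustrative names, not kernel APIs.

EC_NAMES = {
    0x25: "Data Abort without a change in Exception level",
    0x2F: "SError interrupt",
}


def decode_esr(esr):
    """Split a 32-bit ESR_ELx value into the fields relevant here."""
    ec = (esr >> 26) & 0x3F    # Exception Class, bits [31:26]
    il = (esr >> 25) & 0x1     # Instruction Length, bit 25
    iss = esr & 0x1FFFFFF      # Instruction Specific Syndrome, bits [24:0]
    fields = {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "other"),
        "il": il,
        "iss": iss,
    }
    if ec == 0x25:
        # Data Abort ISS encoding
        fields["dfsc"] = iss & 0x3F       # 0x10: synchronous external abort
        fields["wnr"] = (iss >> 6) & 0x1  # 0 = read, 1 = write
        fields["ea"] = (iss >> 9) & 0x1   # external abort type (impl. defined)
    elif ec == 0x2F:
        # SError ISS encoding: IDS=1 means the rest is implementation defined
        fields["ids"] = (iss >> 24) & 0x1
    return fields


if __name__ == "__main__":
    for code in (0x96000210, 0xBF000002):
        print(hex(code), decode_esr(code))
```

Under these assumptions, the read case decodes as a data abort with DFSC 0x10 (synchronous external abort) and WnR clear, while the write case decodes as an SError with IDS set, so the remaining syndrome bits are implementation defined and have to be interpreted with the processor manual at hand.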