> > There *is* a way for a PCIe device to say "I need more time".  It does
> > this by responding to that Vendor ID config read with Request Retry
> > Status (RRS, aka CRS in older specs), which means "I'm not ready yet,
> > but I will be ready in the future."  Adding a delay would definitely
> > make a difference here, so my guess is this is what's happening.
> >
> > Most root complexes return ~0 data to the CPU when a config read
> > terminates with UR or RRS.  It sounds like rockchip does this for UR
> > but possibly not for RRS.
> >
> > There is a "RRS Software Visibility" feature, which is supposed to
> > turn the RRS into a special value (Vendor ID == 0x0001), but per [1],
> > rockchip doesn't support it (lspci calls it "CRSVisible").
> >
> > But the CPU load instruction corresponding to the config read has to
> > complete by reading *something* or else be aborted.  It sounds like
> > it's aborted in this case.  I don't know the arm64 details, but if we
> > could catch that abort and determine that it was an RRS and not a UR,
> > maybe we could fabricate the magic RRS 0x0001 value.
> >
> > imx6q_pcie_abort_handler() does something like that, although I think
> > it's for arm32, not arm64.  But obviously we already catch the abort
> > enough to dump the register state and panic, so maybe there's a way to
> > extend that?
>
> Perhaps a hook mechanism that allows drivers to register with the
> serror handler and offer to handle specific errors before the generic
> code causes the system panic?

This sounds to me like a good general solution that would also help
handle future HW like this one.
So this is a Concept Ack from me.

Cheers!
Vincent.
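
To make the hook idea above a bit more concrete, here is a minimal
user-space sketch of the control flow. It is not kernel code. Every name
in it (register_serror_hook(), handle_serror(), struct fake_serror, the
config window address) is hypothetical, and it simply assumes the hardware
gives us some way to tell an RRS from a UR, which is exactly the open
question in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; none of this is an existing kernel API. */

#define MAX_SERROR_HOOKS   4
#define PCI_RRS_VENDOR_ID  0x0001u  /* value RRS Software Visibility would report */

/* Made-up config-space window owned by the example driver. */
#define FAKE_CFG_BASE      0xf8000000ull
#define FAKE_CFG_SIZE      0x00100000ull

struct fake_serror {
	uint64_t fault_addr;  /* address of the aborted load */
	bool is_rrs;          /* ASSUMPTION: RRS can be told apart from UR */
	uint32_t *result;     /* where the value for the aborted load is stored */
};

/* A hook returns true if it claimed and fixed up the error. */
typedef bool (*serror_hook_t)(struct fake_serror *err);

static serror_hook_t hooks[MAX_SERROR_HOOKS];
static size_t nr_hooks;

/* Drivers call this once to offer to handle errors; 0 on success, -1 if full. */
int register_serror_hook(serror_hook_t hook)
{
	assert(hook != NULL);
	if (nr_hooks == MAX_SERROR_HOOKS)
		return -1;
	hooks[nr_hooks++] = hook;
	return 0;
}

/*
 * Stand-in for the generic handler: give each registered hook a chance
 * first. A false return means nobody claimed the error, i.e. the point
 * where the real handler would dump registers and panic.
 */
bool handle_serror(struct fake_serror *err)
{
	for (size_t i = 0; i < nr_hooks; i++)
		if (hooks[i](err))
			return true;
	return false;
}

/* Example driver hook: fabricate the RRS value for config reads it owns. */
bool fake_rockchip_serror_hook(struct fake_serror *err)
{
	if (err->fault_addr < FAKE_CFG_BASE ||
	    err->fault_addr >= FAKE_CFG_BASE + FAKE_CFG_SIZE)
		return false;  /* not our config window */
	if (!err->is_rrs)
		return false;  /* not an RRS: leave it to the generic path */
	*err->result = PCI_RRS_VENDOR_ID;
	return true;
}
```

With something along these lines, the PCI core would see Vendor ID ==
0x0001 and could retry the config read as it does when RRS Software
Visibility is available, while any error that no hook claims still ends
up in the existing panic path.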