On 2/12/2025 1:57 PM, Dan Williams wrote:
> Bowman, Terry wrote:
> [..]
>>> Reviewed-by: Dan Williams <dan.j.williams@xxxxxxxxx>
>> Ok. I can add is_cxl to 'struct aer_err_info'. Shall I set it by reading the
>> alternate protocol link state?
> I am thinking no because dev->is_cxl at least indicates that a CXL link
> was up at some point, and racing CXL link down is not something the
> error core can reasonably mitigate.
>
> In the end I think that it should be something like:
>
>     info->is_cxl = dev->is_cxl && is_internal_error()
>
> ...on the expectation that a CXL device is unlikely to multiplex
> internal errors across CXL protocol error events and device-specific
> internal events. Even if a device *did* multiplex those I think it is
> reasonable for the kernel to treat a device-specific UCE the same as a
> CXL protocol UCE and panic the system.

Ok. I found that when using is_internal_error() (v5), a USP with a fatal
UCE will not have its AER status populated in the aer_info structure;
only the severity field is populated (see aer_get_device_error_info()).
The status is not populated because of the concern about reading the
USP's AER registers (config space) when the upstream link state is
invalid. Calling is_internal_error() in this case returns false because
the uncorrectable internal error (UIE) bit is 0, and the error is then
treated as a PCIe error.

How do you want to handle the UCE protocol error in this case?

Terry
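
For illustration only, here is a minimal user-space sketch of the case described above. It is not the kernel code: every `model_*` / `MODEL_*` name is hypothetical, and the bit positions (UIE as bit 22 of the uncorrectable status, corrected internal error as bit 14 of the correctable status) are my reading of the PCIe AER register layout. It assumes the proposed rule is "device is CXL and the internal error bit is set", and that a fatal error on a USP leaves the status at 0 with only the severity filled in.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the AER severities; not the kernel's values. */
enum model_severity { MODEL_NONFATAL, MODEL_FATAL, MODEL_CORRECTABLE };

/* Assumed internal-error bit positions in the AER status registers. */
#define MODEL_UNC_INTERNAL (1u << 22)	/* uncorrectable internal error (UIE) */
#define MODEL_COR_INTERNAL (1u << 14)	/* corrected internal error */

/* Hypothetical reduction of the error info to the fields discussed here. */
struct model_err_info {
	enum model_severity severity;
	uint32_t status;	/* stays 0 when the AER registers were not read */
	bool is_cxl;
};

/* Internal error check driven purely by the captured status bits. */
static bool model_is_internal_error(const struct model_err_info *info)
{
	if (info->severity == MODEL_CORRECTABLE)
		return (info->status & MODEL_COR_INTERNAL) != 0;

	return (info->status & MODEL_UNC_INTERNAL) != 0;
}

/* The rule proposed in the quoted text, applied to the model. */
static void model_classify(struct model_err_info *info, bool dev_is_cxl)
{
	info->is_cxl = dev_is_cxl && model_is_internal_error(info);
}

/* Fatal UCE on a USP: severity is known, status was never read. */
static bool model_usp_fatal_is_cxl(void)
{
	struct model_err_info info = { .severity = MODEL_FATAL, .status = 0 };

	model_classify(&info, true);
	return info.is_cxl;
}

/* Non-fatal UCE where the status was read and UIE is set. */
static bool model_nonfatal_uie_is_cxl(void)
{
	struct model_err_info info = {
		.severity = MODEL_NONFATAL,
		.status = MODEL_UNC_INTERNAL,
	};

	model_classify(&info, true);
	return info.is_cxl;
}
```

In this model, model_usp_fatal_is_cxl() returns false even though the device is CXL, so the fatal error falls through to the PCIe path, while model_nonfatal_uie_is_cxl() returns true.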