On Thu, Jan 11, 2024 at 03:32:17PM +0800, Wang, Qingshun wrote:
> If we are processing an Advisory Non-Fatal Error, first check the Device
> Status. If any of Fatal/Non-Fatal Error Detected bits is set, leave it
> to uncorrectable error handler to clear the UE status bit, which should
> be executed right after the CE handler in this case.
>
> Otherwise, filter out uncorrectable errors that is not possible to
> trigger an Advisory Non-Fatal Error, then clear all the rest status bits.

> +static int anfe_get_related_err(struct aer_err_info *info)
> +{
> +	/*
> +	 * Take the most conservative route here. If there are
> +	 * Non-Fatal/Fatal errors detected, do not assume any
> +	 * bit in uncor_status is set by ANFE.
> +	 */
> +	if (info->device_status & (PCI_EXP_DEVSTA_NFED | PCI_EXP_DEVSTA_FED))
> +		return 0;
> +	/*
> +	 * An UNCOR error may cause Advisory Non-Fatal error if:
> +	 * a. The severity of the error is Non-Fatal.
> +	 * b. The error is one of the following:
> +	 *	1. Poisoned TLP
> +	 *	2. Completion Timeout
> +	 *	3. Completer Abort
> +	 *	4. Unexpected Completion
> +	 *	5. Unsupported Request

This could benefit from a reference to the spec that outlines these
conditions.

Bjorn