On Fri, Jan 12, 2024 at 10:35:26AM -0600, Bjorn Helgaas wrote:
> On Thu, Jan 11, 2024 at 03:32:17PM +0800, Wang, Qingshun wrote:
> > If we are processing an Advisory Non-Fatal Error, first check the Device
> > Status. If any of Fatal/Non-Fatal Error Detected bits is set, leave it
> > to uncorrectable error handler to clear the UE status bit, which should
> > be executed right after the CE handler in this case.
> >
> > Otherwise, filter out uncorrectable errors that is not possible to
> > trigger an Advisory Non-Fatal Error, then clear all the rest status bits.
>
> > +static int anfe_get_related_err(struct aer_err_info *info)
> > +{
> > +	/*
> > +	 * Take the most conservative route here. If there are
> > +	 * Non-Fatal/Fatal errors detected, do not assume any
> > +	 * bit in uncor_status is set by ANFE.
> > +	 */
> > +	if (info->device_status & (PCI_EXP_DEVSTA_NFED | PCI_EXP_DEVSTA_FED))
> > +		return 0;
> > +	/*
> > +	 * An UNCOR error may cause Advisory Non-Fatal error if:
> > +	 *  a. The severity of the error is Non-Fatal.
> > +	 *  b. The error is one of the following:
> > +	 *   1. Poisoned TLP
> > +	 *   2. Completion Timeout
> > +	 *   3. Completer Abort
> > +	 *   4. Unexpected Completion
> > +	 *   5. Unsupported Request
>
> This could benefit from a reference to the spec that outlines these
> conditions.

Thanks for the suggestion. I will add a reference to the latest spec.

>
> Bjorn

Best regards,
Wang, Qingshun