On Wed, Nov 13, 2024 at 03:54:21PM -0600, Terry Bowman wrote:
> Non-fatal CXL UCE errors will be treated as fatal.

Hm, I wonder why?

> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1048,7 +1048,10 @@ static void cxl_handle_error(struct pci_dev *dev, struct aer_err_info *info)
>  			pdrv->cxl_err_handler->cor_error_detected(dev);
>  
>  		pcie_clear_device_status(dev);
> -	}
> +	} else if (info->severity == AER_NONFATAL)
> +		cxl_do_recovery(dev);
> +	else if (info->severity == AER_FATAL)
> +		cxl_do_recovery(dev);
>  }

Nit: Maybe use curly braces and collapse both if-blocks into one.

> +	cxl_walk_bridge(bridge, cxl_report_error_detected, &status);
> +	if (status)
> +		panic("CXL cachemem error. Invoking panic");

Nit: This will be prefixed by "Kernel panic - not syncing: ", so another
"Invoking panic" message seems somewhat redundant.

Thanks,

Lukas