On Fri, 18 Nov 2022 10:08:55 -0700
Dave Jiang <dave.jiang@xxxxxxxxx> wrote:

> From: Dan Williams <dan.j.williams@xxxxxxxxx>
>
> Add nominal error handling that tears down CXL.mem in response to error
> notifications that imply a device reset. Given some CXL.mem may be
> operating as System RAM, there is a high likelihood that these error
> events are fatal. However, if the system survives the notification the
> expectation is that the driver behavior is equivalent to a hot-unplug
> and re-plug of an endpoint.
>
> Note that this does not change the mask values from the default. That
> awaits CXL _OSC support to determine whether platform firmware is in
> control of the mask registers.
>
> Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
> Signed-off-by: Dave Jiang <dave.jiang@xxxxxxxxx>

Maybe something for the future, but if multiple errors are reported in the
CXL RAS structures, we should be able to keep iterating to report them all
and reset just the once. I think that relies on
Multiple_Header_Recording_Capability though, if we want useful data.

Looks good to me, though I have messaged one of our RAS experts to take a
look, as I only end up touching this aspect of PCI drivers once in a blue
moon!

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>