Re: [PATCH v3 09/11] cxl/pci: Add (hopeful) error handling support

On Fri, 18 Nov 2022 10:08:55 -0700
Dave Jiang <dave.jiang@xxxxxxxxx> wrote:

> From: Dan Williams <dan.j.williams@xxxxxxxxx>
> 
> Add nominal error handling that tears down CXL.mem in response to error
> notifications that imply a device reset. Given some CXL.mem may be
> operating as System RAM, there is a high likelihood that these error
> events are fatal. However, if the system survives the notification the
> expectation is that the driver behavior is equivalent to a hot-unplug
> and re-plug of an endpoint.
> 
> Note that this does not change the mask values from the default. That
> awaits CXL _OSC support to determine whether platform firmware is in
> control of the mask registers.
> 
> Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
> Signed-off-by: Dave Jiang <dave.jiang@xxxxxxxxx>

Maybe something for the future, but if multiple errors are reported
in the CXL RAS structures, we should be able to keep iterating to
report them all and then reset just once.
I think that relies on Multiple_Header_Recording_Capability, though,
if we want useful header log data for each error.
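A minimal, userspace-only sketch of the iterate-then-reset-once idea,
assuming the errors show up as bits in a single 32-bit uncorrectable
error status value. struct err_summary and handle_ras_status() are
hypothetical names for illustration only. This is not the CXL driver
code, and it ignores the per-error header log that
Multiple_Header_Recording_Capability would provide.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical model: walk every error bit set in an uncorrectable
 * error status value, report each one, and only then decide on a
 * single reset. Names and bit meanings are illustrative, not taken
 * from the CXL specification or the kernel driver.
 */
struct err_summary {
	unsigned int reported; /* number of error bits reported */
	unsigned int resets;   /* number of resets requested (0 or 1) */
};

static struct err_summary handle_ras_status(uint32_t status)
{
	struct err_summary s = { 0, 0 };

	/* Report every set bit rather than stopping at the first one. */
	for (unsigned int bit = 0; bit < 32; bit++) {
		if (status & (UINT32_C(1) << bit)) {
			printf("uncorrectable error: status bit %u\n", bit);
			s.reported++;
		}
	}

	/* Reset just once, after all errors have been reported. */
	if (s.reported)
		s.resets = 1;

	return s;
}
```

For example, a status of 0x15 would report bits 0, 2 and 4 and still
request only one reset, while a status of 0 reports nothing and
requests no reset.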

Looks good to me, though I have messaged one of our RAS experts
to take a look, as I only end up touching this aspect of PCI drivers
once in a blue moon!

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
