Re: [PATCH v6 5/6] acpi/ghes, cxl/pci: Process CXL CPER Protocol Errors


Smita Koralahalli wrote:
> When PCIe AER is in FW-First mode, the OS should process CXL Protocol errors from
> CPER records. Introduce support for handling and logging CXL Protocol
> errors.
> 
> The defined trace events cxl_aer_uncorrectable_error and
> cxl_aer_correctable_error trace native CXL AER endpoint errors. Reuse them
> to trace FW-First Protocol errors.
> 
> Since the CXL code is required to be called from process context and
> GHES is in interrupt context, use workqueues for processing.
> 
> Similar to CXL CPER event handling, use kfifo to handle errors as it
> simplifies queue processing by providing lock-free FIFO operations.
> 
> Add the ability for the CXL sub-system to register a workqueue to
> process CXL CPER protocol errors.
> 
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@xxxxxxx>
> Reviewed-by: Dave Jiang <dave.jiang@xxxxxxxxx>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> Reviewed-by: Ira Weiny <ira.weiny@xxxxxxxxx>
> ---
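A minimal userspace sketch of the hand-off the patch description relies on: a power-of-two ring buffer with free-running in/out indices, modeled loosely on kfifo, where the producer stands in for the GHES handler and the consumer for the registered work item. The record type, field names, and the `fifo_put()`/`fifo_get()` helpers are hypothetical and are not the kernel's kfifo API. The sketch is single-threaded and omits the memory barriers a real cross-CPU queue needs; kfifo itself only avoids extra locking when there is exactly one reader and one writer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical record standing in for a CXL CPER protocol error. */
struct proto_err_rec {
	unsigned int severity; /* hypothetical: 0 = correctable, 1 = uncorrectable */
	unsigned int port_id;  /* hypothetical identifier of the reporting port */
};

#define FIFO_DEPTH 4u

/* As with kfifo, the size must be a power of two so masking replaces modulo. */
static_assert((FIFO_DEPTH & (FIFO_DEPTH - 1u)) == 0,
	      "FIFO_DEPTH must be a power of two");

struct err_fifo {
	struct proto_err_rec buf[FIFO_DEPTH];
	unsigned int in;  /* free-running producer index */
	unsigned int out; /* free-running consumer index */
};

/* Unsigned subtraction stays correct when the indices wrap around. */
static unsigned int fifo_len(const struct err_fifo *f)
{
	return f->in - f->out;
}

/* Producer side: in the kernel this role is played by interrupt-context code. */
static bool fifo_put(struct err_fifo *f, const struct proto_err_rec *rec)
{
	if (fifo_len(f) == FIFO_DEPTH)
		return false; /* full: caller logs an overflow and drops the record */
	f->buf[f->in & (FIFO_DEPTH - 1u)] = *rec;
	f->in++;
	return true;
}

/* Consumer side: in the kernel this role is played by a work item. */
static bool fifo_get(struct err_fifo *f, struct proto_err_rec *rec)
{
	if (fifo_len(f) == 0)
		return false; /* empty: nothing left to process */
	*rec = f->buf[f->out & (FIFO_DEPTH - 1u)];
	f->out++;
	return true;
}
```

A full queue makes `fifo_put()` return false instead of waiting, so the producer never blocks, which is what a handler running in interrupt context requires.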

This patch confuses me. The plumbing to route CXL component error
records back to the cxl_pci driver was motivated by the driver having a
significant amount of context about component state and code to handle
OS-first reporting of component errors from the device mailbox.

Protocol errors are different. They implicate various ports where the
cxl_pci driver may not have any additional information to add.

I feel like this patch makes more sense after CXL protocol errors become
a first-class citizen in the core, and after generic CXL protocol error
tracing lives in the core rather than in a cxl_pci driver callback.

So, similar to how aer_recover_queue() traces all PCIe protocol errors
and optionally lets endpoint drivers recover the link via
pcie_do_recovery(), a cxl_recover_queue() is needed. That would be the
place to land general CXL protocol error prints and optionally call back
into drivers to add more device specific color if necessary.
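The split proposed above, where the core always traces and drivers only optionally add detail, can be sketched in userspace C. Everything here is hypothetical: `cxl_recover_queue()` does not exist yet, and `core_handle_proto_err()`, `core_register_hook()`, and the per-device hook table are illustrative stand-ins, not existing kernel interfaces. The sketch only shows the control flow: every protocol error gets a generic trace regardless of which driver, if any, is bound.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified protocol error record. */
struct cxl_proto_err {
	int dev_id;        /* hypothetical device/port identifier */
	int uncorrectable; /* nonzero for uncorrectable errors */
};

/* Optional driver hook that adds device-specific detail. */
typedef void (*cxl_err_hook_fn)(const struct cxl_proto_err *err);

#define MAX_DEVS 8

static cxl_err_hook_fn driver_hooks[MAX_DEVS]; /* NULL means no driver hook */
static int generic_traced;                     /* count of core-level traces */
static int driver_annotated;                   /* count of driver callbacks */

/* A driver opts in to adding detail for one device. */
static void core_register_hook(int dev_id, cxl_err_hook_fn fn)
{
	assert(dev_id >= 0 && dev_id < MAX_DEVS);
	driver_hooks[dev_id] = fn;
}

/* Core-level handler: always traces, then optionally defers to the driver. */
static void core_handle_proto_err(const struct cxl_proto_err *err)
{
	printf("cxl core: dev %d %s protocol error\n", err->dev_id,
	       err->uncorrectable ? "uncorrectable" : "correctable");
	generic_traced++;

	if (err->dev_id >= 0 && err->dev_id < MAX_DEVS &&
	    driver_hooks[err->dev_id])
		driver_hooks[err->dev_id](err);
}

/* Example driver hook; a real one would add device-specific logging. */
static void memdev_hook(const struct cxl_proto_err *err)
{
	(void)err;
	driver_annotated++;
}
```

With a hook registered only for device 2, an error on device 5 still gets the generic trace line, and only device 2's error reaches the driver callback.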

I am ok with the CXL core centralizing all protocol error processing
like the built-in PCI core, but the generic CXL memory expander driver,
cxl_pci, is the wrong place to handle system wide protocol errors across
all device types.

I expect this is new infrastructure that we will get from Terry's
patches, but please do challenge me if you think I am missing something.
