Re: [PATCH v5 05/16] PCI/AER: Add CXL PCIe Port correctable error support in AER service driver

On Thu, 6 Feb 2025 13:33:55 -0500
Gregory Price <gourry@xxxxxxxxxx> wrote:

> On Tue, Jan 07, 2025 at 08:38:41AM -0600, Terry Bowman wrote:
> > The AER service driver supports handling Downstream Port Protocol Errors in
> > Restricted CXL host (RCH) mode also known as CXL1.1. It needs the same
> > functionality for CXL PCIe Ports operating in Virtual Hierarchy (VH)
> > mode.[1]
> > 
> > CXL and PCIe Protocol Error handling have different requirements that
> > necessitate a separate handling path. The AER service driver may try to
> > recover PCIe uncorrectable non-fatal errors (UCE). The same recovery is not
> > suitable for CXL PCIe Port devices because of potential for system memory
> > corruption. Instead, CXL Protocol Error handling must use a kernel panic
> > in the case of a fatal or non-fatal UCE. The AER driver's PCIe Protocol
> > Error handling does not panic the kernel in response to a UCE.
> >  
> 
> Naive question: is a panic actually required if the memory is a userland
> resource?

It's a protocol error, not a contained memory issue.
You'd need to find everything using that memory and kill it.

Maybe longer term, if it's DAX and we know the whole device is allocated
to only a few apps, we can resolve this more smoothly.
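
To make the distinction concrete, here is a minimal sketch in C of the
policy described in the quoted patch text. All names (choose_action and
both enums) are hypothetical and are not the kernel's AER or CXL API.
Treating correctable errors as log-only is also an assumption made for
illustration.

```c
#include <stdbool.h>

/* Hypothetical severity classes; not taken from the kernel headers. */
enum err_severity {
	ERR_CORRECTABLE,
	ERR_UNCORRECTABLE_NONFATAL,
	ERR_UNCORRECTABLE_FATAL,
};

/* Hypothetical outcomes of the policy decision. */
enum err_action {
	ACTION_LOG,          /* assumption: correctable errors are only reported */
	ACTION_AER_RECOVERY, /* existing PCIe AER path, which does not panic */
	ACTION_PANIC,        /* CXL protocol UCE: containment cannot be assumed */
};

/*
 * Pick a handling path for a protocol error.
 *
 * For a CXL port, any uncorrectable error (fatal or non-fatal) panics,
 * because the error is not confined to a known memory range. For a plain
 * PCIe port, uncorrectable errors go to the existing AER handling.
 */
static enum err_action choose_action(bool is_cxl_port, enum err_severity sev)
{
	if (sev == ERR_CORRECTABLE)
		return ACTION_LOG;

	if (is_cxl_port)
		return ACTION_PANIC;

	return ACTION_AER_RECOVERY;
}
```

With this sketch, choose_action(true, ERR_UNCORRECTABLE_NONFATAL) yields
ACTION_PANIC, while the same severity on a plain PCIe port yields
ACTION_AER_RECOVERY.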


> 
> The code in arch/x86/kernel/cpu/mce/core.c suggests we may not panic
> if an uncorrectable error occurs in this fashion, but simply a SIGBUS.
> 
> Unless this is down the wrong pipe - in which case disregard.
> 
> I'm still digging through background on this patch set so I may be
> barking up the wrong tree.
> 
> ~Gregory




