Re: [PATCH 01/15] cxl/aer/pci: Add CXL PCIe port error handler callbacks in AER service driver

Hi Dan,

I added a question below.

On 10/22/2024 6:43 PM, Dan Williams wrote:
> Terry Bowman wrote:
> [..]
>> I was referring to reusing a separate instance of 'struct pci_error_handlers' for CXL
>> UCE-CE errors.
>>
>> One example where it can be reused in infrastructure is in err.c's
>> report_error_detected(). If both PCIe and CXL errors use 'struct pci_error_handlers'
>> then the updated report_error_detected() becomes a bit simpler with less helper
>> function logic.
> report_error_detected() is concerned with link and i/o state
> (pci_dev_is_disconnected() and pci_dev_set_io_state()). For device
> disconnects, CXL recovery potentially needs to span multiple devices.
> For i/o state, CXL.io could be fully operational while CXL.cache and
> CXL.mem are in fatal state.
>
> CXL considerations do not feel welcome in that function.
>
> Ideally a PCIe developer never needs to see or understand the CXL error
> model because it is off in its own path. In other words, if someone
> maintaining pcie_do_recovery=>report_error_detected() for the PCIe case
> needs to go find a CXL expert each time they want to touch that path,
> that feels like a regression in PCIe error handling maintainability.
>
>> But, it's not a reason by itself to choose to reuse 'struct
>> pci_error_handlers' for CXL errors.
>>
>> Looking closer at aer.c shows there is no advantage in this file for using 'struct
>> pci_error_handlers' for CXL errors.
>>
>> If I understand correctly you want a new type introduced, 'struct cxl_error_handlers'.
> Yes, mainly because the bus state and the result of the recovery tend to
> be a different operational model. If a CXL error fits the PCIe model
> then it can be sent via pcie_do_recovery(), but I expect that only
> applies to a handful of correctable errors like CRC_Threshold,
> Retry_Threshold, or Physical_Layer_Error. Almost everything else *seems*
> like it has a CXL specific response that would confuse
> pcie_do_recovery(). 
>
> So, in general new operational models == new data structures and types.
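
To make the split described above concrete, here is a minimal userspace sketch of the routing idea, not kernel code. Everything in it is an assumption for illustration: the enum values, 'struct cxl_dev_sketch', the single-callback layout of 'struct cxl_error_handlers', and cxl_route_error() are hypothetical and do not reflect an existing kernel API. Only the three correctable errors named above are treated as fitting the PCIe model.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical error kinds; only the three named in the thread are listed. */
enum cxl_err_kind {
	CXL_ERR_CRC_THRESHOLD,
	CXL_ERR_RETRY_THRESHOLD,
	CXL_ERR_PHYSICAL_LAYER,
	CXL_ERR_OTHER,		/* stand-in for every other CXL error */
};

/* Which recovery path an error is sent down. */
enum err_path {
	ERR_PATH_PCIE_RECOVERY,	/* the kernel would call pcie_do_recovery() */
	ERR_PATH_CXL,		/* CXL-specific handling */
};

/* Hypothetical device stand-in; counts errors seen by the CXL path. */
struct cxl_dev_sketch {
	int cxl_errors_handled;
};

/* Assumed layout: a CXL-only callback table, separate from pci_error_handlers. */
struct cxl_error_handlers {
	void (*error_detected)(struct cxl_dev_sketch *dev, enum cxl_err_kind kind);
};

/* True only for the correctable errors that fit the PCIe recovery model. */
static bool cxl_err_fits_pcie_model(enum cxl_err_kind kind)
{
	switch (kind) {
	case CXL_ERR_CRC_THRESHOLD:
	case CXL_ERR_RETRY_THRESHOLD:
	case CXL_ERR_PHYSICAL_LAYER:
		return true;
	default:
		return false;
	}
}

/* Route an error: PCIe-model errors bypass the CXL handlers entirely. */
static enum err_path cxl_route_error(struct cxl_dev_sketch *dev,
				     const struct cxl_error_handlers *h,
				     enum cxl_err_kind kind)
{
	if (cxl_err_fits_pcie_model(kind))
		return ERR_PATH_PCIE_RECOVERY;

	if (h && h->error_detected)
		h->error_detected(dev, kind);
	return ERR_PATH_CXL;
}

/* Example handler: just records that the CXL path saw the error. */
static void count_cxl_error(struct cxl_dev_sketch *dev, enum cxl_err_kind kind)
{
	(void)kind;
	dev->cxl_errors_handled++;
}
```

The point of the sketch is that the PCIe path never has to inspect CXL state: the decision is made once, up front, and the two callback tables stay separate types.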

Would you like the CXL PCIe endpoint device driver to continue using
pci_error_handlers, or should we change it to use cxl_error_handlers?

Regards,
Terry




