Hi Dan,

I added a question below.

On 10/22/2024 6:43 PM, Dan Williams wrote:
> Terry Bowman wrote:
> [..]
>> I was referring to reusing separate instance of 'struct pci_error_handlers' for CXL
>> UCE-CE errors.
>>
>> One example where it can be reused in infrastructure is in err.c's
>> report_error_detected(). If both PCIe and CXL errors use 'struct pci_error_handlers'
>> then the updated report_error_detected() becomes a bit simpler with less helper
>> function logic.
> report_error_detected() is concerned with link and i/o state
> (pci_dev_is_disconnected() and pci_dev_set_io_state()). For device
> disconnects, CXL recovery potentially needs to span multiple devices.
> For i/o state, CXL.io could be fully operational while CXL.cache and
> CXL.mem are in fatal state.
>
> CXL considerations do not feel welcome in that function.
>
> Ideally a PCIe developer never needs to see or understand the CXL error
> model because it is off in its own path. In other words, if someone
> maintaining pcie_do_recovery=>report_error_detected() for the PCIe case
> needs to go find a CXL expert each time they want to touch that path,
> that feels like a regression in PCIe error handling maintainability.
>
>> But, it's not a reason by itself to choose to reuse 'struct
>> pci_error_handlers' for CXL errors.
>>
>> Looking closer at aer.c shows there is no advantage in this file for using 'struct
>> pci_error_handlers' for CXL errors.
>>
>> If I understand correctly you want a new type introduced, 'struct cxl_error_handlers'.
> Yes, mainly because the bus state and the result of the recovery tend to
> be a different operational model. If a CXL error fits the PCIe model
> then it can be sent via pcie_do_recovery(), but I expect that only
> applies to a handful of correctable errors like CRC_Threshold,
> Retry_Threshold, or Physical_Layer_Error. Almost everything else *seems*
> like it has a CXL specific response that would confuse
> pcie_do_recovery().
>
> So, in general new operational models == new data structures and types.

Would you like to keep using 'struct pci_error_handlers' for the CXL PCIe
endpoint device driver, or should we change the CXL PCIe endpoint driver to
use the new 'struct cxl_error_handlers'?

Regards,
Terry
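For reference, the routing Dan describes (only CRC_Threshold, Retry_Threshold and Physical_Layer_Error fitting the PCIe recovery model, everything else getting a CXL-specific response) could be modelled roughly as below. This is a userspace sketch under stated assumptions, not kernel code: 'struct cxl_error_handlers_sketch', route_cxl_cor_error() and the demo handler are hypothetical names, and the enum only borrows the CXL RAS correctable error status bit names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enum named after CXL RAS correctable error status bits. */
enum cxl_cor_error {
	CXL_COR_CACHE_DATA_ECC,
	CXL_COR_MEM_DATA_ECC,
	CXL_COR_CRC_THRESHOLD,
	CXL_COR_RETRY_THRESHOLD,
	CXL_COR_CACHE_POISON_RECEIVED,
	CXL_COR_MEM_POISON_RECEIVED,
	CXL_COR_PHYSICAL_LAYER_ERROR,
};

enum recovery_path { PATH_PCIE, PATH_CXL };

/* Hypothetical CXL-only handler table, kept separate from pci_error_handlers. */
struct cxl_error_handlers_sketch {
	void (*cor_error_detected)(enum cxl_cor_error err);
};

/* Assumption: only these link-level errors fit the PCIe recovery model. */
static int fits_pcie_model(enum cxl_cor_error err)
{
	switch (err) {
	case CXL_COR_CRC_THRESHOLD:
	case CXL_COR_RETRY_THRESHOLD:
	case CXL_COR_PHYSICAL_LAYER_ERROR:
		return 1;
	default:
		return 0;
	}
}

/*
 * PCIe-compatible errors are only reported back as PATH_PCIE; every other
 * correctable error is dispatched to the CXL handler table.
 */
static enum recovery_path
route_cxl_cor_error(enum cxl_cor_error err,
		    const struct cxl_error_handlers_sketch *cxl)
{
	if (fits_pcie_model(err))
		return PATH_PCIE;

	assert(cxl != NULL && cxl->cor_error_detected != NULL);
	cxl->cor_error_detected(err);
	return PATH_CXL;
}

/* Demo handler: counts how many errors took the CXL path. */
static int cxl_cor_count;

static void demo_cor_error_detected(enum cxl_cor_error err)
{
	(void)err;
	cxl_cor_count++;
}

static const struct cxl_error_handlers_sketch demo_handlers = {
	.cor_error_detected = demo_cor_error_detected,
};
```

In this model the PCIe path is only signalled through the return value; a real implementation would hand those errors to the existing PCIe handling, while the CXL table stays out of report_error_detected() entirely.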