Re: [PATCH v7 12/17] cxl/pci: Add error handler for CXL PCIe Port RAS errors

"Bowman, Terry" <terry.bowman@xxxxxxx> · Tue, 18 Feb 2025 09:33:40 -0600

On 2/14/2025 6:20 PM, Dan Williams wrote:
> Bowman, Terry wrote:
> [..]
>>> Ok, so here is where the trouble I was alluding to earlier begins. At
>>> this point we leave this scope which means @port will have its reference
>>> dropped and may be freed by the time the caller tries to use it.
>>>
>>> Additionally, @ras_base is only valid while @port->dev.driver is set. In
>>> this set, cxl_do_recovery() is only holding the device lock of @pdev
>>> which means nothing synchronizes against @ras_base pointing to garbage
>>> because a cxl_port was unbound / unplugged / disabled while error
>>> recovery was running.
>>>
>>> Both of those problems go away if upon entry to ->error_detected() it
>>> can already be assumed that the context holds both a 'struct cxl_port'
>>> object reference, and the device_lock() for that object.
>> I think the question is will there be much gained by taking the lock
>> earlier? The difference between the current implementation and the
>> proposed would be when the reference (or lock) is taken: cxl_report_error()
>> or cxl_port_error_detected()/cxl_port_cor_error_detected(). It's only a
>> few function calls difference but the biggest difference is in the CXL
>> topology search without reference or lock protection (you point at here).
> My point is that this series is holding the *wrong* device_lock():
>
> I.e.:
>
>> +static int cxl_report_error_detected(struct pci_dev *dev, void *data)
>> +{
>> +       const struct cxl_error_handlers *cxl_err_handler;
>> +       pci_ers_result_t vote, *result = data;
>> +       struct pci_driver *pdrv;
>> +
>> +       device_lock(&dev->dev);
> This lock against the PCI device does nothing to protect against unbind events for the cxl_port
> object...

I'll update to use the CXL device instead of the PCI device.

>> +       pdrv = dev->driver; 
>> +       if (!pdrv || !pdrv->cxl_err_handler ||
>> +           !pdrv->cxl_err_handler->error_detected) 
>> +               goto out;
>> +
>> +       cxl_err_handler = pdrv->cxl_err_handler;
>> +       vote = cxl_err_handler->error_detected(dev);
> ...subsequently any usage of @ras_base in this ->error_detected() is
> racy.
>
>> +       *result = merge_result(*result, vote); 
>> +out:
>> +       device_unlock(&dev->dev); 
> [..]
>> Which directory do you see the CXL error handling and support landing
>> in: pci/pcie/ or cxl/core/ or elsewhere ?
> In cxl/core/, that's the only place that understands CXL port topology
> and the lifetime rules for dport RAS register mappings.
>
>> Should we consider submitting this patchset first and then adding the CXL
>> kfifo changes you mention? It sounds like we need this for FW-first and
>> could be reused to solve the OS-first issue (time without a lock).
> The problem is that the PCI core is always built-in and the CXL core is
> modular. Without a kfifo() and a registration scheme the CXL core could
> not remain modular.
>
>> Or, if you like I can start to add the CXL kfifo changes now.
> I feel like there's enough examples of kfifo in error handling to make
> this not too burdensome, but let me know if you disagree. Otherwise,
> would need to spend the time to figure out how to keep the test
> environment functioning (cxl-test depends on modular core builds).

Thanks for the feedback. Yes, there are several examples and Smita is using for
FW-first as well. Correct me if I'm wrong but the goal in this case is
for the FW-first and OS-first to use the same kfifo.

Terry