Re: [PATCH v7 12/17] cxl/pci: Add error handler for CXL PCIe Port RAS errors

Dan Williams <dan.j.williams@xxxxxxxxx> · Fri, 14 Feb 2025 16:20:14 -0800

Bowman, Terry wrote:
[..]
> > Ok, so here is where the trouble I was alluding to earlier begins. At
> > this point we leave this scope which means @port will have its reference
> > dropped and may be freed by the time the caller tries to use it.
> >
> > Additionally, @ras_base is only valid while @port->dev.driver is set. In
> > this set, cxl_do_recovery() is only holding the device lock of @pdev
> > which means nothing synchronizes against @ras_base pointing to garbage
> > because a cxl_port was unbound / unplugged / disabled while error
> > recovery was running.
> >
> > Both of those problems go away if upon entry to ->error_detected() it
> > can already be assumed that the context holds both a 'struct cxl_port'
> > object reference, and the device_lock() for that object.
> 
> I think the question is will there be much gained by taking the lock
> earlier? The difference between the current implementation and the
> proposed would be when the reference (or lock) is taken: cxl_report_error()
> or cxl_port_error_detected()/cxl_port_cor_error_detected(). It's only a
> few function calls difference but the biggest difference is in the CXL
> topology search without reference or lock protection (you point at here).

My point is that this series is holding the *wrong* device_lock():

I.e.:

> +static int cxl_report_error_detected(struct pci_dev *dev, void *data)
> +{
> +       const struct cxl_error_handlers *cxl_err_handler;
> +       pci_ers_result_t vote, *result = data;
> +       struct pci_driver *pdrv;
> +
> +       device_lock(&dev->dev);

This lock against the PCI device does nothing to protect against unbind events for the cxl_port
object...

> +       pdrv = dev->driver; 
> +       if (!pdrv || !pdrv->cxl_err_handler ||
> +           !pdrv->cxl_err_handler->error_detected) 
> +               goto out;
> +
> +       cxl_err_handler = pdrv->cxl_err_handler;
> +       vote = cxl_err_handler->error_detected(dev);

...subsequently any usage of @ras_base in this ->error_detected() is
racy.

> +       *result = merge_result(*result, vote); 
> +out:
> +       device_unlock(&dev->dev); 

[..]
> Which directory do you see the CXL error handling and support landing
> in: pci/pcie/ or cxl/core/ or elsewhere ?

In cxl/core/, that's the only place that understands CXL port topology
and the lifetime rules for dport RAS register mappings.

> Should we consider submitting this patchset first and then adding the CXL
> kfifo changes you mention? It sounds like we need this for FW-first and
> could be reused to solve the OS-first issue (time without a lock).

The problem is that the PCI core is always built-in and the CXL core is
modular. Without a kfifo() and a registration scheme the CXL core could
not remain modular.

> Or, if you like I can start to add the CXL kfifo changes now.

I feel like there's enough examples of kfifo in error handling to make
this not too burdensome, but let me know if you disagree. Otherwise,
would need to spend the time to figure out how to keep the test
environment functioning (cxl-test depends on modular core builds).