On 22.10.24 18:43:15, Dan Williams wrote: > Changes since v1 [1]: > - Fix some misspellings missed by checkpatch in changelogs (Jonathan) > - Add comments explaining the order of objects in drivers/cxl/Makefile > (Jonathan) > - Rename attach_device => cxl_rescan_attach (Jonathan) > - Fixup Zijun's email (Zijun) > > [1]: http://lore.kernel.org/172862483180.2150669.5564474284074502692.stgit@xxxxxxxxxxxxxxxxxxxxxxxxx > > --- > > Original cover: > > Gregory's modest proposal to fix CXL cxl_mem_probe() failures due to > delayed arrival of the CXL "root" infrastructure [1] prompted questions > of how the existing mechanism for retrying cxl_mem_probe() could be > failing. I found a similar issue with the region creation. A region is created with the first endpoint found and immediately added as device which triggers cxl_region_probe(). Now, in interleaving setups the region state comes into commit state only after the last endpoint was probed. So the probe must be repeated until all endpoints were enumerated. I ended up with this change: diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index a07b62254596..c78704e435e5 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -3775,8 +3775,8 @@ static int cxl_region_probe(struct device *dev) } if (p->state < CXL_CONFIG_COMMIT) { - dev_dbg(&cxlr->dev, "config state: %d\n", p->state); - rc = -ENXIO; + rc = dev_err_probe(&cxlr->dev, -EPROBE_DEFER, + "region config state: %d\n", p->state); goto out; } -- 2.39.5 I don't see an init order issue here as the mem module is always up before the regions are probed. -Robert > > The critical missing piece in the debug was that Gregory's setup had > almost all CXL modules built-in to the kernel. > > On the way to that discovery several other bugs and init-order corner > cases were discovered. > > The main fix is to make sure the drivers/cxl/Makefile object order > supports root CXL ports being fully initialized upon cxl_acpi_probe() > exit. The modular case has some similar potential holes that are fixed > with MODULE_SOFTDEP() and other fix ups. Finally, an attempt to update > cxl_test to reproduce the original report resulted in the discovery of a > separate long standing use after free bug in cxl_region_detach(). > > [2]: http://lore.kernel.org/20241004212504.1246-1-gourry@xxxxxxxxxx > > --- > > Dan Williams (6): > cxl/port: Fix CXL port initialization order when the subsystem is built-in > cxl/port: Fix cxl_bus_rescan() vs bus_rescan_devices() > cxl/acpi: Ensure ports ready at cxl_acpi_probe() return > cxl/port: Fix use-after-free, permit out-of-order decoder shutdown > cxl/port: Prevent out-of-order decoder allocation > cxl/test: Improve init-order fidelity relative to real-world systems > > > drivers/base/core.c | 35 +++++++ > drivers/cxl/Kconfig | 1 > drivers/cxl/Makefile | 20 +++- > drivers/cxl/acpi.c | 7 + > drivers/cxl/core/hdm.c | 50 +++++++++-- > drivers/cxl/core/port.c | 13 ++- > drivers/cxl/core/region.c | 91 ++++++++++--------- > drivers/cxl/cxl.h | 3 - > include/linux/device.h | 3 + > tools/testing/cxl/test/cxl.c | 200 +++++++++++++++++++++++------------------- > tools/testing/cxl/test/mem.c | 1 > 11 files changed, 269 insertions(+), 155 deletions(-) > > base-commit: 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b