Re: [PATCH v2 0/6] cxl: Initialization and shutdown fixes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 22.10.24 18:43:15, Dan Williams wrote:
> Changes since v1 [1]:
> - Fix some misspellings missed by checkpatch in changelogs (Jonathan)
> - Add comments explaining the order of objects in drivers/cxl/Makefile
>   (Jonathan)
> - Rename attach_device => cxl_rescan_attach (Jonathan)
> - Fixup Zijun's email (Zijun)
> 
> [1]: http://lore.kernel.org/172862483180.2150669.5564474284074502692.stgit@xxxxxxxxxxxxxxxxxxxxxxxxx
> 
> ---
> 
> Original cover:
> 
> Gregory's modest proposal to fix CXL cxl_mem_probe() failures due to
> delayed arrival of the CXL "root" infrastructure [1] prompted questions
> of how the existing mechanism for retrying cxl_mem_probe() could be
> failing.

I found a similar issue with the region creation. 

A region is created with the first endpoint found and immediately
added as device which triggers cxl_region_probe(). Now, in
interleaving setups the region state comes into commit state only
after the last endpoint was probed. So the probe must be repeated
until all endpoints were enumerated. I ended up with this change:

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index a07b62254596..c78704e435e5 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -3775,8 +3775,8 @@ static int cxl_region_probe(struct device *dev)
 	}
 
 	if (p->state < CXL_CONFIG_COMMIT) {
-		dev_dbg(&cxlr->dev, "config state: %d\n", p->state);
-		rc = -ENXIO;
+		rc = dev_err_probe(&cxlr->dev, -EPROBE_DEFER,
+				"region config state: %d\n", p->state);
 		goto out;
 	}
 
-- 
2.39.5

I don't see an init order issue here as the mem module is always up
before the regions are probed.

-Robert

> 
> The critical missing piece in the debug was that Gregory's setup had
> almost all CXL modules built-in to the kernel.
> 
> On the way to that discovery several other bugs and init-order corner
> cases were discovered.
> 
> The main fix is to make sure the drivers/cxl/Makefile object order
> supports root CXL ports being fully initialized upon cxl_acpi_probe()
> exit. The modular case has some similar potential holes that are fixed
> with MODULE_SOFTDEP() and other fix ups. Finally, an attempt to update
> cxl_test to reproduce the original report resulted in the discovery of a
> separate long standing use after free bug in cxl_region_detach().
> 
> [2]: http://lore.kernel.org/20241004212504.1246-1-gourry@xxxxxxxxxx
> 
> ---
> 
> Dan Williams (6):
>       cxl/port: Fix CXL port initialization order when the subsystem is built-in
>       cxl/port: Fix cxl_bus_rescan() vs bus_rescan_devices()
>       cxl/acpi: Ensure ports ready at cxl_acpi_probe() return
>       cxl/port: Fix use-after-free, permit out-of-order decoder shutdown
>       cxl/port: Prevent out-of-order decoder allocation
>       cxl/test: Improve init-order fidelity relative to real-world systems
> 
> 
>  drivers/base/core.c          |   35 +++++++
>  drivers/cxl/Kconfig          |    1 
>  drivers/cxl/Makefile         |   20 +++-
>  drivers/cxl/acpi.c           |    7 +
>  drivers/cxl/core/hdm.c       |   50 +++++++++--
>  drivers/cxl/core/port.c      |   13 ++-
>  drivers/cxl/core/region.c    |   91 ++++++++++---------
>  drivers/cxl/cxl.h            |    3 -
>  include/linux/device.h       |    3 +
>  tools/testing/cxl/test/cxl.c |  200 +++++++++++++++++++++++-------------------
>  tools/testing/cxl/test/mem.c |    1 
>  11 files changed, 269 insertions(+), 155 deletions(-)
> 
> base-commit: 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b




[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux