Robert Richter wrote:
> On 22.10.24 18:43:15, Dan Williams wrote:
> > Changes since v1 [1]:
> > - Fix some misspellings missed by checkpatch in changelogs (Jonathan)
> > - Add comments explaining the order of objects in drivers/cxl/Makefile
> >   (Jonathan)
> > - Rename attach_device => cxl_rescan_attach (Jonathan)
> > - Fixup Zijun's email (Zijun)
> >
> > [1]: http://lore.kernel.org/172862483180.2150669.5564474284074502692.stgit@xxxxxxxxxxxxxxxxxxxxxxxxx
> >
> > ---
> >
> > Original cover:
> >
> > Gregory's modest proposal to fix CXL cxl_mem_probe() failures due to
> > delayed arrival of the CXL "root" infrastructure [1] prompted questions
> > of how the existing mechanism for retrying cxl_mem_probe() could be
> > failing.
>
> I found a similar issue with the region creation.
>
> A region is created with the first endpoint found and immediately
> added as device which triggers cxl_region_probe(). Now, in
> interleaving setups the region state comes into commit state only
> after the last endpoint was probed. So the probe must be repeated
> until all endpoints were enumerated. I ended up with this change:
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index a07b62254596..c78704e435e5 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -3775,8 +3775,8 @@ static int cxl_region_probe(struct device *dev)
>  	}
>  
>  	if (p->state < CXL_CONFIG_COMMIT) {
> -		dev_dbg(&cxlr->dev, "config state: %d\n", p->state);
> -		rc = -ENXIO;
> +		rc = dev_err_probe(&cxlr->dev, -EPROBE_DEFER,
> +				   "region config state: %d\n", p->state);

I would argue EPROBE_DEFER is not appropriate, because there is no
guarantee that the other members of the region show up, and if they do,
they will re-trigger probe. So "probe must be repeated until all
endpoints were enumerated" is the case either way. I.e. either each
additional endpoint arrival triggers a re-probe, or EPROBE_DEFER triggers
extra redundant probing *and* still results in probe attempts as
endpoints arrive. So a dev_dbg() plus an -ENXIO return on uncommitted
region state is expected.

>  		goto out;
>  	}
>  
> -- 
> 2.39.5
>
> I don't see an init order issue here as the mem module is always up
> before the regions are probed.

Right, cxl_endpoint_port_probe() triggers region discovery, and
cxl_endpoint_port_probe() currently only runs after cxl_mem has
registered an endpoint port. The failure this set addresses is unwanted
cxl_mem_probe() failures.
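To make the -ENXIO vs. EPROBE_DEFER argument concrete, here is a hypothetical userspace C model. It is not kernel code: every sim_* name is invented for illustration, and the retry behaviour is a deliberate simplification. The assumptions are that a region needing `ways` endpoints is probed once per endpoint arrival (the re-trigger described above), and that, under the deferral policy, the driver core additionally retries the deferred region once after each endpoint binds.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the kernel-internal EPROBE_DEFER (517 in
 * include/linux/errno.h); userspace <errno.h> does not define it. */
#define SIM_EPROBE_DEFER 517

enum policy { POLICY_ENXIO, POLICY_DEFER };

/* Hypothetical, much-simplified model of one interleaved region. */
struct sim_region {
	int ways;      /* endpoints needed before the region can commit */
	int attached;  /* endpoints enumerated so far */
	bool bound;    /* region driver has bound */
	bool deferred; /* sitting on the modelled deferred-probe list */
	int probes;    /* number of region probe attempts */
};

/* Analogue of cxl_region_probe(): fail until every endpoint is present. */
static int sim_region_probe(struct sim_region *r, enum policy pol)
{
	r->probes++;
	if (r->attached < r->ways) /* like p->state < CXL_CONFIG_COMMIT */
		return pol == POLICY_DEFER ? -SIM_EPROBE_DEFER : -ENXIO;
	r->bound = true;
	return 0;
}

/* Probe the region unless it is already bound; track deferral. */
static void sim_try_probe(struct sim_region *r, enum policy pol)
{
	if (r->bound)
		return;
	int rc = sim_region_probe(r, pol);
	r->deferred = (rc == -SIM_EPROBE_DEFER);
}

/*
 * Enumerate 'arrivals' of the region's 'ways' endpoints and return the
 * number of region probe attempts; *bound reports whether it ever bound.
 */
int sim_probe_attempts(int ways, int arrivals, enum policy pol, bool *bound)
{
	assert(ways > 0 && arrivals >= 0 && arrivals <= ways);
	struct sim_region r = { .ways = ways };

	for (int i = 0; i < arrivals; i++) {
		r.attached++;
		/* Each endpoint arrival re-triggers the region probe. */
		sim_try_probe(&r, pol);
		/* Assumed model: the endpoint's own successful bind then
		 * makes the driver core retry deferred devices once more. */
		if (r.deferred)
			sim_try_probe(&r, pol);
	}
	*bound = r.bound;
	return r.probes;
}
```

Under this model, with all 4 of 4 endpoints arriving, the -ENXIO policy binds after 4 probe attempts and the deferral policy after 7. With only 2 of 4 arriving, neither binds (2 vs. 4 attempts). Deferral never binds the region any earlier; it only adds attempts.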