Re: [PATCH v2 0/6] cxl: Initialization and shutdown fixes

Robert Richter wrote:
> On 22.10.24 18:43:15, Dan Williams wrote:
> > Changes since v1 [1]:
> > - Fix some misspellings missed by checkpatch in changelogs (Jonathan)
> > - Add comments explaining the order of objects in drivers/cxl/Makefile
> >   (Jonathan)
> > - Rename attach_device => cxl_rescan_attach (Jonathan)
> > - Fixup Zijun's email (Zijun)
> > 
> > [1]: http://lore.kernel.org/172862483180.2150669.5564474284074502692.stgit@xxxxxxxxxxxxxxxxxxxxxxxxx
> > 
> > ---
> > 
> > Original cover:
> > 
> > Gregory's modest proposal to fix CXL cxl_mem_probe() failures due to
> > delayed arrival of the CXL "root" infrastructure [1] prompted questions
> > of how the existing mechanism for retrying cxl_mem_probe() could be
> > failing.
> 
> I found a similar issue with the region creation. 
> 
> A region is created with the first endpoint found and immediately
> added as device which triggers cxl_region_probe(). Now, in
> interleaving setups the region state comes into commit state only
> after the last endpoint was probed. So the probe must be repeated
> until all endpoints were enumerated. I ended up with this change:
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index a07b62254596..c78704e435e5 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -3775,8 +3775,8 @@ static int cxl_region_probe(struct device *dev)
>  	}
>  
>  	if (p->state < CXL_CONFIG_COMMIT) {
> -		dev_dbg(&cxlr->dev, "config state: %d\n", p->state);
> -		rc = -ENXIO;
> +		rc = dev_err_probe(&cxlr->dev, -EPROBE_DEFER,
> +				"region config state: %d\n", p->state);

I would argue that EPROBE_DEFER is not appropriate, because there is no
guarantee that the other members of the region will show up, and if they
do, their arrival will re-trigger probe anyway. So "probe must be
repeated until all endpoints were enumerated" holds either way: either
the arrival of more endpoints triggers a re-probe, or EPROBE_DEFER
triggers extra, redundant probing *and* still results in probe attempts
as endpoints arrive.

So a dev_dbg() plus an -ENXIO return on uncommitted region state is the
expected behavior.
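To make the argument above concrete, here is a minimal userspace sketch under stated assumptions. It is a toy model, not kernel code: toy_region, toy_region_probe() and toy_endpoint_arrive() are hypothetical stand-ins for the region device, cxl_region_probe() and endpoint enumeration. It assumes, as described above, that every endpoint arrival re-triggers region probe, and that probe logs and returns -ENXIO while the region is not yet committed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical toy model, not kernel code. Names are illustrative
 * stand-ins for the CXL region device and cxl_region_probe().
 */
enum toy_config_state { TOY_CONFIG_ACTIVE, TOY_CONFIG_COMMIT };

struct toy_region {
	int nr_targets;              /* endpoints the interleave set needs */
	int nr_attached;             /* endpoints enumerated so far */
	enum toy_config_state state; /* committed only once all arrive */
	bool bound;                  /* region "driver" attached */
	int probe_calls;             /* number of probe attempts */
};

/* Same shape as the check under discussion: log and fail with -ENXIO
 * while the region is uncommitted; never ask to be deferred. */
static int toy_region_probe(struct toy_region *r)
{
	r->probe_calls++;
	if (r->state < TOY_CONFIG_COMMIT) {
		fprintf(stderr, "config state: %d\n", (int)r->state);
		return -ENXIO;
	}
	r->bound = true;
	return 0;
}

/* Assumption: each endpoint arrival attaches to the region and then
 * re-triggers region probe if the region is still unbound. */
static void toy_endpoint_arrive(struct toy_region *r)
{
	assert(r->nr_attached < r->nr_targets);
	if (++r->nr_attached == r->nr_targets)
		r->state = TOY_CONFIG_COMMIT;
	if (!r->bound)
		toy_region_probe(r);
}
```

In this model, a four-way interleave sees three -ENXIO attempts followed by a successful bind on the fourth arrival. Returning -EPROBE_DEFER instead could only add retries on top of those arrival-triggered attempts; it could not bind the region any sooner, since commit still waits for the last endpoint.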

>  		goto out;
>  	}
>  
> -- 
> 2.39.5
> 
> I don't see an init order issue here as the mem module is always up
> before the regions are probed.

Right, cxl_endpoint_port_probe() triggers region discovery, and
cxl_endpoint_port_probe() currently only runs after cxl_mem has
registered an endpoint port.
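The dependency chain described above can be sketched as a toy model. This is a simplification, not the kernel's call graph: toy_mem_probe(), toy_endpoint_port_probe() and toy_region_discovery() are hypothetical stand-ins, and the real endpoint port probe runs via device registration on the CXL bus rather than by a direct call. The point it illustrates is that region discovery is only reachable after the memdev side has registered an endpoint port.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical toy model of the ordering, not kernel code. A direct
 * call stands in for "registering the port leads to its probe".
 */
struct toy_system {
	bool mem_probed;          /* cxl_mem stand-in has run */
	bool endpoint_registered; /* endpoint port stand-in exists */
	int regions_discovered;   /* region discovery passes so far */
};

static void toy_region_discovery(struct toy_system *s)
{
	/* Only reachable from the endpoint port probe path. */
	assert(s->mem_probed && s->endpoint_registered);
	s->regions_discovered++;
}

static void toy_endpoint_port_probe(struct toy_system *s)
{
	toy_region_discovery(s);
}

static void toy_mem_probe(struct toy_system *s)
{
	s->mem_probed = true;
	/* Registering the endpoint port is what leads to its probe. */
	s->endpoint_registered = true;
	toy_endpoint_port_probe(s);
}
```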

The problem this set addresses is unwanted cxl_mem_probe() failures.



