Re: [RFC PATCH v8 05/10] cxl/memscrub: Add CXL device patrol scrub control feature

Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> · Fri, 10 May 2024 12:23:25 +0100

On Thu, 9 May 2024 17:26:46 -0700
Dan Williams <dan.j.williams@xxxxxxxxx> wrote:

> shiju.jose@ wrote:
> > From: Shiju Jose <shiju.jose@xxxxxxxxxx>
> > 
> > CXL spec 3.1 section 8.2.9.9.11.1 describes the device patrol scrub control
> > feature. The device patrol scrub proactively locates and makes corrections
> > to errors in regular cycle.
> > 
> > Allow specifying the number of hours within which the patrol scrub must be
> > completed, subject to minimum and maximum limits reported by the device.
> > Also allow disabling scrub allowing trade-off error rates against
> > performance.
> > 
> > Register with scrub subsystem to provide scrub control attributes to the
> > user.
> > 
> > Co-developed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> > Signed-off-by: Shiju Jose <shiju.jose@xxxxxxxxxx>  
> [..]
> > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> > index 0c79d9ce877c..399e43463626 100644
> > --- a/drivers/cxl/mem.c
> > +++ b/drivers/cxl/mem.c
> > @@ -117,6 +117,12 @@ static int cxl_mem_probe(struct device *dev)
> >  	if (!cxlds->media_ready)
> >  		return -EBUSY;
> >  
> > +	rc = cxl_mem_patrol_scrub_init(cxlmd);
> > +	if (rc) {
> > +		dev_dbg(&cxlmd->dev, "CXL patrol scrub init failed\n");
> > +		return rc;
> > +	}  
> 
> 2 concerns:
> 
> * Why should cxl_mem_probe() fail just because this optional
>   scrub interface did not register?
> 

Flip the dev_dbg to dev_warn() and indeed carry on.

> * Why is this not located in cxl_region_probe()? If the ras2 scrub is an
>   HPA-based scrub I think CXL should do the work to interface with the scrub
>   interface at the same level. This also provides another in-kernel user
>   for all the DPA-to-HPA translation infrastructure that the CXL driver
>   contains. Pretty much the only reason the CXL driver needs to exist at
>   all is address translation, so at a minimum it seems a waste to inflict
>   more need to understand DPAs on userspace.

As you might expect this will get messy - I'm not saying it's a bad thing
to do, but complexities that come to mind include:

* Scrub is device wide (unlike RAS2 which in theory supports HPA range control)
  So if you map a given DPA range into multiple regions then the controls
  will interfere.  Maybe scrub at max rate requested for any region is fine.
* Interleave - so we'd be controlling multiple hardware scrubbers.
* Comes and goes with regions.  Do we stop scrubbing if no region?  Not sure.

My guess is break down is:
1) Component registered for each CXL mem device to handle the control + combining
   of all regions specific requests.
2) Region specific component that exposes the controls on HPA basis, and
   requests from all it's CXL mem device drivers a minimum service level.
3) Device specific scrub instance (perhaps) reflecting that some scrub may
   make sense when not yet in a region (identify bad mem etc).

So I think we will end up with a lot more layering in here, but end result
will indeed be better.

This has been going on a while, so not sure the DPA to HPA stuff was all in place
and at the time I think was still an open question of whether that should be
a userspace problem or not.  Anyhow time to adapt :)

Jonathan