On Thu, 9 May 2024 17:26:46 -0700 Dan Williams <dan.j.williams@xxxxxxxxx> wrote: > shiju.jose@ wrote: > > From: Shiju Jose <shiju.jose@xxxxxxxxxx> > > > > CXL spec 3.1 section 8.2.9.9.11.1 describes the device patrol scrub control > > feature. The device patrol scrub proactively locates and makes corrections > > to errors in regular cycle. > > > > Allow specifying the number of hours within which the patrol scrub must be > > completed, subject to minimum and maximum limits reported by the device. > > Also allow disabling scrub allowing trade-off error rates against > > performance. > > > > Register with scrub subsystem to provide scrub control attributes to the > > user. > > > > Co-developed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> > > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> > > Signed-off-by: Shiju Jose <shiju.jose@xxxxxxxxxx> > [..] > > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c > > index 0c79d9ce877c..399e43463626 100644 > > --- a/drivers/cxl/mem.c > > +++ b/drivers/cxl/mem.c > > @@ -117,6 +117,12 @@ static int cxl_mem_probe(struct device *dev) > > if (!cxlds->media_ready) > > return -EBUSY; > > > > + rc = cxl_mem_patrol_scrub_init(cxlmd); > > + if (rc) { > > + dev_dbg(&cxlmd->dev, "CXL patrol scrub init failed\n"); > > + return rc; > > + } > > 2 concerns: > > * Why should cxl_mem_probe() fail just because this optional > scrub interface did not register? > Flip the dev_dbg to dev_warn() and indeed carry on. > * Why is this not located in cxl_region_probe()? If the ras2 scrub is an > HPA-based scrub I think CXL should do the work to interface with the scrub > interface at the same level. This also provides another in-kernel user > for all the DPA-to-HPA translation infrastructure that the CXL driver > contains. Pretty much the only reason the CXL driver needs to exist at > all is address translation, so at a minimum it seems a waste to inflict > more need to understand DPAs on userspace. As you might expect this will get messy - I'm not saying it's a bad thing to do, but complexities that come to mind include: * Scrub is device wide (unlike RAS2 which in theory supports HPA range control) So if you map a given DPA range into multiple regions then the controls will interfere. Maybe scrub at max rate requested for any region is fine. * Interleave - so we'd be controlling multiple hardware scrubbers. * Comes and goes with regions. Do we stop scrubbing if no region? Not sure. My guess is break down is: 1) Component registered for each CXL mem device to handle the control + combining of all regions specific requests. 2) Region specific component that exposes the controls on HPA basis, and requests from all it's CXL mem device drivers a minimum service level. 3) Device specific scrub instance (perhaps) reflecting that some scrub may make sense when not yet in a region (identify bad mem etc). So I think we will end up with a lot more layering in here, but end result will indeed be better. This has been going on a while, so not sure the DPA to HPA stuff was all in place and at the time I think was still an open question of whether that should be a userspace problem or not. Anyhow time to adapt :) Jonathan