On Thu, 15 Feb 2024 19:14:50 +0800 <shiju.jose@xxxxxxxxxx> wrote:

> From: Shiju Jose <shiju.jose@xxxxxxxxxx>
>
> Register with the scrub configure driver to expose the sysfs attributes
> to the user for configuring the CXL memory device's ECS feature.
> Add the static CXL ECS specific attributes to support configuring the
> CXL memory device ECS feature.
>
> Signed-off-by: Shiju Jose <shiju.jose@xxxxxxxxxx>

The ABI in here needs documentation. My key takeaway is that it is
very ECS specific. I think one of the big challenges of a common
scrub control system is going to be trying to come up with some
meaningful common ABI.

> ---
>  drivers/cxl/core/memscrub.c | 253 +++++++++++++++++++++++++++++++++++-
>  1 file changed, 250 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cxl/core/memscrub.c b/drivers/cxl/core/memscrub.c
> index a1fb40f8307f..325084b22e7a 100644
> --- a/drivers/cxl/core/memscrub.c
> +++ b/drivers/cxl/core/memscrub.c
> @@ -464,6 +464,8 @@ EXPORT_SYMBOL_NS_GPL(cxl_mem_patrol_scrub_init, CXL);
>  #define CXL_MEMDEV_ECS_GET_FEAT_VERSION	0x01
>  #define CXL_MEMDEV_ECS_SET_FEAT_VERSION	0x01
>
> +#define CXL_DDR5_ECS	"cxl_ecs"

I would just put these name defines inline.

> +enum cxl_mem_ecs_scrub_attributes {
> +	cxl_ecs_log_entry_type,
> +	cxl_ecs_log_entry_type_per_dram,
> +	cxl_ecs_log_entry_type_per_memory_media,
> +	cxl_ecs_mode,
> +	cxl_ecs_mode_counts_codewords,
> +	cxl_ecs_mode_counts_rows,
> +	cxl_ecs_reset,
> +	cxl_ecs_threshold,
> +	cxl_ecs_threshold_available,
> +	cxl_ecs_max_attrs,

This is pretty much all custom ABI. Challenging to make it common
with the main scrub and RASF controls, but I think we do need to see
if we can come up with something that is at least vaguely consistent
across different forms of scrub control.

What the user cares about is how likely an error is to get past the
scrubbing that is running (I think - RAS folk speak up if I have this
wrong!)

So how do we go from the ECS parameters to that sort of info?

I think ECS is effectively scrubbing at a fixed rate (google suggests
all ram every 24 hours). We are really controlling what info is
reported rather than what scrub is carried out. Useful stuff to
potentially control but different from the other cases.

> +};
> +
>  int cxl_mem_ecs_init(struct cxl_memdev *cxlmd, int region_id)
>  {
> +	char scrub_name[CXL_MEMDEV_MAX_NAME_LENGTH];
>  	struct cxl_mbox_supp_feat_entry feat_entry;
>  	struct cxl_ecs_context *cxl_ecs_ctx;
> +	struct device *cxl_scrub_dev;

Make this more local as we don't need it out here?

>  	int nmedia_frus;
>  	int ret;
>
> @@ -755,6 +993,15 @@ int cxl_mem_ecs_init(struct cxl_memdev *cxlmd, int region_id)
>  		cxl_ecs_ctx->get_feat_size = feat_entry.get_feat_size;
>  		cxl_ecs_ctx->set_feat_size = feat_entry.set_feat_size;
>  		cxl_ecs_ctx->region_id = region_id;
> +
> +		snprintf(scrub_name, sizeof(scrub_name), "%s_%s_region%d",
> +			 CXL_DDR5_ECS, dev_name(&cxlmd->dev), cxl_ecs_ctx->region_id);
> +		cxl_scrub_dev = devm_scrub_device_register(&cxlmd->dev, scrub_name,
> +							   cxl_ecs_ctx, NULL,
> +							   cxl_ecs_ctx->region_id,
> +							   &cxl_mem_ecs_attr_group);
> +		if (IS_ERR(cxl_scrub_dev))
> +			return PTR_ERR(cxl_scrub_dev);
>  	}
>
>  	return 0;
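
To illustrate the two smaller comments above (name written inline,
locals scoped to where they are used), here is a rough, untested
userspace sketch. It is not kernel code: struct sketch_dev,
sketch_scrub_register(), SKETCH_MAX_NAME_LENGTH and the
feature_supported flag are hypothetical stand-ins for struct device,
devm_scrub_device_register(), CXL_MEMDEV_MAX_NAME_LENGTH and whatever
condition guards the block in the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's CXL_MEMDEV_MAX_NAME_LENGTH. */
#define SKETCH_MAX_NAME_LENGTH 128

/* Hypothetical stand-in for struct device; only the name matters here. */
struct sketch_dev {
	char name[SKETCH_MAX_NAME_LENGTH];
};

/*
 * Hypothetical stand-in for devm_scrub_device_register(): records the
 * name and returns 0, or -1 if the name does not fit.
 */
static int sketch_scrub_register(struct sketch_dev *out, const char *name)
{
	assert(out && name);

	if (strlen(name) >= sizeof(out->name))
		return -1;
	strcpy(out->name, name);
	return 0;
}

static int sketch_ecs_init(const char *memdev_name, int region_id,
			   int feature_supported,
			   struct sketch_dev *registered)
{
	if (feature_supported) {
		/* Only needed on this path, so declared in this scope. */
		char scrub_name[SKETCH_MAX_NAME_LENGTH];
		int n;

		/* "cxl_ecs" written inline rather than via a #define. */
		n = snprintf(scrub_name, sizeof(scrub_name),
			     "cxl_ecs_%s_region%d", memdev_name, region_id);
		if (n < 0 || (size_t)n >= sizeof(scrub_name))
			return -1;

		return sketch_scrub_register(registered, scrub_name);
	}

	return 0;
}
```

In the real function the same shape would presumably apply to
scrub_name and cxl_scrub_dev inside the existing block, with the
IS_ERR()/PTR_ERR() handling left as in the patch.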