On 10/30/24 9:16 AM, Jonathan Cameron wrote:
> On Tue, 29 Oct 2024 11:32:47 -0700
> Dave Jiang <dave.jiang@xxxxxxxxx> wrote:
>
>> On 10/29/24 10:00 AM, Shiju Jose wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Dave Jiang <dave.jiang@xxxxxxxxx>
>>>> Sent: 29 October 2024 16:32
>>>> To: Shiju Jose <shiju.jose@xxxxxxxxxx>; linux-edac@xxxxxxxxxxxxxxx;
>>>> linux-cxl@xxxxxxxxxxxxxxx; linux-acpi@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
>>>> linux-kernel@xxxxxxxxxxxxxxx
>>>> Cc: bp@xxxxxxxxx; tony.luck@xxxxxxxxx; rafael@xxxxxxxxxx; lenb@xxxxxxxxxx;
>>>> mchehab@xxxxxxxxxx; dan.j.williams@xxxxxxxxx; dave@xxxxxxxxxxxx; Jonathan
>>>> Cameron <jonathan.cameron@xxxxxxxxxx>; gregkh@xxxxxxxxxxxxxxxxxxx;
>>>> sudeep.holla@xxxxxxx; jassisinghbrar@xxxxxxxxx; alison.schofield@xxxxxxxxx;
>>>> vishal.l.verma@xxxxxxxxx; ira.weiny@xxxxxxxxx; david@xxxxxxxxxx;
>>>> Vilas.Sridharan@xxxxxxx; leo.duran@xxxxxxx; Yazen.Ghannam@xxxxxxx;
>>>> rientjes@xxxxxxxxxx; jiaqiyan@xxxxxxxxxx; Jon.Grimm@xxxxxxx;
>>>> dave.hansen@xxxxxxxxxxxxxxx; naoya.horiguchi@xxxxxxx;
>>>> james.morse@xxxxxxx; jthoughton@xxxxxxxxxx; somasundaram.a@xxxxxxx;
>>>> erdemaktas@xxxxxxxxxx; pgonda@xxxxxxxxxx; duenwen@xxxxxxxxxx;
>>>> gthelen@xxxxxxxxxx; wschwartz@xxxxxxxxxxxxxxxxxxx;
>>>> dferguson@xxxxxxxxxxxxxxxxxxx; wbs@xxxxxxxxxxxxxxxxxxxxxx;
>>>> nifan.cxl@xxxxxxxxx; tanxiaofei <tanxiaofei@xxxxxxxxxx>; Zengtao (B)
>>>> <prime.zeng@xxxxxxxxxxxxx>; Roberto Sassu <roberto.sassu@xxxxxxxxxx>;
>>>> kangkang.shen@xxxxxxxxxxxxx; wanghuiqiang <wanghuiqiang@xxxxxxxxxx>;
>>>> Linuxarm <linuxarm@xxxxxxxxxx>
>>>> Subject: Re: [PATCH v14 07/14] cxl/memfeature: Add CXL memory device patrol
>>>> scrub control feature
>>>>
>>>>
>>>>
>>>> On 10/25/24 10:13 AM, shiju.jose@xxxxxxxxxx wrote:
>>>>> From: Shiju Jose <shiju.jose@xxxxxxxxxx>
>>>>>
>>>>> CXL spec 3.1 section 8.2.9.9.11.1 describes the device patrol scrub
>>>>> control feature.
>>>>> The device patrol scrub proactively locates and makes
>>>>> corrections to errors in regular cycle.
>>>>>
>>>>> Allow specifying the number of hours within which the patrol scrub
>>>>> must be completed, subject to minimum and maximum limits reported by the
>>>>> device.
>>>>> Also allow disabling scrub allowing trade-off error rates against
>>>>> performance.
>>>>>
>>>>> Add support for patrol scrub control on CXL memory devices.
>>>>> Register with the EDAC device driver, which retrieves the scrub
>>>>> attribute descriptors from EDAC scrub and exposes the sysfs scrub
>>>>> control attributes to userspace. For example, scrub control for the
>>>>> CXL memory device "cxl_mem0" is exposed in
>>>>> /sys/bus/edac/devices/cxl_mem0/scrubX/.
>>>>>
>>>>> Additionally, add support for region-based CXL memory patrol scrub control.
>>>>> CXL memory regions may be interleaved across one or more CXL memory
>>>>> devices. For example, region-based scrub control for "cxl_region1" is
>>>>> exposed in /sys/bus/edac/devices/cxl_region1/scrubX/.
>>>>>
>>>>> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
>>>>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
>>>>> Signed-off-by: Shiju Jose <shiju.jose@xxxxxxxxxx>
>>>>> ---
>>>>>  Documentation/edac/edac-scrub.rst |  74 ++++++
>>>>>  drivers/cxl/Kconfig               |  18 ++
>>>>>  drivers/cxl/core/Makefile         |   1 +
>>>>>  drivers/cxl/core/memfeature.c     | 381 ++++++++++++++++++++++++++++++
>>>>>  drivers/cxl/core/region.c         |   6 +
>>>>>  drivers/cxl/cxlmem.h              |   7 +
>>>>>  drivers/cxl/mem.c                 |   4 +
>>>>>  7 files changed, 491 insertions(+)
>>>>>  create mode 100644 Documentation/edac/edac-scrub.rst
>>>>>  create mode 100644 drivers/cxl/core/memfeature.c
>>>>>
>>>>> diff --git a/Documentation/edac/edac-scrub.rst
>>>>> b/Documentation/edac/edac-scrub.rst
>>>>> new file mode 100644
>>>>> index 000000000000..4aad4974b208
>>>>> --- /dev/null
>>>>> +++ b/Documentation/edac/edac-scrub.rst
>>>>> @@ -0,0 +1,74 @@
>>>>> +..
SPDX-License-Identifier: GPL-2.0
>>>>> +
>>> [...]
>>>
>>>>> +static int cxl_mem_ps_get_attrs(struct cxl_memdev_state *mds,
>>>>> +				struct cxl_memdev_ps_params *params)
>>>>> +{
>>>>> +	size_t rd_data_size = sizeof(struct cxl_memdev_ps_rd_attrs);
>>>>> +	size_t data_size;
>>>>> +	struct cxl_memdev_ps_rd_attrs *rd_attrs __free(kfree) =
>>>>> +						kmalloc(rd_data_size, GFP_KERNEL);
>>>>> +	if (!rd_attrs)
>>>>> +		return -ENOMEM;
>>>>> +
>>>>> +	data_size = cxl_get_feature(mds, cxl_patrol_scrub_uuid,
>>>>> +				    CXL_GET_FEAT_SEL_CURRENT_VALUE,
>>>>> +				    rd_attrs, rd_data_size);
>>>>> +	if (!data_size)
>>>>> +		return -EIO;
>>>>> +
>>>>> +	params->scrub_cycle_changeable = FIELD_GET(CXL_MEMDEV_PS_SCRUB_CYCLE_CHANGE_CAP_MASK,
>>>>> +						   rd_attrs->scrub_cycle_cap);
>>>>> +	params->enable = FIELD_GET(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
>>>>> +				   rd_attrs->scrub_flags);
>>>>> +	params->scrub_cycle_hrs = FIELD_GET(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
>>>>> +					    rd_attrs->scrub_cycle_hrs);
>>>>> +	params->min_scrub_cycle_hrs = FIELD_GET(CXL_MEMDEV_PS_MIN_SCRUB_CYCLE_MASK,
>>>>> +						rd_attrs->scrub_cycle_hrs);
>>>>> +
>>>>> +	return 0;
>>>>> +}
>>>>> +
>>>>> +static int cxl_ps_get_attrs(struct device *dev, void *drv_data,
>>>>
>>>> Would a union be better than a void *drv_data for all the places this is used as a
>>>> parameter? How many variations of this are there?
>>>>
>>>> DJ
>>> Hi Dave,
>>>
>>> Can you give more info on this, given this is a generic callback for the scrub control and each
>>> implementation will have its own context struct (e.g. struct cxl_patrol_scrub_context here
>>> for CXL scrub control), which in turn will be passed in and out as opaque data?
>>
>> Mainly I'm just seeing a lot of calls with (void *). Just asking if we want to make it a union that contains 'struct cxl_patrol_scrub_context' etc.
>
> You could but then every new driver would need to include
> changes in the edac core to add its own entry to that union.
>
> Not sure that's a good way to go for opaque driver specific context.
>
> This particular function though can use
> a struct cxl_patrol_scrub_context * anyway as it's not part of the
> core interface, but rather one called only indirectly
> by functions that are passed a void * but know it is a
> struct cxl_patrol_scrub_context *.

Thanks Jonathan. That's basically what I wanted to know.

>
> Jonathan
>
>
>>
>>>
>>> Thanks,
>>> Shiju
>>>>
>>>>> +			    struct cxl_memdev_ps_params *params)
>>>>> +{
>>>>> +	struct cxl_patrol_scrub_context *cxl_ps_ctx = drv_data;
>>>>> +	struct cxl_memdev *cxlmd;
>>>>> +	struct cxl_dev_state *cxlds;
>>>>> +	struct cxl_memdev_state *mds;
>>>>> +	u16 min_scrub_cycle = 0;
>>>>> +	int i, ret;
>>>>> +
>>>>> +	if (cxl_ps_ctx->cxlr) {
>>>>> +		struct cxl_region *cxlr = cxl_ps_ctx->cxlr;
>>>>> +		struct cxl_region_params *p = &cxlr->params;
>>>>> +
>>>>> +		for (i = p->interleave_ways - 1; i >= 0; i--) {
>>>>> +			struct cxl_endpoint_decoder *cxled = p->targets[i];
>>>>> +
>>>>> +			cxlmd = cxled_to_memdev(cxled);
>>>>> +			cxlds = cxlmd->cxlds;
>>>>> +			mds = to_cxl_memdev_state(cxlds);
>>>>> +			ret = cxl_mem_ps_get_attrs(mds, params);
>>>>> +			if (ret)
>>>>> +				return ret;
>>>>> +
>>>>> +			if (params->min_scrub_cycle_hrs > min_scrub_cycle)
>>>>> +				min_scrub_cycle = params->min_scrub_cycle_hrs;
>>>>> +		}
>>>>> +		params->min_scrub_cycle_hrs = min_scrub_cycle;
>>>>> +		return 0;
>>>>> +	}
>>>>> +	cxlmd = cxl_ps_ctx->cxlmd;
>>>>> +	cxlds = cxlmd->cxlds;
>>>>> +	mds = to_cxl_memdev_state(cxlds);
>>>>> +
>>>>> +	return cxl_mem_ps_get_attrs(mds, params);
>>>>> +}
>>>>> +
>>> [...]
>>>>
>>>
>>
>>
>
>
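
For anyone reading this later, the void * versus typed pointer point can be sketched in plain userspace C. This is only an illustration under assumptions: scrub_ops, demo_scrub_ctx, scrub_core_min_cycle() and the other names are hypothetical stand-ins, not the EDAC or CXL kernel interfaces. The generic layer passes an opaque void *drv_data, the registered callback converts it once, and the internal helper takes the typed pointer. The helper also mirrors the region case in the quoted patch, where the reported minimum scrub cycle is the largest per-device minimum.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generic layer: it knows nothing about driver context types. */
struct scrub_ops {
	int (*get_min_cycle)(void *drv_data, unsigned int *hours);
};

/* Hypothetical driver-private context, loosely modelled on the thread. */
struct demo_scrub_ctx {
	const unsigned int *dev_min_hours;	/* per-device minimum cycle */
	size_t nr_devs;				/* > 1 models an interleaved region */
};

/* Internal helper: not part of the generic interface, so it can be typed. */
static int demo_get_min_cycle(const struct demo_scrub_ctx *ctx,
			      unsigned int *hours)
{
	unsigned int min_cycle = 0;
	size_t i;

	if (!ctx || !ctx->nr_devs)
		return -1;

	/* The usable minimum is the largest per-device minimum. */
	for (i = 0; i < ctx->nr_devs; i++)
		if (ctx->dev_min_hours[i] > min_cycle)
			min_cycle = ctx->dev_min_hours[i];

	*hours = min_cycle;
	return 0;
}

/* Callback registered with the generic layer: converts void * once. */
static int demo_ops_get_min_cycle(void *drv_data, unsigned int *hours)
{
	struct demo_scrub_ctx *ctx = drv_data;

	return demo_get_min_cycle(ctx, hours);
}

static const struct scrub_ops demo_ops = {
	.get_min_cycle = demo_ops_get_min_cycle,
};

/* Generic layer entry point: only ever sees the opaque pointer. */
static int scrub_core_min_cycle(const struct scrub_ops *ops, void *drv_data,
				unsigned int *hours)
{
	return ops->get_min_cycle(drv_data, hours);
}

/* Example use: three devices with minimum cycles of 1, 4 and 2 hours. */
static unsigned int demo_run(void)
{
	static const unsigned int mins[] = { 1, 4, 2 };
	struct demo_scrub_ctx ctx = { .dev_min_hours = mins, .nr_devs = 3 };
	unsigned int hours = 0;
	int ret = scrub_core_min_cycle(&demo_ops, &ctx, &hours);

	assert(ret == 0);
	return hours;
}
```

Calling demo_run() from a main() exercises the whole path. With a union in place of void *, scrub_core_min_cycle() and struct scrub_ops would need to know every driver's context type, which is the coupling Jonathan points out above.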