Thanks for the comments. >-----Original Message----- >From: Borislav Petkov <bp@xxxxxxxxx> >Sent: 28 October 2024 11:17 >To: Shiju Jose <shiju.jose@xxxxxxxxxx> >Cc: linux-edac@xxxxxxxxxxxxxxx; linux-cxl@xxxxxxxxxxxxxxx; linux- >acpi@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; >tony.luck@xxxxxxxxx; rafael@xxxxxxxxxx; lenb@xxxxxxxxxx; >mchehab@xxxxxxxxxx; dan.j.williams@xxxxxxxxx; dave@xxxxxxxxxxxx; Jonathan >Cameron <jonathan.cameron@xxxxxxxxxx>; gregkh@xxxxxxxxxxxxxxxxxxx; >sudeep.holla@xxxxxxx; jassisinghbrar@xxxxxxxxx; dave.jiang@xxxxxxxxx; >alison.schofield@xxxxxxxxx; vishal.l.verma@xxxxxxxxx; ira.weiny@xxxxxxxxx; >david@xxxxxxxxxx; Vilas.Sridharan@xxxxxxx; leo.duran@xxxxxxx; >Yazen.Ghannam@xxxxxxx; rientjes@xxxxxxxxxx; jiaqiyan@xxxxxxxxxx; >Jon.Grimm@xxxxxxx; dave.hansen@xxxxxxxxxxxxxxx; >naoya.horiguchi@xxxxxxx; james.morse@xxxxxxx; jthoughton@xxxxxxxxxx; >somasundaram.a@xxxxxxx; erdemaktas@xxxxxxxxxx; pgonda@xxxxxxxxxx; >duenwen@xxxxxxxxxx; gthelen@xxxxxxxxxx; >wschwartz@xxxxxxxxxxxxxxxxxxx; dferguson@xxxxxxxxxxxxxxxxxxx; >wbs@xxxxxxxxxxxxxxxxxxxxxx; nifan.cxl@xxxxxxxxx; tanxiaofei ><tanxiaofei@xxxxxxxxxx>; Zengtao (B) <prime.zeng@xxxxxxxxxxxxx>; Roberto >Sassu <roberto.sassu@xxxxxxxxxx>; kangkang.shen@xxxxxxxxxxxxx; >wanghuiqiang <wanghuiqiang@xxxxxxxxxx>; Linuxarm ><linuxarm@xxxxxxxxxx> >Subject: Re: [PATCH v14 03/14] EDAC: Add ECS control feature > >On Fri, Oct 25, 2024 at 06:13:44PM +0100, shiju.jose@xxxxxxxxxx wrote: >> +What: /sys/bus/edac/devices/<dev- >name>/ecs_fruX/log_entry_type >> +Date: Jan 2025 >> +KernelVersion: 6.13 >> +Contact: linux-edac@xxxxxxxxxxxxxxx >> +Description: >> + (RW) The log entry type of how the DDR5 ECS log is reported. >> + 00b - per DRAM. >> + 01b - per memory media FRU. > >If the conversion function here is kstrtoul(), why are those values not "0" >and "1" but in binary format? Followed the description for the " Log Entry Type " in the CXL spec rev 3.1 Table 8-210. DDR5 ECS Control Feature Readable Attributes and Table 8-211. DDR5 ECS Control Feature Writable Attributes. This was written as " Common DDR5 ECS Log Capabilities " though actually returns the status of the log_entry_type. Common DDR5 ECS Log Capabilities • Bits[1:0]: Log Entry Type: The log entry type of how the ECS log is reported. The entry type is defined commonly for all memory media FRUs within the device. — 00b = Per DRAM — 01b = Per Memory Media FRU — All other encodings are reserved • Bits[7:2]: Reserved. I will update to 0 and 1. > >> + >> +What: /sys/bus/edac/devices/<dev- >name>/ecs_fruX/log_entry_type_per_dram >> +Date: Jan 2025 >> +KernelVersion: 6.13 >> +Contact: linux-edac@xxxxxxxxxxxxxxx >> +Description: >> + (RO) True if current log entry type is per DRAM. >> + >> +What: /sys/bus/edac/devices/<dev- >name>/ecs_fruX/log_entry_type_per_memory_media >> +Date: Jan 2025 >> +KernelVersion: 6.13 >> +Contact: linux-edac@xxxxxxxxxxxxxxx >> +Description: >> + (RO) True if current log entry type is per memory media FRU. > >What's the point of those two if log_entry_type already gives you the same info? > This was written as " Common DDR5 ECS Log Capabilities " in the CXL spec rev 3.1 Table 8-210. DDR5 ECS Control Feature Readable Attributes, though actually returns the status of the log_entry_type. I will remove these extra log type attributes. >And the filename length is a bit too much... > >> + >> +What: /sys/bus/edac/devices/<dev-name>/ecs_fruX/mode >> +Date: Jan 2025 >> +KernelVersion: 6.13 >> +Contact: linux-edac@xxxxxxxxxxxxxxx >> +Description: >> + (RW) The mode of how the DDR5 ECS counts the errors. >> + 0 - ECS counts rows with errors. >> + 1 - ECS counts codewords with errors. > >Now we have "0" and "1"s. Oh well. > >What are "rows", what are "codewords"? Explain them here pls for the user. Will do. > >> +What: /sys/bus/edac/devices/<dev- >name>/ecs_fruX/mode_counts_rows >> +Date: Jan 2025 >> +KernelVersion: 6.13 >> +Contact: linux-edac@xxxxxxxxxxxxxxx >> +Description: >> + (RO) True if current mode is ECS counts rows with errors. >> + >> +What: /sys/bus/edac/devices/<dev- >name>/ecs_fruX/mode_counts_codewords >> +Date: Jan 2025 >> +KernelVersion: 6.13 >> +Contact: linux-edac@xxxxxxxxxxxxxxx >> +Description: >> + (RO) True if current mode is ECS counts codewords with errors. > >Same question as above - redundant files. I will remove redundant files. > >> +What: /sys/bus/edac/devices/<dev-name>/ecs_fruX/reset >> +Date: Jan 2025 >> +KernelVersion: 6.13 >> +Contact: linux-edac@xxxxxxxxxxxxxxx >> +Description: >> + (WO) ECS reset ECC counter. >> + 1 - reset ECC counter to the default value. > >1 or any value? > >Looks like any to me... > >You should restrict it to "1" in case you want to extend this interface with "2" in >the future, for example, doing something a bit different. I will update. Returns error for any other value set than '1'. > >> + >> +What: /sys/bus/edac/devices/<dev-name>/ecs_fruX/threshold >> +Date: Jan 2025 >> +KernelVersion: 6.13 >> +Contact: linux-edac@xxxxxxxxxxxxxxx >> +Description: >> + (RW) ECS threshold count per gigabits of memory cells. > >That definitely needs more explanation. Sure. > >> diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile index >> 188501e676c7..b24c2c112d9c 100644 >> --- a/drivers/edac/Makefile >> +++ b/drivers/edac/Makefile >> @@ -10,7 +10,7 @@ obj-$(CONFIG_EDAC) := edac_core.o >> >> edac_core-y := edac_mc.o edac_device.o edac_mc_sysfs.o >> edac_core-y += edac_module.o edac_device_sysfs.o wq.o >> -edac_core-y += scrub.o >> +edac_core-y += scrub.o ecs.o >> >> edac_core-$(CONFIG_EDAC_DEBUG) += debugfs.o >> >> diff --git a/drivers/edac/ecs.c b/drivers/edac/ecs.c new file mode >> 100755 index 000000000000..a2b64d7bf6b6 >> --- /dev/null >> +++ b/drivers/edac/ecs.c >> @@ -0,0 +1,240 @@ >> +// SPDX-License-Identifier: GPL-2.0 >> +/* >> + * Generic ECS driver in order to support control the on die >> + * error check scrub (e.g. DDR5 ECS). > >This sentence needs grammar check. Will do. > >> The common sysfs ECS >> + * interface abstracts the control of an arbitrary ECS >> + * functionality to a common set of functions. >> + * >> + * Copyright (c) 2024 HiSilicon Limited. >> + */ >> + > >#undef pr_fmt > >> +#define pr_fmt(fmt) "EDAC ECS: " fmt > >Grep the tree for examples how to do that properly. > >Also, this pr_fmt looks unused. I will remove. > >> +static umode_t ecs_attr_visible(struct kobject *kobj, struct >> +attribute *a, int attr_id) { >> + struct device *ras_feat_dev = kobj_to_dev(kobj); >> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev); >> + const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops; >> + >> + switch (attr_id) { >> + case ECS_LOG_ENTRY_TYPE: >> + if (ops->get_log_entry_type) { >> + if (ops->set_log_entry_type) >> + return a->mode; >> + else >> + return 0444; > >What is the goal for the access mode of all those sysfs entries? I sure hope it is >going to be root-only no-matter what. I don't want normal users to cause scrub >activity. Please make sure your whole set does that. This is the attribute_group's is_visible() callback for controlling the visibility of the attributes based on whether attribute is added or not by the parent driver. Yes. Writable only by the root. > >> + } >> + break; >> + case ECS_LOG_ENTRY_TYPE_PER_DRAM: >> + if (ops->get_log_entry_type_per_dram) >> + return a->mode; >> + break; [...] > EDAC_ECS_ATTR_RO(log_entry_type_per_dram, fru); >> + fru_ctx- >>ecs_dev_attr[ECS_LOG_ENTRY_TYPE_PER_MEMORY_MEDIA] = >> + > EDAC_ECS_ATTR_RO(log_entry_type_per_memory_media, fru); >> + fru_ctx->ecs_dev_attr[ECS_MODE] = >EDAC_ECS_ATTR_RW(mode, fru); >> + fru_ctx->ecs_dev_attr[ECS_MODE_COUNTS_ROWS] = >> + > EDAC_ECS_ATTR_RO(mode_counts_rows, fru); >> + fru_ctx->ecs_dev_attr[ECS_MODE_COUNTS_CODEWORDS] = >> + > EDAC_ECS_ATTR_RO(mode_counts_codewords, fru); >> + fru_ctx->ecs_dev_attr[ECS_RESET] = >EDAC_ECS_ATTR_WO(reset, fru); >> + fru_ctx->ecs_dev_attr[ECS_THRESHOLD] = >EDAC_ECS_ATTR_RW(threshold, >> +fru); > >Clearly too long variable and define names. Shorten pls. Will do > >Also, a new line here: Ok. > ><--- > > [...] >> + case RAS_FEAT_ECS: >> + if (!ras_features->ecs_ops) >> + goto data_mem_free; > ><---- newline here. Wil do. > >> + dev_data = &ctx->ecs; >> + dev_data->ecs_ops = ras_features->ecs_ops; >> + dev_data->private = ras_features->ctx; >> + ret = edac_ecs_get_desc(parent, >&ras_attr_groups[attr_gcnt], >> + ras_features- >>ecs_info.num_media_frus); >> + if (ret) >> + goto data_mem_free; > >Ditto. Wil do. > >> + attr_gcnt += ras_features->ecs_info.num_media_frus; >> + break; >> default: >> ret = -EINVAL; >> goto data_mem_free; > >-- >Regards/Gruss, > Boris. Thanks, Shiju