On Fri, Oct 25, 2024 at 06:13:44PM +0100, shiju.jose@xxxxxxxxxx wrote: > +What: /sys/bus/edac/devices/<dev-name>/ecs_fruX/log_entry_type > +Date: Jan 2025 > +KernelVersion: 6.13 > +Contact: linux-edac@xxxxxxxxxxxxxxx > +Description: > + (RW) The log entry type of how the DDR5 ECS log is reported. > + 00b - per DRAM. > + 01b - per memory media FRU. If the conversion function here is kstrtoul(), why are those values not "0" and "1" but in binary format? > + > +What: /sys/bus/edac/devices/<dev-name>/ecs_fruX/log_entry_type_per_dram > +Date: Jan 2025 > +KernelVersion: 6.13 > +Contact: linux-edac@xxxxxxxxxxxxxxx > +Description: > + (RO) True if current log entry type is per DRAM. > + > +What: /sys/bus/edac/devices/<dev-name>/ecs_fruX/log_entry_type_per_memory_media > +Date: Jan 2025 > +KernelVersion: 6.13 > +Contact: linux-edac@xxxxxxxxxxxxxxx > +Description: > + (RO) True if current log entry type is per memory media FRU. What's the point of those two if log_entry_type already gives you the same info? And the filename length is a bit too much... > + > +What: /sys/bus/edac/devices/<dev-name>/ecs_fruX/mode > +Date: Jan 2025 > +KernelVersion: 6.13 > +Contact: linux-edac@xxxxxxxxxxxxxxx > +Description: > + (RW) The mode of how the DDR5 ECS counts the errors. > + 0 - ECS counts rows with errors. > + 1 - ECS counts codewords with errors. Now we have "0" and "1"s. Oh well. What are "rows", what are "codewords"? Explain them here pls for the user. > +What: /sys/bus/edac/devices/<dev-name>/ecs_fruX/mode_counts_rows > +Date: Jan 2025 > +KernelVersion: 6.13 > +Contact: linux-edac@xxxxxxxxxxxxxxx > +Description: > + (RO) True if current mode is ECS counts rows with errors. > + > +What: /sys/bus/edac/devices/<dev-name>/ecs_fruX/mode_counts_codewords > +Date: Jan 2025 > +KernelVersion: 6.13 > +Contact: linux-edac@xxxxxxxxxxxxxxx > +Description: > + (RO) True if current mode is ECS counts codewords with errors. Same question as above - redundant files. > +What: /sys/bus/edac/devices/<dev-name>/ecs_fruX/reset > +Date: Jan 2025 > +KernelVersion: 6.13 > +Contact: linux-edac@xxxxxxxxxxxxxxx > +Description: > + (WO) ECS reset ECC counter. > + 1 - reset ECC counter to the default value. 1 or any value? Looks like any to me... You should restrict it to "1" in case you want to extend this interface with "2" in the future, for example, doing something a bit different. > + > +What: /sys/bus/edac/devices/<dev-name>/ecs_fruX/threshold > +Date: Jan 2025 > +KernelVersion: 6.13 > +Contact: linux-edac@xxxxxxxxxxxxxxx > +Description: > + (RW) ECS threshold count per gigabits of memory cells. That definitely needs more explanation. > diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile > index 188501e676c7..b24c2c112d9c 100644 > --- a/drivers/edac/Makefile > +++ b/drivers/edac/Makefile > @@ -10,7 +10,7 @@ obj-$(CONFIG_EDAC) := edac_core.o > > edac_core-y := edac_mc.o edac_device.o edac_mc_sysfs.o > edac_core-y += edac_module.o edac_device_sysfs.o wq.o > -edac_core-y += scrub.o > +edac_core-y += scrub.o ecs.o > > edac_core-$(CONFIG_EDAC_DEBUG) += debugfs.o > > diff --git a/drivers/edac/ecs.c b/drivers/edac/ecs.c > new file mode 100755 > index 000000000000..a2b64d7bf6b6 > --- /dev/null > +++ b/drivers/edac/ecs.c > @@ -0,0 +1,240 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Generic ECS driver in order to support control the on die > + * error check scrub (e.g. DDR5 ECS). This sentence needs grammar check. > The common sysfs ECS > + * interface abstracts the control of an arbitrary ECS > + * functionality to a common set of functions. > + * > + * Copyright (c) 2024 HiSilicon Limited. > + */ > + #undef pr_fmt > +#define pr_fmt(fmt) "EDAC ECS: " fmt Grep the tree for examples how to do that properly. Also, this pr_fmt looks unused. > +static umode_t ecs_attr_visible(struct kobject *kobj, struct attribute *a, int attr_id) > +{ > + struct device *ras_feat_dev = kobj_to_dev(kobj); > + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev); > + const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops; > + > + switch (attr_id) { > + case ECS_LOG_ENTRY_TYPE: > + if (ops->get_log_entry_type) { > + if (ops->set_log_entry_type) > + return a->mode; > + else > + return 0444; What is the goal for the access mode of all those sysfs entries? I sure hope it is going to be root-only no-matter what. I don't want normal users to cause scrub activity. Please make sure your whole set does that. > + } > + break; > + case ECS_LOG_ENTRY_TYPE_PER_DRAM: > + if (ops->get_log_entry_type_per_dram) > + return a->mode; > + break; > + case ECS_LOG_ENTRY_TYPE_PER_MEMORY_MEDIA: > + if (ops->get_log_entry_type_per_memory_media) > + return a->mode; > + break; > + case ECS_MODE: > + if (ops->get_mode) { > + if (ops->set_mode) > + return a->mode; > + else > + return 0444; > + } > + break; > + case ECS_MODE_COUNTS_ROWS: > + if (ops->get_mode_counts_rows) > + return a->mode; > + break; > + case ECS_MODE_COUNTS_CODEWORDS: > + if (ops->get_mode_counts_codewords) > + return a->mode; > + break; > + case ECS_RESET: > + if (ops->reset) > + return a->mode; > + break; > + case ECS_THRESHOLD: > + if (ops->get_threshold) { > + if (ops->set_threshold) > + return a->mode; > + else > + return 0444; > + } > + break; > + default: > + break; > + } > + > + return 0; > +} > + > +#define EDAC_ECS_ATTR_RO(_name, _fru_id) \ > + ((struct edac_ecs_dev_attr) { .dev_attr = __ATTR_RO(_name), \ > + .fru_id = _fru_id }) > + > +#define EDAC_ECS_ATTR_WO(_name, _fru_id) \ > + ((struct edac_ecs_dev_attr) { .dev_attr = __ATTR_WO(_name), \ > + .fru_id = _fru_id }) > + > +#define EDAC_ECS_ATTR_RW(_name, _fru_id) \ > + ((struct edac_ecs_dev_attr) { .dev_attr = __ATTR_RW(_name), \ > + .fru_id = _fru_id }) > + > +static int ecs_create_desc(struct device *ecs_dev, > + const struct attribute_group **attr_groups, u16 num_media_frus) > +{ > + struct edac_ecs_context *ecs_ctx; > + u32 fru; > + > + ecs_ctx = devm_kzalloc(ecs_dev, sizeof(*ecs_ctx), GFP_KERNEL); > + if (!ecs_ctx) > + return -ENOMEM; > + > + ecs_ctx->num_media_frus = num_media_frus; > + ecs_ctx->fru_ctxs = devm_kcalloc(ecs_dev, num_media_frus, > + sizeof(*ecs_ctx->fru_ctxs), > + GFP_KERNEL); > + if (!ecs_ctx->fru_ctxs) > + return -ENOMEM; > + > + for (fru = 0; fru < num_media_frus; fru++) { > + struct edac_ecs_fru_context *fru_ctx = &ecs_ctx->fru_ctxs[fru]; > + struct attribute_group *group = &fru_ctx->group; > + int i; > + > + fru_ctx->ecs_dev_attr[ECS_LOG_ENTRY_TYPE] = EDAC_ECS_ATTR_RW(log_entry_type, fru); > + fru_ctx->ecs_dev_attr[ECS_LOG_ENTRY_TYPE_PER_DRAM] = > + EDAC_ECS_ATTR_RO(log_entry_type_per_dram, fru); > + fru_ctx->ecs_dev_attr[ECS_LOG_ENTRY_TYPE_PER_MEMORY_MEDIA] = > + EDAC_ECS_ATTR_RO(log_entry_type_per_memory_media, fru); > + fru_ctx->ecs_dev_attr[ECS_MODE] = EDAC_ECS_ATTR_RW(mode, fru); > + fru_ctx->ecs_dev_attr[ECS_MODE_COUNTS_ROWS] = > + EDAC_ECS_ATTR_RO(mode_counts_rows, fru); > + fru_ctx->ecs_dev_attr[ECS_MODE_COUNTS_CODEWORDS] = > + EDAC_ECS_ATTR_RO(mode_counts_codewords, fru); > + fru_ctx->ecs_dev_attr[ECS_RESET] = EDAC_ECS_ATTR_WO(reset, fru); > + fru_ctx->ecs_dev_attr[ECS_THRESHOLD] = EDAC_ECS_ATTR_RW(threshold, fru); Clearly too long variable and define names. Shorten pls. Also, a new line here: <--- > + for (i = 0; i < ECS_MAX_ATTRS; i++) > + fru_ctx->ecs_attrs[i] = &fru_ctx->ecs_dev_attr[i].dev_attr.attr; > + > + sprintf(fru_ctx->name, "%s%d", EDAC_ECS_FRU_NAME, fru); > + group->name = fru_ctx->name; > + group->attrs = fru_ctx->ecs_attrs; > + group->is_visible = ecs_attr_visible; > + > + attr_groups[fru] = group; > + } > + > + return 0; > +} > + > +/** > + * edac_ecs_get_desc - get EDAC ECS descriptors > + * @ecs_dev: client device, supports ECS feature > + * @attr_groups: pointer to attribute group container > + * @num_media_frus: number of media FRUs in the device > + * > + * Return: > + * * %0 - Success. > + * * %-EINVAL - Invalid parameters passed. > + * * %-ENOMEM - Dynamic memory allocation failed. > + */ > +int edac_ecs_get_desc(struct device *ecs_dev, > + const struct attribute_group **attr_groups, u16 num_media_frus) > +{ > + if (!ecs_dev || !attr_groups || !num_media_frus) > + return -EINVAL; > + > + return ecs_create_desc(ecs_dev, attr_groups, num_media_frus); > +} > diff --git a/drivers/edac/edac_device.c b/drivers/edac/edac_device.c > index 91552271b34a..5fc3ec7f25eb 100644 > --- a/drivers/edac/edac_device.c > +++ b/drivers/edac/edac_device.c > @@ -626,6 +626,9 @@ int edac_dev_register(struct device *parent, char *name, > attr_gcnt++; > scrub_cnt++; > break; > + case RAS_FEAT_ECS: > + attr_gcnt += ras_features[feat].ecs_info.num_media_frus; > + break; > default: > return -EINVAL; > } > @@ -667,6 +670,18 @@ int edac_dev_register(struct device *parent, char *name, > scrub_inst++; > attr_gcnt++; > break; > + case RAS_FEAT_ECS: > + if (!ras_features->ecs_ops) > + goto data_mem_free; <---- newline here. > + dev_data = &ctx->ecs; > + dev_data->ecs_ops = ras_features->ecs_ops; > + dev_data->private = ras_features->ctx; > + ret = edac_ecs_get_desc(parent, &ras_attr_groups[attr_gcnt], > + ras_features->ecs_info.num_media_frus); > + if (ret) > + goto data_mem_free; Ditto. > + attr_gcnt += ras_features->ecs_info.num_media_frus; > + break; > default: > ret = -EINVAL; > goto data_mem_free; -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette