RE: [PATCH v18 01/19] EDAC: Add support for EDAC device features control

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



>-----Original Message-----
>From: Mauro Carvalho Chehab <mchehab+huawei@xxxxxxxxxx>
>Sent: 13 January 2025 15:06
>To: Shiju Jose <shiju.jose@xxxxxxxxxx>
>Cc: linux-edac@xxxxxxxxxxxxxxx; linux-cxl@xxxxxxxxxxxxxxx; linux-
>acpi@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
>bp@xxxxxxxxx; tony.luck@xxxxxxxxx; rafael@xxxxxxxxxx; lenb@xxxxxxxxxx;
>mchehab@xxxxxxxxxx; dan.j.williams@xxxxxxxxx; dave@xxxxxxxxxxxx; Jonathan
>Cameron <jonathan.cameron@xxxxxxxxxx>; dave.jiang@xxxxxxxxx;
>alison.schofield@xxxxxxxxx; vishal.l.verma@xxxxxxxxx; ira.weiny@xxxxxxxxx;
>david@xxxxxxxxxx; Vilas.Sridharan@xxxxxxx; leo.duran@xxxxxxx;
>Yazen.Ghannam@xxxxxxx; rientjes@xxxxxxxxxx; jiaqiyan@xxxxxxxxxx;
>Jon.Grimm@xxxxxxx; dave.hansen@xxxxxxxxxxxxxxx;
>naoya.horiguchi@xxxxxxx; james.morse@xxxxxxx; jthoughton@xxxxxxxxxx;
>somasundaram.a@xxxxxxx; erdemaktas@xxxxxxxxxx; pgonda@xxxxxxxxxx;
>duenwen@xxxxxxxxxx; gthelen@xxxxxxxxxx;
>wschwartz@xxxxxxxxxxxxxxxxxxx; dferguson@xxxxxxxxxxxxxxxxxxx;
>wbs@xxxxxxxxxxxxxxxxxxxxxx; nifan.cxl@xxxxxxxxx; tanxiaofei
><tanxiaofei@xxxxxxxxxx>; Zengtao (B) <prime.zeng@xxxxxxxxxxxxx>; Roberto
>Sassu <roberto.sassu@xxxxxxxxxx>; kangkang.shen@xxxxxxxxxxxxx;
>wanghuiqiang <wanghuiqiang@xxxxxxxxxx>; Linuxarm
><linuxarm@xxxxxxxxxx>
>Subject: Re: [PATCH v18 01/19] EDAC: Add support for EDAC device features
>control
>
>Em Mon, 6 Jan 2025 12:09:57 +0000
><shiju.jose@xxxxxxxxxx> escreveu:
>
>> From: Shiju Jose <shiju.jose@xxxxxxxxxx>
>>
>> Add generic EDAC device feature controls supporting the registration
>> of RAS features available in the system. The driver exposes control
>> attributes for these features to userspace in
>> /sys/bus/edac/devices/<dev-name>/<ras-feature>/
>>
>> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
>> Signed-off-by: Shiju Jose <shiju.jose@xxxxxxxxxx>
>> ---
>>  Documentation/edac/features.rst |  94 ++++++++++++++++++++++++++++++
>>  Documentation/edac/index.rst    |  10 ++++
>>  drivers/edac/edac_device.c      | 100 ++++++++++++++++++++++++++++++++
>>  include/linux/edac.h            |  28 +++++++++
>>  4 files changed, 232 insertions(+)
>>  create mode 100644 Documentation/edac/features.rst  create mode
>> 100644 Documentation/edac/index.rst
>>
>> diff --git a/Documentation/edac/features.rst
>> b/Documentation/edac/features.rst new file mode 100644 index
>> 000000000000..f32f259ce04d
>> --- /dev/null
>> +++ b/Documentation/edac/features.rst
>> @@ -0,0 +1,94 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>
>SPDX should match what's written there, e. g.
>
>	.. SPDX-License-Identifier: GPL-2.0 OR GFDL-1.2-no-invariants-or-later
>
>Please notice that GNU FDL family contains both open source and non-open
>source licenses. The open-source one is this:
>
>	https://spdx.org/licenses/GFDL-1.2-no-invariants-or-later.html
>
>E.g. it is a the license permits changing the entire document in the future, as
>there's no invariant parts on it.
This seems not widely used, have seen this is used in few documents only. 

>
>> +
>> +============================================
>> +Augmenting EDAC for controlling RAS features
>> +============================================
>> +
>> +Copyright (c) 2024 HiSilicon Limited.
>
>2024-2025?
Will do.  
>
>> +
>> +:Author:   Shiju Jose <shiju.jose@xxxxxxxxxx>
>> +:License:  The GNU Free Documentation License, Version 1.2
>> +          (dual licensed under the GPL v2)
>
>You need to define if invariant parts are allowed or not, e. g.:
>
>	:License: The GNU Free Documentation License, Version 1.2 without
>Invariant Sections, Front-Cover Texts nor Back-Cover Texts.
>		  (dual licensed under the GPL v2)
Same as above.
>
>
>> +:Original Reviewers:
>> +
>> +- Written for: 6.14
>> +
>> +Introduction
>> +------------
>> +The expansion of EDAC for controlling RAS features and exposing
>> +features control attributes to userspace via sysfs. Some Examples:
>> +
>> +* Scrub control
>> +
>> +* Error Check Scrub (ECS) control
>> +
>> +* ACPI RAS2 features
>> +
>> +* Post Package Repair (PPR) control
>> +
>> +* Memory Sparing Repair control etc.
>> +
>> +High level design is illustrated in the following diagram::
>> +
>> +         _______________________________________________
>> +        |   Userspace - Rasdaemon                       |
>> +        |  _____________                                |
>> +        | | RAS CXL mem |      _______________          |
>> +        | |error handler|---->|               |         |
>> +        | |_____________|     | RAS dynamic   |         |
>> +        |  _____________      | scrub, memory |         |
>> +        | | RAS memory  |---->| repair control|         |
>> +        | |error handler|     |_______________|         |
>> +        | |_____________|          |                    |
>> +        |__________________________|____________________|
>> +                                   |
>> +                                   |
>> +
>_______________________________|______________________________
>> +   |     Kernel EDAC extension for | controlling RAS Features     |
>> +   | ______________________________|____________________________
>|
>> +   || EDAC Core          Sysfs EDAC| Bus                        | |
>> +   ||    __________________________|_________     _____________ | |
>> +   ||   |/sys/bus/edac/devices/<dev>/scrubX/ |   | EDAC device || |
>> +   ||   |/sys/bus/edac/devices/<dev>/ecsX/   |<->| EDAC MC     || |
>> +   ||   |/sys/bus/edac/devices/<dev>/repairX |   | EDAC sysfs  || |
>> +   ||   |____________________________________|   |_____________|| |
>> +   ||                           EDAC|Bus                        | |
>> +   ||                               |                           | |
>> +   ||    __________ Get feature     |      Get feature          | |
>> +   ||   |          |desc   _________|______ desc  __________    | |
>> +   ||   |EDAC scrub|<-----| EDAC device    |     |          |   | |
>> +   ||   |__________|      | driver- RAS    |---->| EDAC mem |   | |
>> +   ||    __________       | feature control|     | repair   |   | |
>> +   ||   |          |<-----|________________|     |__________|   | |
>> +   ||   |EDAC ECS  |    Register RAS|features                   | |
>> +   ||   |__________|                |                           | |
>> +   ||         ______________________|_____________              | |
>> +   ||_________|_______________|__________________|______________|
>|
>> +   |   _______|____    _______|_______       ____|__________      |
>> +   |  |            |  | CXL mem driver|     | Client driver |     |
>> +   |  | ACPI RAS2  |  | scrub, ECS,   |     | memory repair |     |
>> +   |  | driver     |  | sparing, PPR  |     | features      |     |
>> +   |  |____________|  |_______________|     |_______________|     |
>> +   |        |                 |                    |              |
>> +
>|________|_________________|____________________|______________|
>> +            |                 |                    |
>> +
>________|_________________|____________________|______________
>> +   |     ___|_________________|____________________|_______       |
>> +   |    |                                                  |      |
>> +   |    |            Platform HW and Firmware              |      |
>> +   |    |__________________________________________________|      |
>> +
>|______________________________________________________________|
>> +
>> +
>> +1. EDAC Features components - Create feature specific descriptors.
>> +For example, EDAC scrub, EDAC ECS, EDAC memory repair in the above
>> +diagram.
>> +
>> +2. EDAC device driver for controlling RAS Features - Get feature's
>> +attribute descriptors from EDAC RAS feature component and registers
>> +device's RAS features with EDAC bus and exposes the features control
>> +attributes via the sysfs EDAC bus. For example,
>> +/sys/bus/edac/devices/<dev-name>/<feature>X/
>> +
>> +3. RAS dynamic feature controller - Userspace sample modules in
>> +rasdaemon for dynamic scrub/repair control to issue scrubbing/repair
>> +when excess number of corrected memory errors are reported in a short
>span of time.
>> diff --git a/Documentation/edac/index.rst
>> b/Documentation/edac/index.rst new file mode 100644 index
>> 000000000000..b6c265a4cffb
>> --- /dev/null
>> +++ b/Documentation/edac/index.rst
>> @@ -0,0 +1,10 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +==============
>> +EDAC Subsystem
>> +==============
>> +
>> +.. toctree::
>> +   :maxdepth: 1
>> +
>> +   features
>> diff --git a/drivers/edac/edac_device.c b/drivers/edac/edac_device.c
>> index 621dc2a5d034..9fce46dd7405 100644
>> --- a/drivers/edac/edac_device.c
>> +++ b/drivers/edac/edac_device.c
>> @@ -570,3 +570,103 @@ void edac_device_handle_ue_count(struct
>edac_device_ctl_info *edac_dev,
>>  		      block ? block->name : "N/A", count, msg);  }
>> EXPORT_SYMBOL_GPL(edac_device_handle_ue_count);
>> +
>> +static void edac_dev_release(struct device *dev) {
>> +	struct edac_dev_feat_ctx *ctx = container_of(dev, struct
>> +edac_dev_feat_ctx, dev);
>> +
>> +	kfree(ctx->dev.groups);
>> +	kfree(ctx);
>> +}
>> +
>> +const struct device_type edac_dev_type = {
>> +	.name = "edac_dev",
>> +	.release = edac_dev_release,
>> +};
>> +
>> +static void edac_dev_unreg(void *data) {
>> +	device_unregister(data);
>> +}
>> +
>> +/**
>> + * edac_dev_register - register device for RAS features with EDAC
>> + * @parent: parent device.
>> + * @name: parent device's name.
>> + * @private: parent driver's data to store in the context if any.
>> + * @num_features: number of RAS features to register.
>> + * @ras_features: list of RAS features to register.
>> + *
>> + * Return:
>> + *  * %0       - Success.
>> + *  * %-EINVAL - Invalid parameters passed.
>> + *  * %-ENOMEM - Dynamic memory allocation failed.
>> + *
>> + */
>> +int edac_dev_register(struct device *parent, char *name,
>> +		      void *private, int num_features,
>> +		      const struct edac_dev_feature *ras_features) {
>> +	const struct attribute_group **ras_attr_groups;
>> +	struct edac_dev_feat_ctx *ctx;
>> +	int attr_gcnt = 0;
>> +	int ret, feat;
>> +
>> +	if (!parent || !name || !num_features || !ras_features)
>> +		return -EINVAL;
>> +
>> +	/* Double parse to make space for attributes */
>> +	for (feat = 0; feat < num_features; feat++) {
>> +		switch (ras_features[feat].ft_type) {
>> +		/* Add feature specific code */
>> +		default:
>> +			return -EINVAL;
>> +		}
>> +	}
>> +
>> +	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
>> +	if (!ctx)
>> +		return -ENOMEM;
>> +
>> +	ras_attr_groups = kcalloc(attr_gcnt + 1, sizeof(*ras_attr_groups),
>GFP_KERNEL);
>> +	if (!ras_attr_groups) {
>> +		ret = -ENOMEM;
>> +		goto ctx_free;
>> +	}
>> +
>> +	attr_gcnt = 0;
>> +	for (feat = 0; feat < num_features; feat++, ras_features++) {
>> +		switch (ras_features->ft_type) {
>> +		/* Add feature specific code */
>> +		default:
>> +			ret = -EINVAL;
>> +			goto groups_free;
>> +		}
>> +	}
>> +
>> +	ctx->dev.parent = parent;
>> +	ctx->dev.bus = edac_get_sysfs_subsys();
>> +	ctx->dev.type = &edac_dev_type;
>> +	ctx->dev.groups = ras_attr_groups;
>> +	ctx->private = private;
>> +	dev_set_drvdata(&ctx->dev, ctx);
>> +
>> +	ret = dev_set_name(&ctx->dev, name);
>> +	if (ret)
>> +		goto groups_free;
>> +
>> +	ret = device_register(&ctx->dev);
>> +	if (ret) {
>> +		put_device(&ctx->dev);
>
>> +		return ret;
>
>As register failed, you need to change it to a goto groups_free, as
>edac_dev_release() won't be called.
As per experimentation edac_dev_release() will be called.

>
>> +	}
>> +
>> +	return devm_add_action_or_reset(parent, edac_dev_unreg, &ctx->dev);
>> +
>> +groups_free:
>> +	kfree(ras_attr_groups);
>> +ctx_free:
>> +	kfree(ctx);
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(edac_dev_register);
>> diff --git a/include/linux/edac.h b/include/linux/edac.h index
>> b4ee8961e623..521b17113d4d 100644
>> --- a/include/linux/edac.h
>> +++ b/include/linux/edac.h
>> @@ -661,4 +661,32 @@ static inline struct dimm_info
>> *edac_get_dimm(struct mem_ctl_info *mci,
>>
>>  	return mci->dimms[index];
>>  }
>> +
>> +#define EDAC_FEAT_NAME_LEN	128
>
>This macro was not used on this patch.
Sure.
>
>> +
>> +/* RAS feature type */
>> +enum edac_dev_feat {
>> +	RAS_FEAT_MAX
>> +};
>> +
>> +/* EDAC device feature information structure */ struct edac_dev_data
>> +{
>> +	u8 instance;
>> +	void *private;
>> +};
>> +
>> +struct edac_dev_feat_ctx {
>> +	struct device dev;
>> +	void *private;
>> +};
>> +
>> +struct edac_dev_feature {
>> +	enum edac_dev_feat ft_type;
>> +	u8 instance;
>> +	void *ctx;
>> +};
>> +
>> +int edac_dev_register(struct device *parent, char *dev_name,
>> +		      void *parent_pvt_data, int num_features,
>> +		      const struct edac_dev_feature *ras_features);
>>  #endif /* _LINUX_EDAC_H_ */
>
>Thanks,
>Mauro

Thanks,
Shiju





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux