RE: [PATCH v13 02/18] EDAC: Add scrub control feature

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thanks for the feedbacks.

>-----Original Message-----
>From: Borislav Petkov <bp@xxxxxxxxx>
>Sent: 22 October 2024 20:05
>To: Shiju Jose <shiju.jose@xxxxxxxxxx>
>Cc: linux-edac@xxxxxxxxxxxxxxx; linux-cxl@xxxxxxxxxxxxxxx; linux-
>acpi@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
>tony.luck@xxxxxxxxx; rafael@xxxxxxxxxx; lenb@xxxxxxxxxx;
>mchehab@xxxxxxxxxx; dan.j.williams@xxxxxxxxx; dave@xxxxxxxxxxxx; Jonathan
>Cameron <jonathan.cameron@xxxxxxxxxx>; dave.jiang@xxxxxxxxx;
>alison.schofield@xxxxxxxxx; vishal.l.verma@xxxxxxxxx; ira.weiny@xxxxxxxxx;
>david@xxxxxxxxxx; Vilas.Sridharan@xxxxxxx; leo.duran@xxxxxxx;
>Yazen.Ghannam@xxxxxxx; rientjes@xxxxxxxxxx; jiaqiyan@xxxxxxxxxx;
>Jon.Grimm@xxxxxxx; dave.hansen@xxxxxxxxxxxxxxx;
>naoya.horiguchi@xxxxxxx; james.morse@xxxxxxx; jthoughton@xxxxxxxxxx;
>somasundaram.a@xxxxxxx; erdemaktas@xxxxxxxxxx; pgonda@xxxxxxxxxx;
>duenwen@xxxxxxxxxx; gthelen@xxxxxxxxxx;
>wschwartz@xxxxxxxxxxxxxxxxxxx; dferguson@xxxxxxxxxxxxxxxxxxx;
>wbs@xxxxxxxxxxxxxxxxxxxxxx; nifan.cxl@xxxxxxxxx; tanxiaofei
><tanxiaofei@xxxxxxxxxx>; Zengtao (B) <prime.zeng@xxxxxxxxxxxxx>; Roberto
>Sassu <roberto.sassu@xxxxxxxxxx>; kangkang.shen@xxxxxxxxxxxxx;
>wanghuiqiang <wanghuiqiang@xxxxxxxxxx>; Linuxarm
><linuxarm@xxxxxxxxxx>
>Subject: Re: [PATCH v13 02/18] EDAC: Add scrub control feature
>
>On Wed, Oct 09, 2024 at 01:41:03PM +0100, shiju.jose@xxxxxxxxxx wrote:
>> diff --git a/Documentation/ABI/testing/sysfs-edac-scrub
>> b/Documentation/ABI/testing/sysfs-edac-scrub
>> new file mode 100644
>> index 000000000000..b4f8f0bba17b
>> --- /dev/null
>> +++ b/Documentation/ABI/testing/sysfs-edac-scrub
>> @@ -0,0 +1,69 @@
>> +What:		/sys/bus/edac/devices/<dev-name>/scrubX
>> +Date:		Oct 2024
>> +KernelVersion:	6.12
>> +Contact:	linux-edac@xxxxxxxxxxxxxxx
>> +Description:
>> +		The sysfs EDAC bus devices /<dev-name>/scrubX subdirectory
>> +		belongs to an instance of memory scrub control feature,
>> +		where <dev-name> directory corresponds to a device/memory
>> +		region registered with the EDAC device driver for the
>> +		scrub control feature.
>> +		The sysfs scrub attr nodes would be present only if the
>> +		client driver has implemented the corresponding attr
>> +		callback function and passed in ops to the EDAC RAS feature
>> +		driver during registration.
>> +
>> +What:		/sys/bus/edac/devices/<dev-
>name>/scrubX/addr_range_base
>> +Date:		Oct 2024
>> +KernelVersion:	6.12
>> +Contact:	linux-edac@xxxxxxxxxxxxxxx
>> +Description:
>> +		(RW) The base of the address range of the memory region
>> +		to be scrubbed (on-demand scrubbing).
>
>Why does this sound more complicated than it is?
>
>Why isn't this simply "addr" and the next one "size"?
>
>On-demand scrubbing should scrub at address "addr" and "size" bytes. Can't get
>any simpler than that.
Sure. Will modify to addr and size.
>
>> +
>> +What:		/sys/bus/edac/devices/<dev-
>name>/scrubX/addr_range_size
>> +Date:		Oct 2024
>> +KernelVersion:	6.12
>> +Contact:	linux-edac@xxxxxxxxxxxxxxx
>> +Description:
>> +		(RW) The size of the address range of the memory region
>> +		to be scrubbed (on-demand scrubbing).
>> +
>> +What:		/sys/bus/edac/devices/<dev-
>name>/scrubX/enable_background
>> +Date:		Oct 2024
>> +KernelVersion:	6.12
>> +Contact:	linux-edac@xxxxxxxxxxxxxxx
>> +Description:
>> +		(RW) Start/Stop background(patrol) scrubbing if supported.
>> +
>> +What:		/sys/bus/edac/devices/<dev-
>name>/scrubX/enable_on_demand
>> +Date:		Oct 2024
>> +KernelVersion:	6.12
>> +Contact:	linux-edac@xxxxxxxxxxxxxxx
>> +Description:
>> +		(RW) Start/Stop on-demand scrubbing the memory region
>> +		if supported.
>
>Why do you need a separate "enable" flag?
>
>Why can't it be: "writing into "addr" starts the on-demand scrubbing"?
If  'enable' attribute is removed , then there is an ordering with setting address + size.
Also user space can't check whether scrubbing is enabled or not. 
>
>> +
>> +What:		/sys/bus/edac/devices/<dev-
>name>/scrubX/min_cycle_duration
>> +Date:		Oct 2024
>> +KernelVersion:	6.12
>> +Contact:	linux-edac@xxxxxxxxxxxxxxx
>> +Description:
>> +		(RO) Supported minimum scrub cycle duration in seconds
>> +		by the memory scrubber.
>> +
>> +What:		/sys/bus/edac/devices/<dev-
>name>/scrubX/max_cycle_duration
>> +Date:		Oct 2024
>> +KernelVersion:	6.12
>> +Contact:	linux-edac@xxxxxxxxxxxxxxx
>> +Description:
>> +		(RO) Supported maximum scrub cycle duration in seconds
>> +		by the memory scrubber.
>> +
>> +What:		/sys/bus/edac/devices/<dev-
>name>/scrubX/current_cycle_duration
>> +Date:		Oct 2024
>> +KernelVersion:	6.12
>> +Contact:	linux-edac@xxxxxxxxxxxxxxx
>> +Description:
>> +		(RW) The current scrub cycle duration in seconds and must be
>> +		within the supported range by the memory scrubber.
>
>What are those three good for and why are they exposed?
Scrub has an overhead when running and that may want to be reduced by
just taking longer to do it. 
Min and max scrub cycle duration returns the range of scrub rate
supported by the device.
>
>> diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile index
>> 4edfb83ffbee..a96a74de8b9e 100644
>> --- a/drivers/edac/Makefile
>> +++ b/drivers/edac/Makefile
>> @@ -10,6 +10,7 @@ obj-$(CONFIG_EDAC)			:= edac_core.o
>>
>>  edac_core-y	:= edac_mc.o edac_device.o edac_mc_sysfs.o
>>  edac_core-y	+= edac_module.o edac_device_sysfs.o wq.o
>> +edac_core-y	+= scrub.o
>>
>>  edac_core-$(CONFIG_EDAC_DEBUG)		+= debugfs.o
>>
>> diff --git a/drivers/edac/edac_device.c b/drivers/edac/edac_device.c
>> index 0b8aa8150239..0c9da55d18bc 100644
>> --- a/drivers/edac/edac_device.c
>> +++ b/drivers/edac/edac_device.c
>> @@ -576,6 +576,7 @@ static void edac_dev_release(struct device *dev)
>> {
>>  	struct edac_dev_feat_ctx *ctx = container_of(dev, struct
>> edac_dev_feat_ctx, dev);
>>
>> +	kfree(ctx->scrub);
>>  	kfree(ctx->dev.groups);
>>  	kfree(ctx);
>>  }
>> @@ -610,7 +611,9 @@ int edac_dev_register(struct device *parent, char
>*name,
>>  		      const struct edac_dev_feature *ras_features)  {
>>  	const struct attribute_group **ras_attr_groups;
>> +	struct edac_dev_data *dev_data;
>>  	struct edac_dev_feat_ctx *ctx;
>> +	int scrub_cnt = 0, scrub_inst = 0;
>>  	int attr_gcnt = 0;
>>  	int ret, feat;
>
>The EDAC tree preferred ordering of variable declarations at the beginning of a
>function is reverse fir tree order::
I used the reverse tree ordering, but missed here. I will correct. 
>
>	struct long_struct_name *descriptive_name;
>	unsigned long foo, bar;
>	unsigned int tmp;
>	int ret;
>
>The above is faster to parse than the reverse ordering::
>
>	int ret;
>	unsigned int tmp;
>	unsigned long foo, bar;
>	struct long_struct_name *descriptive_name;
>
>And even more so than random ordering::
>
>	unsigned long foo, bar;
>	int ret;
>	struct long_struct_name *descriptive_name;
>	unsigned int tmp;
>
>>
>> @@ -620,7 +623,10 @@ int edac_dev_register(struct device *parent, char
>*name,
>>  	/* Double parse to make space for attributes */
>>  	for (feat = 0; feat < num_features; feat++) {
>>  		switch (ras_features[feat].ft_type) {
>> -		/* Add feature specific code */
>> +		case RAS_FEAT_SCRUB:
>> +			attr_gcnt++;
>> +			scrub_cnt++;
>> +			break;
>>  		default:
>>  			return -EINVAL;
>>  		}
>> @@ -639,13 +645,36 @@ int edac_dev_register(struct device *parent, char
>*name,
>>  		goto ctx_free;
>>  	}
>>
>> +	if (scrub_cnt) {
>> +		ctx->scrub = kcalloc(scrub_cnt, sizeof(*ctx->scrub),
>GFP_KERNEL);
>> +		if (!ctx->scrub) {
>> +			ret = -ENOMEM;
>> +			goto groups_free;
>> +		}
>> +	}
>> +
>>  	attr_gcnt = 0;
>>  	for (feat = 0; feat < num_features; feat++, ras_features++) {
>>  		switch (ras_features->ft_type) {
>> -		/* Add feature specific code */
>> +		case RAS_FEAT_SCRUB:
>> +			if (!ras_features->scrub_ops)
>> +				continue;
>
>Continue?
>
>I think fail. What is a scrub feature good for if it doesn't have ops?
Here continue to check any other feature (for eg. ECS, memory repair or another scrub instance) listed
by the parent device in the ras_features[].   
>
>> +			if (scrub_inst != ras_features->instance)
>> +				goto data_mem_free;
>> +			dev_data = &ctx->scrub[scrub_inst];
>> +			dev_data->instance = scrub_inst;
>> +			dev_data->scrub_ops = ras_features->scrub_ops;
>> +			dev_data->private = ras_features->ctx;
>> +			ret = edac_scrub_get_desc(parent,
>&ras_attr_groups[attr_gcnt],
>> +						  ras_features->instance);
>> +			if (ret)
>> +				goto data_mem_free;
>> +			scrub_inst++;
>> +			attr_gcnt++;
>> +			break;
>>  		default:
>>  			ret = -EINVAL;
>> -			goto groups_free;
>> +			goto data_mem_free;
>>  		}
>>  	}
>>
>
>...
>
>--
>Regards/Gruss,
>    Boris.
>
>https://people.kernel.org/tglx/notes-about-netiquette

Thanks,
Shiju





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux