Re: [PATCH v18 04/19] EDAC: Add memory repair control feature

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 14 Jan 2025 13:38:31 +0100
Mauro Carvalho Chehab <mchehab+huawei@xxxxxxxxxx> wrote:

> Em Thu, 9 Jan 2025 14:24:33 +0000
> Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> escreveu:
> 
> > On Thu, 9 Jan 2025 13:32:22 +0100
> > Borislav Petkov <bp@xxxxxxxxx> wrote:
> > 
> > Hi Boris,
> >   
> > > On Thu, Jan 09, 2025 at 11:00:43AM +0000, Shiju Jose wrote:    
> > > > The min_ and max_ attributes of the control attributes are added  for your
> > > > feedback on V15 to expose supported ranges of these control attributes to the user, 
> > > > in the following links.        
> > > 
> > > Sure, but you can make that differently:
> > > 
> > > cat /sys/bus/edac/devices/<dev-name>/mem_repairX/bank
> > > [x:y]
> > > 
> > > which is the allowed range.    
> > 
> > To my thinking that would fail the test of being an intuitive interface.
> > To issue a repair command requires that multiple attributes be configured
> > before triggering the actual repair.
> > 
> > Think of it as setting the coordinates of the repair in a high dimensional
> > space.
> > 
> > In the extreme case of fine grained repair (Cacheline), to identify the
> > relevant subunit of memory (obtained from the error record that we are
> > basing the decision to repair on) we need to specify all of:
> > 
> > Channel, sub-channel, rank,  bank group, row, column and nibble mask.
> > For coarser granularity repair only specify a subset of these applies and
> > only the relevant controls are exposed to userspace.
> > 
> > They are broken out as specific attributes to enable each to be set before
> > triggering the action with a write to the repair attribute.
> > 
> > There are several possible alternatives:
> > 
> > Option 1
> > 
> > "A:B:C:D:E:F:G:H:I:J" opaque single write to trigger the repair where
> > each number is providing one of those coordinates and where a readback
> > let's us known what each number is.
> > 
> > That single attribute interface is very hard to extend in an intuitive way.
> > 
> > History tell us more levels will be introduced in the middle, not just
> > at the finest granularity, making such an interface hard to extend in
> > a backwards compatible way.
> > 
> > Another alternative of a key value list would make for a nasty sysfs
> > interface.
> > 
> > Option 2 
> > There are sysfs interfaces that use a selection type presentation.
> > 
> > Write: "C", Read: "A, B, [C], D" but that only works well for discrete sets
> > of options and is a pain to parse if read back is necessary.  
> 
> Writing it as:
> 
> 	a b [c] d
> 
> or even:
> 	a, b, [c], d
> 
> doesn't make it hard to be parse on userspace. Adding a comma makes
> Kernel code a little bigger, as it needs an extra check at the loop
> to check if the line is empty or not:
> 
> 	if (*tmp != '\0')
> 		*tmp += snprintf(", ")
> 
> Btwm we have an implementation like that on kernelspace/userspace for
> the RC API:
> 
> - Kernelspace:
>   https://github.com/torvalds/linux/blob/master/drivers/media/rc/rc-main.c#L1125
>   6 lines of code + a const table with names/values, if we use the same example
>   for EDAC:
> 
> 	const char *name[] = { "foo", "bar" };
> 
> 	for (i = 0; i < ARRAY_SIZE(names); i++) {
> 		if (enabled & names[i].type)
> 			tmp += sprintf(tmp, "[%s] ", names[i].name);
> 		else if (allowed & proto_names[i].type)
> 			tmp += sprintf(tmp, "%s ", names[i].name);
> 	}
> 
> 
> - Userspace:
>   https://git.linuxtv.org/v4l-utils.git/tree/utils/keytable/keytable.c#n197
>   5 lines of code + a const table, if we use the same example
>   for ras-daemon:
> 
> 		const char *name[] = { 
> 			[EDAC_FOO] = "[foo]",
> 			[EDAC_BAR] = "[bar]",
> 		};
> 
> 		for (p = strtok(arg, " ,"); p; p = strtok(NULL, " ,"))
> 			for (i = 0; i < ARRAY_SIZE(name); i++)
> 				if (!strcasecmp(p, name[i])
> 					return i;
> 		return -1;
> 
> 	(strtok handles both space and commas at the above example)
> 
> IMO, this is a lot better, as the alternative would be to have separate
> sysfs nodes to describe what values are valid for a given edac devnode.
> 
> See, userspace needs to know what values are valid for a given
> device and support for it may vary depending on the Kernel and
> device version. So, we need to have the information about what values
> are valid stored on some sysfs devnode, to allow backward compatibility.

These aren't selectors from a discrete list so the question is more
whether a syntax of
<min> value <max> 
is intuitive or not.  I'm not aware of precedence for this one.

There was another branch of the thread where Boris mentioned this as an
option. It isn't bad to deal with and an easy change to the code,
but I have an open question on what choice we make for representing
unknown min / max.  For separate files the absence of the file
indicates we don't have any information.


> 
> > 
> > So in conclusion, I think the proposed multiple sysfs attribute style
> > with them reading back the most recent value written is the least bad
> > solution to a complex control interface.
> >   
> > > 
> > > echo ... 
> > > 
> > > then writes in the bank.
> > >     
> > > > ... so we would propose we do not add max_ and min_ for now and see how the
> > > > use cases evolve.      
> > > 
> > > Yes, you should apply that same methodology to the rest of the new features
> > > you're adding: only add functionality for the stuff that is actually being
> > > used now. You can always extend it later.
> > > 
> > > Changing an already user-visible API is a whole different story and a lot lot
> > > harder, even impossible.
> > > 
> > > So I'd suggest you prune the EDAC patches from all the hypothetical usage and
> > > then send only what remains so that I can try to queue them.    
> > 
> > Sure. In this case the addition of min/max was perhaps a wrong response to
> > your request for a way to those ranges rather than just rejecting a write
> > of something out of range as earlier version did.
> > 
> > We can revisit in future if range discovery becomes necessary.  Personally
> > I don't think it is given we are only taking these actions in response error
> > records that give us precisely what to write and hence are always in range.  
> 
> For RO devnodes, there's no need for ranges, but those are likely needed for
> RW, as otherwise userspace may try to write invalid requests and/or have
> backward-compatibility issues.

Given these parameters are only meaningfully written with values coming
ultimately from error records, userspace should never consider writing
something that is out of range except during testing.

I don't mind presenting the range where known (in CXL case it is not
discoverable for most of them) but I wouldn't expect tooling to ever
read it as known correct values to write come from the error records.
Checking those values against provided limits seems an unnecessary step
given an invalid parameter that slips through will be rejected by the
hardware anyway.

Jonathan

> 
> 
> Thanks,
> Mauro





[Index of Archives]     [Linux IBM ACPI]     [Linux Power Management]     [Linux Kernel]     [Linux Laptop]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]
  Powered by Linux