On Wed, 9 Oct 2024 13:41:16 +0100
<shiju.jose@xxxxxxxxxx> wrote:

> From: Shiju Jose <shiju.jose@xxxxxxxxxx>
>
> Add generic EDAC memory repair control, eg. PPR(Post Package Repair),
> memory sparing etc, control driver in order to control memory repairs
> in the system. Supports sPPR(soft PPR), hPPR(hard PPR), soft/hard memory
> sparing, memory sparing at cacheline/row/bank/rank granularity etc.
> Device with memory repair features registers with EDAC device driver,
> which retrieves memory repair descriptor from EDAC memory repair driver and
> exposes the sysfs repair control attributes to userspace in
> /sys/bus/edac/devices/<dev-name>/mem_repairX/.
>
> The common memory repair control interface abstracts the control of an
> arbitrary memory repair functionality to a common set of functions.
> The sysfs memory repair attribute nodes would be present only if the client
> driver has implemented the corresponding attribute callback function and
> passed in ops to the EDAC device driver during registration.
>
> Signed-off-by: Shiju Jose <shiju.jose@xxxxxxxxxx>

The question inline is one that we discussed off-list: does it make sense
to potentially have one device provide several mem_repairX instances
differing in granularity (and maybe type) of repair, or one mem_repairX
that has control over granularity? The CXL spec has it designed as
separate control interfaces, but I'm not sure whether we should follow
that precedent.

> ---
>  .../ABI/testing/sysfs-edac-mem-repair         | 152 +++++++++
>  drivers/edac/Makefile                         |   2 +-
>  drivers/edac/edac_device.c                    |  31 ++
>  drivers/edac/mem_repair.c                     | 317 ++++++++++++++++++
>  include/linux/edac.h                          |  67 ++++
>  5 files changed, 568 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/ABI/testing/sysfs-edac-mem-repair
>  create mode 100755 drivers/edac/mem_repair.c
>
> diff --git a/Documentation/ABI/testing/sysfs-edac-mem-repair b/Documentation/ABI/testing/sysfs-edac-mem-repair
> new file mode 100644
> index 000000000000..9a8712ed9d47
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-edac-mem-repair
> @@ -0,0 +1,152 @@
> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX
> +Date:		Oct 2024
> +KernelVersion:	6.12
> +Contact:	linux-edac@xxxxxxxxxxxxxxx
> +Description:
> +		The sysfs EDAC bus devices /<dev-name>/mem_repairX subdirectory
> +		belongs to the memory media repair features control, such as
> +		PPR (Post Package Repair), memory sparing etc, where<dev-name>
> +		directory corresponds to a device registered with the EDAC
> +		device driver for the memory repair features.
> +		/mem_repairX belongs to either sPPR (Soft PPR) or hPPR (Hard PPR)
> +		feature of PPR feature, hard or soft memory sparing etc. The memory
> +		sparing is a repair function that replaces a portion of memory
> +		(spared memory) with a portion of functional memory. The memory
> +		sparing has cacheline/row/bank/rank sparing granularities.
> +		The sysfs memory repair attr nodes would be only present if a
> +		memory repair feature is supported.
> +
> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/repair_type
> +Date:		Oct 2024
> +KernelVersion:	6.12
> +Contact:	linux-edac@xxxxxxxxxxxxxxx
> +Description:
> +		(RO) Type of the repair instance. For eg. sPPR, hPPR, cacheline/
> +		row/bank/rank memory sparing etc.

So this is the open question for me with this feature.
Do we do a monolithic 'device' that does all repair types for which we pick
a mode or do we (as here) allow for one mem_repairX for each supported type?
I don't particularly mind but it is a design question I'd like input on
from a wider audience.

> +
> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/hpa
> +Date:		Oct 2024
> +KernelVersion:	6.12
> +Contact:	linux-edac@xxxxxxxxxxxxxxx
> +Description:
> +		(WO) Set HPA (Host Physical Address) for memory repair.

Can we not just read back what was written? Seems like userspace might
expect that?