Jonathan Cameron wrote:
> Ok. Best path is drop the available range support then (so no min_ max_ or
> anything to replace them for now).

I think less is more in this case. The hpa, dpa, nibble, column, channel,
bank, rank, row... ABI looks too wide for userspace to have a chance at
writing a competent tool. At least I am struggling with where to even begin
with those ABIs if I were asked to write a tool. Does a tool already exist
for those?

Some questions that read on those ABIs are:

1/ What if the platform has translation between HPA (CXL decode) and SPA
   (physical addresses reported in trace points that PIO and DMA see)?

2/ What if memory is interleaved across repair domains?

3/ What if the device does not use DDR terminology / topology terms for
   repair?

I expect the flow rasdaemon would want is that the current PFA (leaky-bucket
Pre-Failure Analysis) decides that the number of soft-offlines it has
performed exceeds some threshold, and it wants to attempt to repair memory.
However, what is missing today for volatile memory is that some failures can
be repaired with in-band writes while other failures need heavier hammers
like Post-Package-Repair to actively swap in whole new banks of memory. So
don't we need something like "soft-offline-undo" on the way to PPR?

So, yes, +1 to simpler for now, where software effectively just needs to deal
with a handful of "region repair" buttons whose semantics are coarse and
sub-optimal. Wait for a future where a tool author says, "we have had good
success getting bulk-offlined pages back into service, but now we need this
specific finer-grained kernel interface to avoid wasting spare banks
prematurely".

Anything more complex than "a set of /sys/devices/system/memory/ devices has
a /sys/bus/edac/devices/devX/repair button" feels like a generation ahead of
where the initial sophistication needs to lie.

That said, I do not follow ras tooling closely enough to say whether someone
has already identified the critical need for a fine-grained repair ABI?
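
To make the coarse flow above concrete, here is a minimal userspace sketch,
not code from this series: a daemon that has been soft-offlining pages in a
region decides its count has crossed a threshold and pokes a single
per-device "repair" button. The /sys/bus/edac/devices/devX/repair path is the
hypothetical coarse attribute discussed above, and the threshold policy is an
arbitrary placeholder; neither is a merged ABI.

	/* sketch of a coarse "region repair" trigger, assuming a
	 * hypothetical /sys/bus/edac/devices/devX/repair attribute */
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <errno.h>

	#define SOFT_OFFLINE_THRESHOLD 16	/* example policy, not ABI */

	/* Write "1" to the hypothetical coarse repair attribute. */
	static int trigger_region_repair(const char *edac_dev)
	{
		char path[256];
		FILE *f;

		snprintf(path, sizeof(path),
			 "/sys/bus/edac/devices/%s/repair", edac_dev);

		f = fopen(path, "w");
		if (!f) {
			fprintf(stderr, "open %s: %s\n", path, strerror(errno));
			return -1;
		}
		if (fputs("1", f) == EOF) {
			fprintf(stderr, "write %s: %s\n", path, strerror(errno));
			fclose(f);
			return -1;
		}
		fclose(f);
		return 0;
	}

	int main(int argc, char **argv)
	{
		/* In a real tool this count would come from the daemon's
		 * PFA bookkeeping rather than the command line. */
		int soft_offlined = argc > 1 ? atoi(argv[1]) : 0;
		const char *edac_dev = argc > 2 ? argv[2] : "devX";

		if (soft_offlined <= SOFT_OFFLINE_THRESHOLD) {
			printf("below threshold, keep soft-offlining\n");
			return 0;
		}

		/* Threshold exceeded: reach for the heavier hammer. The
		 * missing piece noted above is a "soft-offline-undo" so the
		 * repaired memory can be brought back into service. */
		return trigger_region_repair(edac_dev) ? EXIT_FAILURE : 0;
	}

The point of the sketch is how little the tool has to know when the kernel
only exposes a coarse button: no hpa/dpa/bank/rank bookkeeping in userspace,
just a policy decision and a single write.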