On Thu, Feb 06, 2025 at 01:39:49PM +0000, Jonathan Cameron wrote:
> Shiju is just finalizing a v19 + the userspace code. So may make
> sense to read this reply only after that is out!

Saw them.

So, from a cursory view, all that sysfs marshalling that happens in
patch 1 and 2 here:

https://lore.kernel.org/r/20250207143028.1865-1-shiju.jose@xxxxxxxxxx

is not really needed, AFAICT. You can basically check
CXL_EVENT_RECORD_FLAG_MAINT_NEEDED *in the kernel* and go and start
the recovery action (rough sketch of what I mean below).

rasdaemon is basically logging the error record and parroting it back
into sysfs, which is completely unnecessary - the kernel can simply do
that itself.

Patches 3 and 4 are probably more of a justification for the userspace
interaction as the kernel driver is "not ready" to do recovery for
<raisins>. But there I'm also questioning the presence of the sysfs
interface - the error record could simply be injected raw and the
kernel can pick it apart.

Or maybe there's a point in rasdaemon pondering over all those
different attributes and doing some non-trivial massaging of the error
info in order to come to some conclusion and inject that as a recovery
action.

I guess I'm missing something and maybe there really is a valid use
case for exposing all those attributes through sysfs and using them,
but I don't see a clear reason now...
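IOW, something like this completely untested sketch - the flag define
is the one the CXL code uses for the spec's Maintenance Needed bit; the
struct layout and all the *_sketch names are made up purely to show the
idea, they are not the actual driver code:

#include <linux/bits.h>
#include <linux/types.h>

#define CXL_EVENT_RECORD_FLAG_MAINT_NEEDED	BIT(3)

/* made-up, stripped-down stand-ins for the real event record/memdev */
struct cxl_event_rec_sketch {
	u8 flags[3];	/* common event record flags, byte 0 first */
	/* ... full channel/rank/bank/row/column description ... */
};

struct cxl_memdev_sketch;

/*
 * Made-up hook: the real thing would queue the spec-defined sPPR/hPPR
 * maintenance operation for the location described in the record.
 */
static void cxl_start_repair_sketch(struct cxl_memdev_sketch *cxlmd,
				    struct cxl_event_rec_sketch *rec)
{
	/* left empty in this sketch */
}

/* made-up handler: called when a DRAM event record comes in */
static void cxl_handle_dram_event_sketch(struct cxl_memdev_sketch *cxlmd,
					 struct cxl_event_rec_sketch *rec)
{
	if (!(rec->flags[0] & CXL_EVENT_RECORD_FLAG_MAINT_NEEDED))
		return;

	/*
	 * The record already carries the full description of what needs
	 * repairing, so act on it right here instead of tracing it out
	 * to rasdaemon only to have rasdaemon echo it back through sysfs.
	 */
	cxl_start_repair_sketch(cxlmd, rec);
}

Whether the policy - thresholds, hard vs soft PPR budget and so on -
also lives in the kernel or stays in userspace is a separate question;
the sketch only shows where the flag check could happen.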
> For this comment I was referring to letting the kernel do the
> stats gathering etc. We would need to put back records from a previous boot.
> That requires almost the same interface as just telling it to repair.
> Note the address to physical memory mapping is not stable across boots
> so we can't just provide a physical address, we need full description.

Right.

> Ah. No not that. I was just meaning the case where it is hard PPR (hence
> persistent for all time). Once you've done it you can't go back, so after
> N uses, any more errors mean you need a new device ASAP. That is a decision
> with a very different threshold to soft PPR, where it's a case of you
> do it until you run out of spares, then you fall back to offlining
> pages. Next boot you get your spares back again and may use them
> differently this time.

Ok.

> True enough. I'm not against doing things in the kernel in some cases. Even
> then I want the controls to allow user space to do more complex things.
> Even in the cases where the device suggests repair, we may not want to, for
> reasons that device can't know about.

Sure, as long as supporting such a use case is important enough to
warrant supporting a user interface indefinitely. All I'm saying is, it
better be worth the effort.

> The interface provides all the data, and all the controls to match.
>
> Sure, something new might come along that needs additional controls (subchannels
> for DDR5 showed up recently for instance and are in v19) but that extension
> should be easy and fit within the ABI. Those new 'features' will need
> kernel changes and matching rasdaemon changes anyway, as there is new data
> in the error records, so this sort of extension should be fine.

As long as you don't break existing usage, you're good. The moment you
have to change how rasdaemon uses the interface with a new rasdaemon,
then you need to support both.

> Agreed. We need an interface we can support indefinitely - there is nothing
> different between doing it in sysfs or debugfs. That should be
> extensible in a clean fashion to support new data and matching controls.
>
> We don't have to guarantee that the interface supports something 'new' though,
> as our crystal balls aren't perfect, but we do want to make extending to
> cover the new stuff straightforward.

Right.

> If a vendor wants to do their own thing then good luck to them, but don't expect
> the standard software stack to work. So far I have seen no sign of anyone
> doing a non-compliant memory expansion device and there are quite a
> few spec-compliant ones.

Nowadays hw vendors use a lot of Linux to verify hw, so catching an
unsupported device early is good. But there's always a case...

> We will get weird memory devices with accelerators perhaps, but then that
> memory won't be treated as normal memory anyway and likely has a custom
> RAS solution. If they do use the spec-defined commands, then this
> support should work fine. It just needs a call from their driver to hook
> it up.
>
> It might not be the best analogy, but I think of the CXL type 3 device
> spec as being similar to NVMe. There are lots of options, but most people
> will run one standard driver. There may be custom features, but the
> device better be compatible with the NVMe driver if they advertise
> the class code (there are compliance suites etc).

Ack.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette