On Fri, 17 May 2024 12:15:54 +0100 Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> wrote: > Focusing on just one bit. > > > > Now, the question of how many legacy scrub interfaces should be > > > considered in this design out of the gate is a worthwhile discussion. I > > > am encouraged that this ABI is at least trying to handle more than 1 > > > backend, which makes me feel better that adding a 3rd and 4th might not > > > be prohibitive. > > > > See above. > > > > I'm perfectly fine with: "hey, we have a new scrub API interfacing to > > RAS scrub capability and it is *the* thing to use and all other hw scrub > > functionality should be shoehorned into it. > > > > So this thing's design should at least try to anticipate supporting > > other scrub hw. > > > > Because there's EDAC too. Why isn't this scrub thing part of EDAC? Why > > isn't this scrub API part of edac_core? I mean, this is all RAS so why > > design a whole new thing when the required glue is already there? > > > > We can just as well have a > > > > /sys/devices/system/edac/scrub/ > > > > node hierarchy and have everything there. Sorry - finger fumble, wasn't meant to send yet :( > > A few questions about this. It seems an unusual use of fake devices and a bus > so I'm trying to understand how we might do something that looks more standard > but perhaps also fit within the existing scheme. I appreciate this stuff > has evolved over a long time, so lots of backwards compatibility concerns. > > If I follow this right the current situation is: > > /sys/devices/system/edac is the 'virtual' device registered on the edac bus. Actually that's wrong it's not on the edac bus as that is the bus registered via subsys_system_register() (which does create a fake device as per the docs telling us not to use it any more - fair enough, legacy). The mc below it is a bare device - I think just to provide a directory? The comment on the release function seems to say that. This gives. /sys/devices/system/edac/mc /sys/bus/edac/devices/mc Under that we have individual mc0/mc1 etc for the instances of that accessible via /sys/devices/system/edac/mc/mc0 /sys/bus/edac/device/mc/mc0 Those are registered a children of mc. I'd have expected them to be children of the device that registered them - so for our case, a CXL mc0 node would be child of the CXL device rather than here but again I'm guessing legacy that had to be maintained. In general this nesting seems unusual, as I'd have expected the registration directly on the edac bus with /sys/bus/edac/device/mc0 /sys/bus/edac/device/pci0 Given we are talking about something new, maybe this is an opportunity to not perpetuate this? If we add scrub in here I'd prefer to just use the normal bus registration handling rather than creating a nest of additional nodes. So perhaps we could consider /sys/bus/edac/device/scrub0 (or whatever name makes sense, as per the earlier discussion of cxl_scrub0 or similar). Could consider moving the bus location of mc0 etc in future to there with symlinks to /sys/bus/edac/device/mc/* for backwards compatibility either via setting their parents or more explicit link creation. These scrub0 would have their dev->parent set to who ever actually registered them providing that reference cleanly and letting all the normal device model stuff work more simply. If we did that with the scrub nodes, the only substantial change from a separate subsystem as seen in this patch set would be to register them on the edac bus rather than a separate class. As you pointed out, there is a simple scrub interface in the existing edac memory controller code. How would you suggest handling that? Have them all register an additional device on the bus (as a child of the mcX devices) perhaps? Seems an easy step forwards and should be no backwards compatibility concerns. > > > > > Why does it have to be yet another thing? It absolutely doesn't as long as we can do it fairly cleanly within existing code. I wasn't sure that was possible, but you know edac a lot better than me and so I'll defer to you on that! > > > > And if it needs to be separate, who's going to maintain it? Several options for that, but fair question - bringing (at least some of) the RAS mess together will focus reviewer bandwidth etc better. > > > > > Which matches what I reacted to on the last posting: > > > > > > "Maybe it is self evident to others, but for me there is little in these > > > changelogs besides 'mechanism exists, enable it'" > > > > > > ...and to me that feedback was taken to heart with much improved > > > changelogs in this new posting. > > > > Ok. > > > > > This init time feature probing discussion feels like it was born from a > > > micommunication / misunderstanding. > > > > Yes, it seems so, thanks for clarifying things. > > > > I still am unclear on the usecases and how this is supposed to be used > > and also, as mentioned above, we have a *lot* of RAS functionality > > spread around the kernel. Perhaps we should start unifying it instead of > > adding more... I'm definitely keen on unifying things as I agree, this mixture of different RAS functionality is a ever worsening mess. Jonathan > > > > So the big picture and where we're headed to, needs to be clarified first. > > > > Thx. > > >