Alistair Popple <apopple@xxxxxxxxxx> writes:

> "Huang, Ying" <ying.huang@xxxxxxxxx> writes:
>
>> Hi, Alistair,
>>
>> Thanks a lot for the comments!
>>
>> Alistair Popple <apopple@xxxxxxxxxx> writes:
>>
>>> Huang Ying <ying.huang@xxxxxxxxx> writes:
>>>
>>>> The abstract distance may be calculated by various drivers, such as
>>>> ACPI HMAT, CXL CDAT, etc., while it may be used by various code
>>>> which hot-adds memory nodes, such as dax/kmem, etc.  To decouple the
>>>> algorithm users and the providers, this patch implements a
>>>> management mechanism for abstract distance calculation algorithms.
>>>> It provides an interface for the providers to register their
>>>> implementations, and an interface for the users.
>>>
>>> I wonder if we need this level of decoupling though? It seems to me
>>> like it would be simpler and better for drivers to calculate the
>>> abstract distance directly themselves by calling the desired
>>> algorithm (e.g. ACPI HMAT) and pass this when creating the nodes
>>> rather than having a notifier chain.
>>
>> Per my understanding, ACPI HMAT and memory device drivers (such as
>> dax/kmem) may belong to different subsystems (ACPI vs. dax).  It's
>> not good to call functions across subsystems directly.  So, I think
>> it's better to use a general subsystem, memory-tier.c, to decouple
>> them.  If it turns out that a notifier chain is unnecessary, we can
>> use some function pointers instead.
>>
>>> At the moment it seems we've only identified two possible algorithms
>>> (ACPI HMAT and CXL CDAT) and I don't think it would make sense for
>>> one of those to fall back to the other based on priority, so why not
>>> just have drivers call the correct algorithm directly?
>>
>> For example, we have a system with PMEM (persistent memory: Optane
>> DCPMM, or AEP, or something else) in DIMM slots and CXL.mem connected
>> via a CXL link to a remote memory pool.  We will need ACPI HMAT for
>> the PMEM and CXL CDAT for the CXL.mem.  One way is to make dax/kmem
>> identify the type of the device and call the corresponding algorithm.
>
> Yes, that is what I was thinking.
>
>> The other way (suggested by this series) is to make dax/kmem call a
>> notifier chain, then CXL CDAT or ACPI HMAT can identify the type of
>> device and calculate the distance if the type is correct for them.  I
>> don't think it's good to make dax/kmem know every possible type of
>> memory device.
>
> Do we expect there to be lots of different types of memory devices
> sharing a common dax/kmem driver though? Must admit I'm coming from a
> GPU background where we'd expect each type of device to have its own
> driver anyway so wasn't expecting different types of memory devices to
> be handled by the same driver.

Now, dax/kmem.c is used for

- PMEM (Optane DCPMM, or AEP)
- CXL.mem
- HBM (attached to the CPU)

I understand that for a CXL GPU driver it's OK to call some CXL CDAT
helper to identify the abstract distance of memory attached to the GPU,
because there is no cross-subsystem function call there.  But it doesn't
look very clean to call from dax/kmem.c into CXL CDAT, because that is a
cross-subsystem function call.

>>>> Multiple algorithm implementations can cooperate via calculating
>>>> abstract distance for different memory nodes.  The preference among
>>>> algorithm implementations can be specified via
>>>> priority (notifier_block.priority).
>>>
>>> How/what decides the priority though? That seems like something
>>> better decided by a device driver than the algorithm driver IMHO.
>>
>> Do we need a memory-device-driver-specific priority?  Or do we just
>> share a common priority?  For example, the priority of CXL CDAT is
>> always higher than that of ACPI HMAT?  Or architecture specific?
>
> Ok, thanks. Having read the above I think the priority is
> unimportant. Algorithms can either decide to return a distance and
> NOTIFY_STOP_MASK if they can calculate a distance or NOTIFY_DONE if
> they can't for a specific device.

Yes.  In most cases, there are no overlaps among the algorithms.
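To make the mechanism more concrete, below is a rough sketch of what I
have in mind.  The names here (mt_calc_adistance(),
register_mt_adistance_algorithm(), and the hmat_* helpers) are
illustrative only, not necessarily what the patch uses:

#include <linux/notifier.h>

/* memory-tier.c: the decoupling point between providers and users. */

static BLOCKING_NOTIFIER_HEAD(mt_adistance_algorithms);

/* Providers (ACPI HMAT, CXL CDAT, ...) register their algorithms. */
int register_mt_adistance_algorithm(struct notifier_block *nb)
{
	return blocking_notifier_chain_register(&mt_adistance_algorithms,
						nb);
}

/* Users (e.g. dax/kmem) call this when hot-adding a memory node. */
int mt_calc_adistance(int node, int *adist)
{
	return blocking_notifier_call_chain(&mt_adistance_algorithms,
					    node, adist);
}

/* Hypothetical HMAT helpers, assumed to exist for this sketch. */
static bool hmat_node_has_data(int node);
static int hmat_bw_lat_to_adistance(int node);

/* ACPI HMAT side: claim the node, or let the next algorithm try it. */
static int hmat_adistance_callback(struct notifier_block *self,
				   unsigned long node, void *data)
{
	int *adist = data;

	if (!hmat_node_has_data(node))
		return NOTIFY_DONE;	/* not ours, let the chain continue */

	/* Hypothetical conversion from HMAT bandwidth/latency data. */
	*adist = hmat_bw_lat_to_adistance(node);
	return NOTIFY_STOP;	/* includes NOTIFY_STOP_MASK: distance found */
}

static struct notifier_block hmat_adistance_nb = {
	.notifier_call = hmat_adistance_callback,
	.priority = 100,	/* mostly moot, algorithms rarely overlap */
};

/* At init: register_mt_adistance_algorithm(&hmat_adistance_nb); */

With something like this, dax/kmem only calls mt_calc_adistance() and
never references any ACPI or CXL symbol; each algorithm decides by
itself whether a given node is one it can handle.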
>> And, I don't think that we are forced to use the general notifier
>> chain interface in all memory device drivers.  If a memory device
>> driver has a better understanding of its memory device, it can use
>> another way to determine the abstract distance.  For example, a CXL
>> memory device driver can identify the abstract distance by itself,
>> while other memory device drivers can use the general notifier chain
>> interface at the same time.
>
> Whilst I think personally I would find that flexibility useful I am
> concerned it means every driver will just end up divining its own
> distance rather than ensuring data in HMAT/CDAT/etc. is correct. That
> would kind of defeat the purpose of it all then.

But we have no way to enforce that, either.

--
Best Regards,
Huang, Ying