"Huang, Ying" <ying.huang@xxxxxxxxx> writes: > Hi, Alistair, > > Thanks a lot for comments! > > Alistair Popple <apopple@xxxxxxxxxx> writes: > >> Huang Ying <ying.huang@xxxxxxxxx> writes: >> >>> The abstract distance may be calculated by various drivers, such as >>> ACPI HMAT, CXL CDAT, etc. While it may be used by various code which >>> hot-add memory node, such as dax/kmem etc. To decouple the algorithm >>> users and the providers, the abstract distance calculation algorithms >>> management mechanism is implemented in this patch. It provides >>> interface for the providers to register the implementation, and >>> interface for the users. >> >> I wonder if we need this level of decoupling though? It seems to me like >> it would be simpler and better for drivers to calculate the abstract >> distance directly themselves by calling the desired algorithm (eg. ACPI >> HMAT) and pass this when creating the nodes rather than having a >> notifier chain. > > Per my understanding, ACPI HMAT and memory device drivers (such as > dax/kmem) may belong to different subsystems (ACPI vs. dax). It's not > good to call functions across subsystems directly. So, I think it's > better to use a general subsystem: memory-tier.c to decouple them. If > it turns out that a notifier chain is unnecessary, we can use some > function pointers instead. > >> At the moment it seems we've only identified two possible algorithms >> (ACPI HMAT and CXL CDAT) and I don't think it would make sense for one >> of those to fallback to the other based on priority, so why not just >> have drivers call the correct algorithm directly? > > For example, we have a system with PMEM (persistent memory, Optane > DCPMM, or AEP, or something else) in DIMM slots and CXL.mem connected > via CXL link to a remote memory pool. We will need ACPI HMAT for PMEM > and CXL CDAT for CXL.mem. One way is to make dax/kmem identify the > types of the device and call corresponding algorithms. Yes, that is what I was thinking. > The other way (suggested by this series) is to make dax/kmem call a > notifier chain, then CXL CDAT or ACPI HMAT can identify the type of > device and calculate the distance if the type is correct for them. I > don't think that it's good to make dax/kem to know every possible > types of memory devices. Do we expect there to be lots of different types of memory devices sharing a common dax/kmem driver though? Must admit I'm coming from a GPU background where we'd expect each type of device to have it's own driver anyway so wasn't expecting different types of memory devices to be handled by the same driver. >>> Multiple algorithm implementations can cooperate via calculating >>> abstract distance for different memory nodes. The preference of >>> algorithm implementations can be specified via >>> priority (notifier_block.priority). >> >> How/what decides the priority though? That seems like something better >> decided by a device driver than the algorithm driver IMHO. > > Do we need the memory device driver specific priority? Or we just share > a common priority? For example, the priority of CXL CDAT is always > higher than that of ACPI HMAT? Or architecture specific? Ok, thanks. Having read the above I think the priority is unimportant. Algorithms can either decide to return a distance and NOTIFY_STOP_MASK if they can calculate a distance or NOTIFY_DONE if they can't for a specific device. > And, I don't think that we are forced to use the general notifier > chain interface in all memory device drivers. 
> And I don't think that we are forced to use the general notifier
> chain interface in all memory device drivers. If a memory device
> driver has a better understanding of the memory device, it can use
> another way to determine the abstract distance. For example, a CXL
> memory device driver can identify the abstract distance by itself,
> while other memory device drivers use the general notifier chain
> interface at the same time.

Whilst I personally would find that flexibility useful, I am concerned
it means every driver will just end up divining its own distance
rather than ensuring the data in HMAT/CDAT/etc. is correct. That would
rather defeat the purpose of it all.
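To put it another way, I'd hope the consumer side can stay as simple
as the sketch below (mt_calc_adistance() and the tier helper are names
I'm assuming based on the patch description, not the real interface),
with the firmware tables as the only source of the distance:

  #include <linux/memory-tiers.h>

  static int kmem_setup_node(int nid)
  {
          /* Fall back to the DRAM default if no provider claims the node. */
          int adist = MEMTIER_ADISTANCE_DRAM;

          /*
           * Ask the registered providers (ACPI HMAT, CXL CDAT, ...) for
           * the abstract distance of this node.  The first provider that
           * recognises the node fills in adist and stops the chain.
           */
          mt_calc_adistance(nid, &adist);

          /* Place the node in a memory tier purely based on that value. */
          return set_node_memory_tier_by_adistance(nid, adist);  /* hypothetical */
  }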