"Huang, Ying" <ying.huang@xxxxxxxxx> writes: > Hi, Alistair, > > Sorry for late response. Just come back from vacation. Ditto for this response :-) I see Andrew has taken this into mm-unstable though, so my bad for not getting around to following all this up sooner. > Alistair Popple <apopple@xxxxxxxxxx> writes: > >> "Huang, Ying" <ying.huang@xxxxxxxxx> writes: >> >>> Alistair Popple <apopple@xxxxxxxxxx> writes: >>> >>>> "Huang, Ying" <ying.huang@xxxxxxxxx> writes: >>>> >>>>> Alistair Popple <apopple@xxxxxxxxxx> writes: >>>>> >>>>>>>>> While other memory device drivers can use the general notifier chain >>>>>>>>> interface at the same time. >>>>>> >>>>>> How would that work in practice though? The abstract distance as far as >>>>>> I can tell doesn't have any meaning other than establishing preferences >>>>>> for memory demotion order. Therefore all calculations are relative to >>>>>> the rest of the calculations on the system. So if a driver does it's own >>>>>> thing how does it choose a sensible distance? IHMO the value here is in >>>>>> coordinating all that through a standard interface, whether that is HMAT >>>>>> or something else. >>>>> >>>>> Only if different algorithms follow the same basic principle. For >>>>> example, the abstract distance of default DRAM nodes are fixed >>>>> (MEMTIER_ADISTANCE_DRAM). The abstract distance of the memory device is >>>>> in linear direct proportion to the memory latency and inversely >>>>> proportional to the memory bandwidth. Use the memory latency and >>>>> bandwidth of default DRAM nodes as base. >>>>> >>>>> HMAT and CDAT report the raw memory latency and bandwidth. If there are >>>>> some other methods to report the raw memory latency and bandwidth, we >>>>> can use them too. >>>> >>>> Argh! So we could address my concerns by having drivers feed >>>> latency/bandwidth numbers into a standard calculation algorithm right? >>>> Ie. Rather than having drivers calculate abstract distance themselves we >>>> have the notifier chains return the raw performance data from which the >>>> abstract distance is derived. >>> >>> Now, memory device drivers only need a general interface to get the >>> abstract distance from the NUMA node ID. In the future, if they need >>> more interfaces, we can add them. For example, the interface you >>> suggested above. >> >> Huh? Memory device drivers (ie. dax/kmem.c) don't care about abstract >> distance, it's a meaningless number. The only reason they care about it >> is so they can pass it to alloc_memory_type(): >> >> struct memory_dev_type *alloc_memory_type(int adistance) >> >> Instead alloc_memory_type() should be taking bandwidth/latency numbers >> and the calculation of abstract distance should be done there. That >> resovles the issues about how drivers are supposed to devine adistance >> and also means that when CDAT is added we don't have to duplicate the >> calculation code. > > In the current design, the abstract distance is the key concept of > memory types and memory tiers. And it is used as interface to allocate > memory types. This provides more flexibility than some other interfaces > (e.g. read/write bandwidth/latency). For example, in current > dax/kmem.c, if HMAT isn't available in the system, the default abstract > distance: MEMTIER_DEFAULT_DAX_ADISTANCE is used. This is still useful > to support some systems now. On a system without HMAT/CDAT, it's > possible to calculate abstract distance from ACPI SLIT, although this is > quite limited. I'm not sure whether all systems will provide read/write > bandwith/latency data for all memory devices. > > HMAT and CDAT or some other mechanisms may provide the read/write > bandwidth/latency data to be used to calculate abstract distance. For > them, we can provide a shared implementation in mm/memory-tiers.c to map > from read/write bandwith/latency to the abstract distance. Can this > solve your concerns about the consistency among algorithms? If so, we > can do that when we add the second algorithm that needs that. I guess it would address my concerns if we did that now. I don't see why we need to wait for a second implementation for that though - the whole series seems to be built around adding a framework for supporting multiple algorithms even though only one exists. So I think we should support that fully, or simplfy the whole thing and just assume the only thing that exists is HMAT and get rid of the general interface until a second algorithm comes along.