Alistair Popple <apopple@xxxxxxxxxx> writes:

> "Huang, Ying" <ying.huang@xxxxxxxxx> writes:
>
>> Hi, Alistair,
>>
>> Sorry for the late response. Just came back from vacation.
>
> Ditto for this response :-)
>
> I see Andrew has taken this into mm-unstable though, so my bad for not
> getting around to following all this up sooner.
>
>> Alistair Popple <apopple@xxxxxxxxxx> writes:
>>
>>> "Huang, Ying" <ying.huang@xxxxxxxxx> writes:
>>>
>>>> Alistair Popple <apopple@xxxxxxxxxx> writes:
>>>>
>>>>> "Huang, Ying" <ying.huang@xxxxxxxxx> writes:
>>>>>
>>>>>> Alistair Popple <apopple@xxxxxxxxxx> writes:
>>>>>>
>>>>>>>>>> While other memory device drivers can use the general notifier
>>>>>>>>>> chain interface at the same time.
>>>>>>>
>>>>>>> How would that work in practice though? The abstract distance, as
>>>>>>> far as I can tell, doesn't have any meaning other than establishing
>>>>>>> preferences for memory demotion order. Therefore all calculations
>>>>>>> are relative to the rest of the calculations on the system. So if a
>>>>>>> driver does its own thing, how does it choose a sensible distance?
>>>>>>> IMHO the value here is in coordinating all that through a standard
>>>>>>> interface, whether that is HMAT or something else.
>>>>>>
>>>>>> Only if different algorithms follow the same basic principle. For
>>>>>> example, the abstract distance of the default DRAM nodes is fixed
>>>>>> (MEMTIER_ADISTANCE_DRAM). The abstract distance of a memory device
>>>>>> is in linear direct proportion to its memory latency and inversely
>>>>>> proportional to its memory bandwidth, using the memory latency and
>>>>>> bandwidth of the default DRAM nodes as the base.
>>>>>>
>>>>>> HMAT and CDAT report the raw memory latency and bandwidth. If there
>>>>>> are some other methods to report the raw memory latency and
>>>>>> bandwidth, we can use them too.
>>>>>
>>>>> Argh! So we could address my concerns by having drivers feed
>>>>> latency/bandwidth numbers into a standard calculation algorithm,
>>>>> right? I.e. rather than having drivers calculate abstract distance
>>>>> themselves, we have the notifier chains return the raw performance
>>>>> data from which the abstract distance is derived.
>>>>
>>>> Now, memory device drivers only need a general interface to get the
>>>> abstract distance from the NUMA node ID. In the future, if they need
>>>> more interfaces, we can add them. For example, the interface you
>>>> suggested above.
>>>
>>> Huh? Memory device drivers (i.e. dax/kmem.c) don't care about abstract
>>> distance, it's a meaningless number. The only reason they care about
>>> it is so they can pass it to alloc_memory_type():
>>>
>>> struct memory_dev_type *alloc_memory_type(int adistance)
>>>
>>> Instead, alloc_memory_type() should be taking bandwidth/latency
>>> numbers, and the calculation of abstract distance should be done
>>> there. That resolves the issue of how drivers are supposed to divine
>>> adistance and also means that when CDAT is added we don't have to
>>> duplicate the calculation code.
>>
>> In the current design, the abstract distance is the key concept of
>> memory types and memory tiers, and it is used as the interface to
>> allocate memory types. This provides more flexibility than some other
>> interfaces (e.g. read/write bandwidth/latency). For example, in the
>> current dax/kmem.c, if HMAT isn't available in the system, the default
>> abstract distance MEMTIER_DEFAULT_DAX_ADISTANCE is used. This is
>> still useful to support some systems now. On a system without
>> HMAT/CDAT, it's possible to calculate abstract distance from ACPI
>> SLIT, although this is quite limited. I'm not sure whether all systems
>> will provide read/write bandwidth/latency data for all memory devices.
>>
>> HMAT and CDAT or some other mechanisms may provide the read/write
>> bandwidth/latency data to be used to calculate abstract distance. For
>> them, we can provide a shared implementation in mm/memory-tiers.c to
>> map from read/write bandwidth/latency to the abstract distance. Can
>> this solve your concerns about the consistency among algorithms? If
>> so, we can do that when we add the second algorithm that needs it.
>
> I guess it would address my concerns if we did that now. I don't see
> why we need to wait for a second implementation for that though - the
> whole series seems to be built around adding a framework for supporting
> multiple algorithms even though only one exists. So I think we should
> support that fully, or simplify the whole thing and just assume the
> only thing that exists is HMAT and get rid of the general interface
> until a second algorithm comes along.

We will need a general interface even for a single algorithm
implementation, because it's not good to make a dax subsystem driver
(dax/kmem) depend on an ACPI subsystem driver (acpi/hmat). We need some
general interface at the subsystem level (memory tier here) between
them.
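To make the shared mapping concrete, below is a rough sketch of what I
have in mind. It is only an illustration written as a stand-alone
user-space program, not code from this series; the names (struct
mem_perf, calc_abstract_distance(), ADISTANCE_DRAM) and the example
numbers are placeholders. It just spells out the calculation described
above: abstract distance in direct proportion to the (read + write)
latency and inversely proportional to the (read + write) bandwidth,
with the default DRAM nodes' performance as the base.

/*
 * Illustrative sketch only -- all names and numbers are placeholders,
 * not an actual kernel interface.
 */
#include <stdio.h>

#define ADISTANCE_DRAM  512     /* placeholder base distance for default DRAM */

struct mem_perf {
        unsigned int read_latency;      /* nsec */
        unsigned int write_latency;     /* nsec */
        unsigned int read_bandwidth;    /* MB/s */
        unsigned int write_bandwidth;   /* MB/s */
};

/*
 * Abstract distance is in direct proportion to the (read + write)
 * latency and inversely proportional to the (read + write) bandwidth,
 * relative to the default DRAM nodes. A larger distance means a
 * slower tier, i.e. a later demotion target.
 */
static int calc_abstract_distance(const struct mem_perf *perf,
                                  const struct mem_perf *dram)
{
        return ADISTANCE_DRAM *
                (perf->read_latency + perf->write_latency) /
                (dram->read_latency + dram->write_latency) *
                (dram->read_bandwidth + dram->write_bandwidth) /
                (perf->read_bandwidth + perf->write_bandwidth);
}

int main(void)
{
        /* made-up example numbers */
        struct mem_perf dram = { 80, 80, 20000, 20000 };
        struct mem_perf pmem = { 250, 500, 8000, 4000 };

        printf("DRAM adistance: %d\n", calc_abstract_distance(&dram, &dram));
        printf("PMEM adistance: %d\n", calc_abstract_distance(&pmem, &dram));
        return 0;
}

With something like this living in mm/memory-tiers.c, acpi/hmat (and a
future CDAT user) would only feed in the raw latency/bandwidth numbers,
the calculation would stay consistent across algorithms, and dax/kmem
would keep calling a memory-tier level interface instead of depending
on acpi/hmat.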
Best Regards,
Huang, Ying