On 10/28/22 8:33 AM, Huang, Ying wrote:
> Hi, Aneesh,
>
> Aneesh Kumar K V <aneesh.kumar@xxxxxxxxxxxxx> writes:
>
>> On 10/27/22 12:29 PM, Huang Ying wrote:
>>> We need some way to override the system default memory tiers. For
>>> the example system as follows,
>>>
>>> type      abstract distance
>>> ----      -----------------
>>> HBM       300
>>> DRAM      1000
>>> CXL_MEM   5000
>>> PMEM      5100
>>>
>>> Given the memory tier chunk size is 100, the default memory tiers
>>> could be,
>>>
>>> tier   abstract distance range   types
>>> ----   -----------------------   -----
>>> 3      300-400                   HBM
>>> 10     1000-1100                 DRAM
>>> 50     5000-5100                 CXL_MEM
>>> 51     5100-5200                 PMEM
>>>
>>> If we want to group CXL_MEM and PMEM into one tier, we have 2 choices.
>>>
>>> 1) Override the abstract distance of CXL_MEM or PMEM. For example, if
>>> we change the abstract distance of PMEM to 5050, the memory tiers
>>> become,
>>>
>>> tier   abstract distance range   types
>>> ----   -----------------------   -----
>>> 3      300-400                   HBM
>>> 10     1000-1100                 DRAM
>>> 50     5000-5100                 CXL_MEM, PMEM
>>>
>>> 2) Override the memory tier chunk size. For example, if we change the
>>> memory tier chunk size to 200, the memory tiers become,
>>>
>>> tier   abstract distance range   types
>>> ----   -----------------------   -----
>>> 1      200-400                   HBM
>>> 5      1000-1200                 DRAM
>>> 25     5000-5200                 CXL_MEM, PMEM
>>>
>>> But after some thought, I think choice 2) may not be good. The
>>> problem is that even if 2 abstract distances are almost the same,
>>> they may be put in 2 tiers if they sit on different sides of a tier
>>> boundary. For example, suppose the abstract distance of CXL_MEM is
>>> 4990, while the abstract distance of PMEM is 5010. Although the
>>> difference between the abstract distances is only 20, CXL_MEM and
>>> PMEM will be put in different tiers if the tier chunk size is 50,
>>> 100, 200, 250, 500, .... This makes choice 2) hard to use; it may
>>> become tricky to find an appropriate tier chunk size that satisfies
>>> all requirements.
>>>
>>
>> Shouldn't we wait to gain experience with how we end up mapping
>> devices with different latencies and bandwidths before tuning these
>> values?
>
> I just want to discuss the overall design.
>
>>> So I suggest we abandon choice 2) and use choice 1) only. This
>>> makes the overall design and user space interface simpler and
>>> easier to use. The overall design of the abstract distance could be:
>>>
>>> 1. Use decimal for the abstract distance and its chunk size. This
>>> makes them more user friendly.
>>>
>>> 2. Make the tier chunk size as small as possible, for example, 10.
>>> By default this will put different memory types in one memory tier
>>> only if their performance is almost the same. And we will not
>>> provide an interface to override the chunk size.
>>>
>>
>> This could also mean we end up with lots of memory tiers with
>> relatively small performance differences between them. Again, it
>> depends on how HMAT attributes will be mapped to abstract distance.
>
> Per my understanding, there will not be many memory types in a
> system, so there will not be many memory tiers either. Most systems
> have only 2 or 3 memory tiers, for example, HBM, DRAM, CXL, etc.

So we don't need the chunk size to be 10, because we don't foresee
needing to group devices into that many tiers.

> Do you know of systems with many memory types? The basic idea is to
> put different memory types in different memory tiers by default. If
> users want to group them, they can do that by overriding the abstract
> distance of some memory type.

With a small chunk size, and depending on how we are going to derive
abstract distance, I am wondering whether we would end up with lots of
memory tiers with no real value. Hence my suggestion to wait on making
a change like this until we have code that maps HMAT/CDAT attributes
to abstract distance.

>>
>>> 3. Make the abstract distance of normal DRAM large enough.
>>> For example, 1000; then 100 tiers can be defined below DRAM, which
>>> is more than enough in practice.
>>
>> Why 100? Will we really have that many tiers below/faster than DRAM?
>> As of now I see only HBM below it.
>
> Yes, 100 is more than enough. We just want to avoid grouping
> different memory types by default.
>
> Best Regards,
> Huang, Ying