On Tue, Aug 16, 2022 at 4:12 PM Bharata B Rao <bharata@xxxxxxx> wrote:
>
> On 8/16/2022 12:58 PM, huang ying wrote:
> > On Tue, Aug 16, 2022 at 1:10 PM Aneesh Kumar K V
> > <aneesh.kumar@xxxxxxxxxxxxx> wrote:
> >>
> >> On 8/15/22 8:09 AM, Huang, Ying wrote:
> >>> "Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxx> writes:
> >>>
> >
> > [snip]
> >
> >>>>
> >>>> +/*
> >>>> + * Default abstract distance assigned to the NUMA node onlined
> >>>> + * by DAX/kmem if the low level platform driver didn't initialize
> >>>> + * one for this NUMA node.
> >>>> + */
> >>>> +#define MEMTIER_DEFAULT_DAX_ADISTANCE (MEMTIER_ADISTANCE_DRAM * 2)
> >>>
> >>> If my understanding is correct, this is targeting Optane DCPMM for
> >>> now. The measured results in the following paper are:
> >>>
> >>> https://arxiv.org/pdf/2002.06018.pdf
> >>>
> >>> Section: 2.1 Read/Write Latencies
> >>>
> >>> "
> >>> For read access, the latency of DCPMM was 400.1% higher than that of
> >>> DRAM. For write access, it was 407.1% higher.
> >>> "
> >>>
> >>> Section: 2.2 Read/Write Bandwidths
> >>>
> >>> "
> >>> For read access, the throughput of DCPMM was 37.1% of DRAM. For write
> >>> access, it was 7.8%.
> >>> "
> >>>
> >>> According to the above data, I think MEMTIER_DEFAULT_DAX_ADISTANCE
> >>> can be "5 * MEMTIER_ADISTANCE_DRAM".
> >>>
> >>
> >> If we map every 100% increase in latency to a memory tier, we will
> >> essentially have 4 memory tiers here. Each memory tier covers a range
> >> of abstract distance 128, which makes a total adistance increase from
> >> MEMTIER_ADISTANCE_DRAM of 512. This puts DEFAULT_DAX_DISTANCE at
> >> 1024, or MEMTIER_ADISTANCE_DRAM * 2.
> >
> > If my understanding is correct, you are suggesting a kind of
> > logarithmic mapping from latency to abstract distance? That is,
> >
> >   abstract_distance = log2(latency)
> >
> > while I am suggesting a kind of linear mapping from latency to
> > abstract distance. That is,
> >
> >   abstract_distance = C * latency
> >
> > I think the linear mapping is easier to understand.
> >
> > Are there good reasons to use a logarithmic mapping?
>
> Also, what is the recommendation for using the bandwidth measure which
> may be available from HMAT for CXL memory? How is bandwidth going
> to influence the abstract distance?

This is a good question. Per my understanding, "latency" stands for idle
latency by default. But in practice, the latency under some reasonable
memory access throughput is the "real" latency. So memory with lower
bandwidth should have a larger abstract distance than memory with higher
bandwidth, even if the idle latency is the same.

But I don't have a perfect formula to combine idle latency and bandwidth
into abstract distance. One possibility is to increase the abstract
distance if the bandwidth of the memory is much lower than that of DRAM.

Best Regards,
Huang, Ying