"Ho-Ren (Jack) Chuang" <horenchuang@xxxxxxxxxxxxx> writes: > On Sun, Mar 3, 2024 at 6:47 PM Huang, Ying <ying.huang@xxxxxxxxx> wrote: >> >> "Ho-Ren (Jack) Chuang" <horenchuang@xxxxxxxxxxxxx> writes: >> >> > The memory tiering component in the kernel is functionally useless for >> > CPUless memory/non-DRAM devices like CXL1.1 type3 memory because the nodes >> > are lumped together in the DRAM tier. >> > https://lore.kernel.org/linux-mm/PH0PR08MB7955E9F08CCB64F23963B5C3A860A@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/T/ >> >> I think that it's unfair to call it "useless". Yes, it doesn't work if >> the CXL memory device are not enumerate via drivers/dax/kmem.c. So, >> please be specific about in which cases it doesn't work instead of too >> general "useless". >> > > Thank you and I didn't mean anything specific. I simply reused phrases > we discussed > earlier in the previous patchset. I will change them to the following in v2: > "At boot time, current memory tiering assigns all detected memory nodes > to the same DRAM tier. This results in CPUless memory/non-DRAM devices, > such as CXL1.1 type3 memory, being unable to be assigned to the > correct memory tier, > leading to the inability to migrate pages between different types of memory." > > Please see if this looks more specific. I don't think that the description above is accurate. In fact, there are 2 ways to enumerate the memory device, 1. Mark it as reserved memory (E820_TYPE_SOFT_RESERVED, etc.) in E820 table or something similar. 2. Mark it as normal memory (E820_TYPE_RAM) in E820 table or something similar For 1, the memory device (including CXL memory) is onlined via drivers/dax/kmem.c, so will be put in proper memory tiers. For 2, the memory device is indistinguishable with normal DRAM with current implementation. And this is what this patch is working on. Right? -- Best Regards, Huang, Ying >> > This patchset automatically resolves the issues. 
It delays the initialization >> > of memory tiers for CPUless NUMA nodes until they obtain HMAT information >> > at boot time, eliminating the need for user intervention. >> > If no HMAT specified, it falls back to using `default_dram_type`. >> > >> > Example usecase: >> > We have CXL memory on the host, and we create VMs with a new system memory >> > device backed by host CXL memory. We inject CXL memory performance attributes >> > through QEMU, and the guest now sees memory nodes with performance attributes >> > in HMAT. With this change, we enable the guest kernel to construct >> > the correct memory tiering for the memory nodes. >> > >> > Ho-Ren (Jack) Chuang (1): >> > memory tier: acpi/hmat: create CPUless memory tiers after obtaining >> > HMAT info >> > >> > drivers/acpi/numa/hmat.c | 3 ++ >> > include/linux/memory-tiers.h | 6 +++ >> > mm/memory-tiers.c | 76 ++++++++++++++++++++++++++++++++---- >> > 3 files changed, 77 insertions(+), 8 deletions(-) >> >> -- >> Best Regards, >> Huang, Ying
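
For readers following the thread, the two enumeration cases and the fallback described in the cover letter can be modelled roughly as below. This is only a user-space sketch of the decision being discussed, not code from the patch: every type and function name here (`enum_path`, `tier_source`, `node_info`, `tier_source_current`, `tier_source_proposed`) is hypothetical, and the real logic lives in mm/memory-tiers.c and drivers/acpi/numa/hmat.c.

```c
/*
 * User-space model of the discussion only; NOT kernel code.
 * All identifiers below are hypothetical and chosen for illustration.
 */
#include <stdbool.h>

/* How the firmware exposes the memory device (the two cases in the reply). */
enum enum_path {
    ENUM_SOFT_RESERVED, /* case 1: E820_TYPE_SOFT_RESERVED, onlined via drivers/dax/kmem.c */
    ENUM_NORMAL_RAM     /* case 2: E820_TYPE_RAM, looks like ordinary DRAM at boot */
};

/* Where the node's memory tier would come from in this model. */
enum tier_source {
    TIER_FROM_KMEM_DRIVER, /* proper tier chosen when the device is onlined by kmem */
    TIER_DEFAULT_DRAM,     /* lumped into the default DRAM tier */
    TIER_FROM_HMAT         /* tier derived from HMAT performance attributes */
};

struct node_info {
    enum enum_path path;
    bool has_cpu;        /* false for a CPUless NUMA node */
    bool has_hmat_perf;  /* true if HMAT reports performance attributes */
};

/* Current behaviour as described in the reply: case 2 is treated as DRAM. */
enum tier_source tier_source_current(const struct node_info *n)
{
    if (n->path == ENUM_SOFT_RESERVED)
        return TIER_FROM_KMEM_DRIVER;
    return TIER_DEFAULT_DRAM;
}

/*
 * Behaviour the cover letter describes: for CPUless nodes, wait for HMAT
 * information and use it; with no HMAT data, fall back to the default
 * DRAM type.
 */
enum tier_source tier_source_proposed(const struct node_info *n)
{
    if (n->path == ENUM_SOFT_RESERVED)
        return TIER_FROM_KMEM_DRIVER;
    if (!n->has_cpu && n->has_hmat_perf)
        return TIER_FROM_HMAT;
    return TIER_DEFAULT_DRAM;
}
```

In this model, only a CPUless node enumerated as normal RAM with HMAT attributes changes outcome between the two functions, which matches the narrower problem statement suggested in the reply.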