"Huang, Ying" <ying.huang@xxxxxxxxx> writes: > "Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxx> writes: > >> In the current kernel, memory tiers are defined implicitly via a demotion path >> relationship between NUMA nodes, which is created during the kernel >> initialization and updated when a NUMA node is hot-added or hot-removed. The >> current implementation puts all nodes with CPU into the highest tier, and builds >> the tier hierarchy tier-by-tier by establishing the per-node demotion targets >> based on the distances between nodes. >> >> This current memory tier kernel implementation needs to be improved for several >> important use cases, >> >> The current tier initialization code always initializes each memory-only NUMA >> node into a lower tier. But a memory-only NUMA node may have a high performance >> memory device (e.g. a DRAM-backed memory-only node on a virtual machine) that >> should be put into a higher tier. >> >> The current tier hierarchy always puts CPU nodes into the top tier. But on a >> system with HBM or GPU devices, the memory-only NUMA nodes mapping these devices >> should be in the top tier, and DRAM nodes with CPUs are better to be placed into >> the next lower tier. >> >> With current kernel higher tier node can only be demoted to nodes with shortest >> distance on the next lower tier as defined by the demotion path, not any other >> node from any lower tier. This strict, demotion order does not work in all use >> cases (e.g. some use cases may want to allow cross-socket demotion to another >> node in the same demotion tier as a fallback when the preferred demotion node is >> out of space), This demotion order is also inconsistent with the page allocation >> fallback order when all the nodes in a higher tier are out of space: The page >> allocation can fall back to any node from any lower tier, whereas the demotion >> order doesn't allow that. >> >> This patch series address the above by defining memory tiers explicitly. 
>>
>> Linux kernel presents memory devices as NUMA nodes and each memory device is of
>> a specific type. The memory type of a device is represented by its abstract
>> distance. A memory tier corresponds to a range of abstract distance. This allows
>> for classifying memory devices with a specific performance range into a memory
>> tier.
>>
>> This patch configures the range/chunk size to be 128. The default DRAM
>> abstract distance is 512. We can have 4 memory tiers below the default DRAM
>> abstract distance which cover the range 0 - 127, 128 - 255, 256 - 383, 384 - 511.
>> Slower memory devices like persistent memory will have abstract distance below
>> the default DRAM level and hence will be placed in these 4 lower tiers.
>
> For abstract distance, the lower value means higher performance, higher
> value means lower performance. So the abstract distance of PMEM should
> be larger than that of DRAM.

I noticed that after sending v11 and have already sent v12 with the fix,
which can be found at
https://lore.kernel.org/linux-mm/20220729061349.968148-1-aneesh.kumar@xxxxxxxxxxxxx

>
>> A kernel parameter is provided to override the default memory tier.
>
> Forgot to delete?

Yes. Also fixed in v12.

-aneesh
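
For reference, the range arithmetic in the quoted description (chunk size
128, default DRAM abstract distance 512) can be sketched roughly as below.
This is only an illustration: the macro and function names are hypothetical
and are not taken from the patch, and it assumes each tier covers exactly one
chunk of abstract distance.

```c
#include <assert.h>

/* Values from the quoted patch description. */
#define CHUNK_SIZE      128   /* width of one tier's abstract distance range */
#define ADISTANCE_DRAM  512   /* default DRAM abstract distance */

/* Hypothetical helper: index of the chunk that contains adistance. */
static inline int chunk_index(int adistance)
{
	assert(adistance >= 0);
	return adistance / CHUNK_SIZE;
}

/* Hypothetical helper: first abstract distance covered by that chunk. */
static inline int chunk_start(int adistance)
{
	return chunk_index(adistance) * CHUNK_SIZE;
}
```

With these values, abstract distances 0 - 127 land in chunk 0 and 384 - 511 in
chunk 3, which gives the four ranges below the DRAM value; 512 itself starts
chunk 4. Since a lower abstract distance means higher performance, a slower
device such as PMEM would get a value above 512 and fall into a later chunk.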