On 8/1/22 8:07 AM, Huang, Ying wrote:
> "Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxx> writes:
>
>> In the current kernel, memory tiers are defined implicitly via a demotion path
>> relationship between NUMA nodes, which is created during the kernel
>> initialization and updated when a NUMA node is hot-added or hot-removed. The
>> current implementation puts all nodes with CPU into the highest tier, and builds
>> the tier hierarchy tier-by-tier by establishing the per-node demotion targets
>> based on the distances between nodes.
>>
>> This current memory tier kernel implementation needs to be improved for several
>> important use cases:
>>
>> The current tier initialization code always initializes each memory-only NUMA
>> node into a lower tier. But a memory-only NUMA node may have a high performance
>> memory device (e.g. a DRAM-backed memory-only node on a virtual machine) that
>> should be put into a higher tier.
>>
>> The current tier hierarchy always puts CPU nodes into the top tier. But on a
>> system with HBM or GPU devices, the memory-only NUMA nodes mapping these devices
>> should be in the top tier, and DRAM nodes with CPUs are better placed into
>> the next lower tier.
>>
>> With the current kernel, a higher tier node can only be demoted to nodes with
>> the shortest distance on the next lower tier as defined by the demotion path,
>> not any other node from any lower tier. This strict demotion order does not
>> work in all use cases (e.g. some use cases may want to allow cross-socket
>> demotion to another node in the same demotion tier as a fallback when the
>> preferred demotion node is out of space). This demotion order is also
>> inconsistent with the page allocation fallback order when all the nodes in a
>> higher tier are out of space: the page allocation can fall back to any node
>> from any lower tier, whereas the demotion order doesn't allow that.
>>
>> This patch series addresses the above by defining memory tiers explicitly.
>>
>> Linux kernel presents memory devices as NUMA nodes and each memory device is of
>> a specific type. The memory type of a device is represented by its abstract
>> distance. A memory tier corresponds to a range of abstract distance. This allows
>> for classifying memory devices with a specific performance range into a memory
>> tier.
>>
>> This patch configures the range/chunk size to be 128. The default DRAM
>> abstract distance is 512. We can have 4 memory tiers below the default DRAM
>                                                        ~~~~~
>
> above?

Updated the above as below.

This patch configures the range/chunk size to be 128. The default DRAM
abstract distance is 512. We can have 4 memory tiers below the default DRAM
with abstract distance ranges 0 - 127, 128 - 255, 256 - 383, 384 - 511. Faster
memory devices can be placed in these faster (higher) memory tiers. Slower
memory devices like persistent memory will have abstract distance higher than
the default DRAM level.

>
>> abstract distance which cover the range 0 - 127, 128 - 255, 256 - 383, 384 - 511.
>> Slower memory devices like persistent memory will have abstract distance higher
>> than the default DRAM level.
>>

-aneesh
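
For illustration, the chunk arithmetic described above can be sketched in C as
follows. This is only a sketch built from the numbers in the description
(chunk size 128, default DRAM abstract distance 512). The macro and function
names are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; the values come from the description above. */
#define ADIST_CHUNK_SIZE 128	/* width of one abstract distance range */
#define ADIST_DRAM       512	/* default DRAM abstract distance */

/* The DRAM default is assumed to sit on a chunk boundary. */
_Static_assert(ADIST_DRAM % ADIST_CHUNK_SIZE == 0,
	       "DRAM abstract distance must be chunk aligned");

/* First abstract distance of the chunk that contains adistance. */
static inline int adist_chunk_start(int adistance)
{
	assert(adistance >= 0);
	return (adistance / ADIST_CHUNK_SIZE) * ADIST_CHUNK_SIZE;
}

/* Last abstract distance of the chunk that contains adistance. */
static inline int adist_chunk_end(int adistance)
{
	return adist_chunk_start(adistance) + ADIST_CHUNK_SIZE - 1;
}

/* A smaller abstract distance means a faster (higher) memory tier. */
static inline bool adist_faster_than_dram(int adistance)
{
	return adistance < ADIST_DRAM;
}

/* Number of whole chunks available below the default DRAM level. */
static inline int adist_chunks_below_dram(void)
{
	return ADIST_DRAM / ADIST_CHUNK_SIZE;
}
```

With these numbers, an abstract distance of 200 falls in the chunk 128 - 255,
and 512 is the first value that is no longer faster than the default DRAM
level.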