Wei Xu <weixugc@xxxxxxxxxx> writes:

> On Mon, May 9, 2022 at 7:32 AM Hesham Almatary
> <hesham.almatary@xxxxxxxxxx> wrote:
>> ....
> > nearest lower tier before demoting to lower lower tiers.
>> There might still be simple cases/topologies where we might want to "skip"
>> the very next lower tier. For example, assume we have a 3-tiered memory
>> system as follows:
>>
>> node 0 has a CPU and DDR memory in tier 0, node 1 has a GPU and DDR memory
>> in tier 0,
>> node 2 has NVMM memory in tier 1, node 3 has some sort of bigger memory
>> (could be a bigger DDR or something) in tier 2. The distances are as
>> follows:
>>
>>  --------------    --------------
>> |    Node 0    |  |    Node 1    |
>> |   -------    |  |   -------    |
>> |  |  DDR  |   |  |  |  DDR  |   |
>> |   -------    |  |   -------    |
>> |              |  |              |
>>  --------------    --------------
>>        | 20           | 120  |
>>        v              v      |
>>  -------------------------   |
>> |       Node 2 PMEM       |  | 100
>>  -------------------------   |
>>        | 100                 |
>>        v                     v
>>  --------------------------------
>> |        Node 3 Large mem        |
>>  --------------------------------
>>
>> node distances:
>> node   0    1    2    3
>>    0  10   20   20  120
>>    1  20   10  120  100
>>    2  20  120   10  100
>>    3 120  100  100   10
>>
>> /sys/devices/system/node/memory_tiers
>> 0-1
>> 2
>> 3
>>
>> N_TOPTIER_MEMORY: 0-1
>>
>>
>> In this case, we want to be able to "skip" the demotion path from Node 1
>> to Node 2, and make demotion go directly to Node 3 as it is closer,
>> distance-wise. How can we accommodate this scenario (or at least not
>> rule it out as future work) with the current RFC?
>
> This is an interesting example. I think one way to support this is to
> allow all the lower tier nodes to be the demotion targets of a node in
> the higher tier. We can then use the allocation fallback order to
> select the best demotion target.
>
> For this example, we will have the demotion targets of each node as:
>
> node 0: allowed=2-3, order (based on allocation fallback order): 2, 3
> node 1: allowed=2-3, order (based on allocation fallback order): 3, 2
> node 2: allowed=3, order (based on allocation fallback order): 3
> node 3: allowed=empty
>
> What do you think?
>

Can we simplify this further with the following tier assignment?

tier 0 -> empty (no HBM/GPU)
tier 1 -> Node0, Node1
tier 2 -> Node2, Node3

which gives:

node 0: allowed=2-3, order (based on allocation fallback order): 2, 3
node 1: allowed=2-3, order (based on allocation fallback order): 3, 2
node 2: allowed=empty
node 3: allowed=empty

-aneesh