On Wed, 30 Mar 2022 22:06:52 +0530
Jagdish Gediya <jvgediya@xxxxxxxxxxxxx> wrote:

> Hi Huang,
>
> On Wed, Mar 30, 2022 at 02:46:51PM +0800, Huang, Ying wrote:
> > Hi, Jagdish,
> >
> > Jagdish Gediya <jvgediya@xxxxxxxxxxxxx> writes:
> >
> > > The current implementation to identify the demotion
> > > targets limits some of the opportunities to share
> > > the demotion targets between multiple source nodes.
> >
> > Yes. It sounds reasonable to share demotion targets among multiple
> > source nodes.
> >
> > One question, are example machines below are real hardware now or in
> > near future? Or you just think they are possible?
>
> They are not real hardware right now, they are the future possibilities.

I'll strengthen that a bit to say they are highly likely to turn up
fairly soon. Often they will be the result of specific interleaving
decisions and might not come from SRAT (e.g. they might be added later
as a result of CXL discovery), but the principle will be the same.
Also, in some cases the firmware will have done the CXL setup, so it
will be via SRAT.

Example 1:
> > e.g. with below NUMA topology, where node 0 & 1 are
> > cpu + dram nodes, node 2 & 3 are equally slower memory
> > only nodes, and node 4 is slowest memory only node,
> >

A couple of near-term examples of systems that will look like this:

A 2 socket system where each socket has DRAM (0,1) + NVDIMM (2,3) being
used as DRAM. There is also CXL attached memory and, to get maximum
bandwidth to a large remote pool, it is interleaved across host bridges
in the two sockets (Node 4).

An alternative for node 4 is that the system is using an IO expander
bridge on the CPU interconnect (effectively a CPU-less CPU :)

Example 2:
> > e.g. with below NUMA topology, where node 0, 1 & 2 are
> > cpu + dram nodes and node 3 is slow memory node,
> >

3 CPU socket machines are still unusual, but the 2 or 4 socket
equivalents are simplifications of example 1.

Example 3:
> > with below NUMA topology, where node 0 & 2 are cpu + dram
> > nodes and node 1 & 3 are slow memory nodes,

I believe this is what today's simple DRAM + NVDIMM (as cheap DRAM)
2 socket servers look like.

Example 4:
> Another example, node 0 & 2 are cpu + dram nodes and node 1 are slow
> memory node near node 0,
>

A normal system with CXL attached DRAM below an RP in node 0.

...

Jonathan