On 3/27/19 1:35 PM, Matthew Wilcox wrote:
>
> pmem1 --- node1 --- node2 --- pmem2
>             |  \   /  |
>             |    X    |
>             |  /   \  |
> pmem3 --- node3 --- node4 --- pmem4
>
> which I could actually see someone building with normal DRAM, and we
> should probably handle the same way as pmem; for a process running on
> node3, allocate preferentially from node3, then pmem3, then other nodes,
> then other pmems.

That makes sense.  But, it might _also_ make sense to fill up all DRAM
first before using any pmem.  That could happen if the NUMA interconnect
is really fast and pmem is really slow.

Basically, with the current patches we are depending on the firmware to
"nicely" enumerate the topology and we're keeping the behavior that we
end up with, for now, whatever it might be.

Now, let's sit back and see how nice the firmware is. :)
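
To make the two candidate orderings concrete, here is a rough sketch for a
process running on node3.  It is not kernel code.  The distance values are
invented for illustration (real ones would come from whatever the firmware
reports), and the names MemNode, nearest_first and dram_first are
hypothetical.

```python
from typing import NamedTuple


class MemNode(NamedTuple):
    name: str
    is_pmem: bool
    distance: int  # made-up distance as seen from node3, not real firmware data


# Assumed topology from the diagram above; all distances are hypothetical.
NODES = [
    MemNode("node1", False, 21),
    MemNode("node2", False, 21),
    MemNode("node3", False, 10),
    MemNode("node4", False, 21),
    MemNode("pmem1", True, 28),
    MemNode("pmem2", True, 28),
    MemNode("pmem3", True, 17),
    MemNode("pmem4", True, 28),
]


def nearest_first(nodes):
    """Fallback order by distance alone: local DRAM, local pmem, the rest."""
    return [n.name for n in sorted(nodes, key=lambda n: (n.distance, n.name))]


def dram_first(nodes):
    """Fallback order that exhausts all DRAM before touching any pmem."""
    return [n.name for n in sorted(nodes, key=lambda n: (n.is_pmem, n.distance, n.name))]


if __name__ == "__main__":
    print("nearest first:", nearest_first(NODES))
    print("DRAM first:   ", dram_first(NODES))
```

With these assumed distances, nearest_first gives node3, pmem3, the other
nodes, then the other pmems, while dram_first gives node3, the other nodes,
pmem3, then the other pmems.  Which one the kernel actually ends up with
depends on the distances the firmware enumerates.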