Yan, your reply came through in HTML. It doesn't bother me too much,
but you'll find your replies dropped by LKML and other mailing lists if
you do this.

On 6/21/21 7:50 AM, Zi Yan wrote:
> Is there a plan of allowing user to change where the migration path
> starts? Or maybe one step further providing an interface to allow
> user to specify the demotion path. Something like
> /sys/devices/system/node/node*/node_demotion.

We actually had this in an earlier series. I pulled it out because we
don't really *need* this ABI at the moment. But, I totally agree that
it would be handy for many things, including any non-obvious topology
where the built-in ordering isn't optimal.

> I don't think that's necessary at least for now. Do you know any
> real world use case for this?
>
> In our P9+volta system, GPU memory is exposed as a NUMA node. For
> the GPU workloads with data size greater than GPU memory size, it
> will be very helpful to allow pages in GPU memory to be
> migrated/demoted to CPU memory. With your current assumption, GPU
> memory -> CPU memory demotion seems not possible, right? This
> should also apply to any system with a device memory exposed as a
> NUMA node and workloads running on the device and using CPU memory
> as a lower tier memory than the device memory.

Yes, with the current ordering, CPU memory would be demoted to the GPU,
not the other way around. The right way to fix this (on ACPI platforms
at least) is probably to use the HMAT table and build the demotion
based on any memory targets rather than just CPUs. That would be a
great future enhancement to all of this.

But, because not all systems have HMATs, we also need something more
basic, which is what is in this series.
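
To make the ordering problem with the GPU node concrete, here is a rough
userspace sketch in Python. It is not the code from this series: the
function name build_demotion_targets, the node IDs and the distance
values are all made up for illustration. It only assumes what is
described above, namely that nodes with CPUs are treated as the top
tier and each node demotes to its nearest node that is not yet in use.

```python
def build_demotion_targets(has_cpu, distance):
    """Simplified, hypothetical model of a CPU-based demotion order.

    has_cpu:  dict mapping node id -> True if the node has CPUs
    distance: nested dict, distance[a][b] is the NUMA distance a -> b
    Returns a dict mapping each source node to its demotion target.
    """
    # Nodes with CPUs are the top tier, so they are never demotion targets.
    used = {node for node, cpu in has_cpu.items() if cpu}
    sources = sorted(used)
    targets = {}

    while sources:
        next_sources = []
        for src in sources:
            candidates = [n for n in has_cpu if n not in used]
            if not candidates:
                break  # every node is already placed in the order
            # Pick the nearest unused node; break ties by node id.
            best = min(candidates, key=lambda n: (distance[src][n], n))
            targets[src] = best
            used.add(best)
            next_sources.append(best)
        # Targets chosen in this pass become the sources of the next one.
        sources = next_sources

    return targets


# Hypothetical two-node system: node 0 is CPU + DRAM, node 1 is
# CPU-less GPU memory. The distances are invented for the example.
has_cpu = {0: True, 1: False}
distance = {0: {0: 10, 1: 80}, 1: {0: 80, 1: 10}}
demotion = build_demotion_targets(has_cpu, distance)
print(demotion)
```

In this model the only path produced is node 0 -> node 1, and node 1
(the GPU memory) has no demotion target at all, which is the situation
raised in the quoted text. An ordering built from HMAT memory targets
would need different input than a simple "has CPUs" flag.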