On 17.08.23 07:05, Yan Zhao wrote:
> On Wed, Aug 16, 2023 at 11:00:36AM -0700, John Hubbard wrote:
>> On 8/16/23 02:49, David Hildenbrand wrote:
>>> But do 32bit architectures even care about NUMA hinting? If not, just
>>> ignore them ...
>>
>> Probably not!
...
>>>> So, do you mean that the kernel should provide a per-VMA
>>>> allow/disallow mechanism, and it's up to user space to choose between
>>>> the per-VMA (more complex) way and the global (simpler) way?
>>>
>>> QEMU could do either way. The question would be whether a per-VMA
>>> setting makes sense for NUMA hinting.
>>
>> From our experience with compute on GPUs, a per-mm setting would suffice.
>> No need to go all the way to VMA granularity.
>
> After an offline internal discussion, we think a per-mm setting is also
> enough for device passthrough in VMs.
>
> BTW, if we do want a per-VMA flag, then compared to VM_NO_NUMA_BALANCING,
> do you think there is any value in providing a flag like VM_MAYDMA
> instead? Auto NUMA balancing and other components could then decide for
> themselves how to use it.
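>
> (For illustration, assuming a hypothetical VM_MAYDMA flag, which does
> not exist in the kernel, NUMA balancing could then skip those VMAs in
> its VMA walk, roughly:)
>
>	/* sketch: inside the task_numa_work() VMA scan */
>	if (vma->vm_flags & VM_MAYDMA)	/* hypothetical flag */
>		continue;		/* don't install hinting faults here */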

Short-lived DMA is not really the problem. The problem is long-term
pinning. There was a discussion about letting user space similarly hint
that long-term pinning might/will happen.

Because when long-term pinning a page, we have to make sure to migrate
it off of ZONE_MOVABLE / MIGRATE_CMA. But the kernel prefers to place
pages there.

So with vfio in QEMU, we might preallocate memory for the guest and
place it on ZONE_MOVABLE/MIGRATE_CMA, just so long-term pinning has to
migrate all these fresh pages out of these areas again.

So letting the kernel know about that in this context might also help.
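
(For context, a minimal userspace sketch of how that long-term pinning
gets triggered with vfio; the fd, address and size below are
placeholders and error handling is omitted:)

	#include <linux/vfio.h>
	#include <sys/ioctl.h>
	#include <stdint.h>

	/* Sketch: registering preallocated guest RAM with the vfio IOMMU.
	 * The kernel long-term pins these pages (FOLL_LONGTERM), so any
	 * page sitting on ZONE_MOVABLE / MIGRATE_CMA must be migrated out
	 * first -- exactly the churn described above. */
	static int map_guest_ram(int container_fd, void *guest_ram,
				 uint64_t ram_size)
	{
		struct vfio_iommu_type1_dma_map map = {
			.argsz = sizeof(map),
			.flags = VFIO_DMA_MAP_FLAG_READ |
				 VFIO_DMA_MAP_FLAG_WRITE,
			.vaddr = (uint64_t)(uintptr_t)guest_ram,
			.iova  = 0,	/* guest-physical address 0 */
			.size  = ram_size,
		};

		return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
	}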
--
Cheers,
David / dhildenb