On 17.08.23 07:05, Yan Zhao wrote:
> On Wed, Aug 16, 2023 at 11:00:36AM -0700, John Hubbard wrote:
>> On 8/16/23 02:49, David Hildenbrand wrote:
>>> But do 32bit architectures even care about NUMA hinting? If not, just
>>> ignore them ...
>>
>> Probably not!
...
>>>> So, do you mean that the kernel should provide a per-VMA
>>>> allow/disallow mechanism, and it's up to user space to choose between
>>>> the per-VMA (more complex) way and the global (simpler) way?
>>>
>>> QEMU could do either way. The question would be whether a per-VMA
>>> setting makes sense for NUMA hinting.
>>
>> From our experience with compute on GPUs, a per-mm setting would suffice.
>> No need to go all the way to VMA granularity.
>
> After an offline internal discussion, we think a per-mm setting is also
> enough for device passthrough in VMs.
>
> BTW, if we do want a per-VMA flag, then compared to VM_NO_NUMA_BALANCING,
> do you think there is any value in providing a flag like VM_MAYDMA
> instead? Auto NUMA balancing and other components could then decide for
> themselves how to use it.
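>
> (For illustration, assuming a hypothetical VM_MAYDMA flag, which does
> not exist in the kernel, NUMA balancing could then skip those VMAs in
> its VMA walk, roughly:)
>
>	/* sketch: inside the task_numa_work() VMA scan */
>	if (vma->vm_flags & VM_MAYDMA)	/* hypothetical flag */
>		continue;		/* don't install hinting faults here */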

Short-lived DMA is not really the problem. The problem is long-term
pinning. There was a discussion about letting user space similarly hint
that long-term pinning might/will happen.

Because when long-term pinning a page, we have to make sure to migrate
it off of ZONE_MOVABLE / MIGRATE_CMA. But the kernel prefers to place
pages there.

So with vfio in QEMU, we might preallocate memory for the guest and
place it on ZONE_MOVABLE/MIGRATE_CMA, just so long-term pinning has to
migrate all these fresh pages out of these areas again.

So letting the kernel know about that in this context might also help.
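
(For context, a minimal userspace sketch of how that long-term pinning
gets triggered with vfio; the fd, address and size below are
placeholders and error handling is omitted:)

	#include <linux/vfio.h>
	#include <sys/ioctl.h>
	#include <stdint.h>

	/* Sketch: registering preallocated guest RAM with the vfio IOMMU.
	 * The kernel long-term pins these pages (FOLL_LONGTERM), so any
	 * page sitting on ZONE_MOVABLE / MIGRATE_CMA must be migrated out
	 * first -- exactly the churn described above. */
	static int map_guest_ram(int container_fd, void *guest_ram,
				 uint64_t ram_size)
	{
		struct vfio_iommu_type1_dma_map map = {
			.argsz = sizeof(map),
			.flags = VFIO_DMA_MAP_FLAG_READ |
				 VFIO_DMA_MAP_FLAG_WRITE,
			.vaddr = (uint64_t)(uintptr_t)guest_ram,
			.iova  = 0,	/* guest-physical address 0 */
			.size  = ram_size,
		};

		return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
	}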
--
Cheers,
David / dhildenb