Re: [RFC PATCH v2 0/5] Reduce NUMA balance caused TLB-shootdowns in a VM

David Hildenbrand <david@xxxxxxxxxx> · Wed, 16 Aug 2023 09:43:40 +0200

On 15.08.23 04:34, John Hubbard wrote:
> On 8/14/23 02:09, Yan Zhao wrote:
> ...
>>> hmm_range_fault()-based memory management in particular might benefit
>>> from having NUMA balancing disabled entirely for the memremap_pages()
>>> region, come to think of it. That seems relatively easy and clean at
>>> first glance anyway.
>>>
>>> For other regions (allocated by the device driver), a per-VMA flag
>>> seems about right: VM_NO_NUMA_BALANCING ?
>>
>> Thanks a lot for those good suggestions!
>> For VMs, when could a per-VMA flag be set?
>> Might be hard in mmap() in QEMU because a VMA may not be used for DMA until
>> after it's mapped into VFIO.
>> Then, should VFIO set this flag on after it maps a range?
>> Could this flag be unset after device hot-unplug?
>
> I'm hoping someone who thinks about VMs and VFIO often can chime in.

At least QEMU could just set it on the applicable VMAs (as Yuan Yao
suggested, using madvise()).
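
To make the madvise() idea concrete, here is a rough userspace sketch.
The advice value is purely hypothetical: no such MADV_* constant exists
in mainline, and the placeholder number below is deliberately one that
current kernels are expected to reject with EINVAL. The helper name is
also made up for illustration.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * HYPOTHETICAL advice value, for illustration only. Mainline has no
 * madvise() advice for opting a range out of NUMA balancing; this
 * placeholder is expected to be rejected by current kernels.
 */
#define MADV_NO_NUMA_BALANCING_HYPOTHETICAL 0x7fff

/*
 * Ask the kernel to skip NUMA balancing for [addr, addr + len).
 * Returns 0 on success or a negative errno value on failure, so a
 * caller can treat -EINVAL as "kernel does not support this".
 */
static int guest_ram_opt_out(void *addr, size_t len)
{
    if (madvise(addr, len, MADV_NO_NUMA_BALANCING_HYPOTHETICAL) == 0) {
        return 0;
    }
    return -errno;
}
```

A caller would invoke this once per guest RAM mapping that may become a
DMA target and fall back silently when the kernel returns -EINVAL.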

BUT, I do wonder what value there would be in keeping autonuma active
for the rest of the hypervisor process. If there is none, a
process-wide prctl() would be better.

We already have a mechanism in QEMU to get notified when long-term
pinning in the kernel might happen (and, therefore, MADV_DONTNEED must
not be used):
* ram_block_discard_disable()
* ram_block_uncoordinated_discard_disable()
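
As a rough illustration of how an opt-out could piggyback on such a
notification, here is a simplified, self-contained model. It is not
QEMU's actual implementation: model_discard_disable() and
set_numa_balancing_opt_out() are hypothetical names, and the latter is
only a stub standing in for whatever interface (madvise(), prctl(), ...)
might end up existing.

```c
#include <assert.h>
#include <stdbool.h>

/* Stub state: records whether the (hypothetical) opt-out is active. */
static bool numa_balancing_opted_out;

/* HYPOTHETICAL stand-in for the real madvise()/prctl() call. */
static void set_numa_balancing_opt_out(bool opt_out)
{
    numa_balancing_opted_out = opt_out;
}

/* Number of users that currently need RAM discards to stay disabled. */
static unsigned int discard_disabled_cnt;

/*
 * Simplified model of a disable/enable pair: the first "disable"
 * (e.g. a device that may long-term pin memory gets plugged) turns the
 * opt-out on, and the last matching "enable" (e.g. hot-unplug of the
 * last such device) turns it off again.
 */
static void model_discard_disable(bool state)
{
    if (state) {
        if (discard_disabled_cnt++ == 0) {
            set_numa_balancing_opt_out(true);
        }
    } else {
        assert(discard_disabled_cnt > 0);
        if (--discard_disabled_cnt == 0) {
            set_numa_balancing_opt_out(false);
        }
    }
}
```

Counting users this way would also cover the hot-unplug question: the
opt-out is dropped only once no remaining device needs it.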

--
Cheers,

David / dhildenb