On Thu, Aug 10, 2023 at 04:56:36PM +0800, Yan Zhao wrote:
>This is an RFC series trying to fix the issue of unnecessary NUMA
>protection and TLB-shootdowns found in VMs with assigned devices or VFIO
>mediated devices during NUMA balance.
>
>For VMs with assigned devices or VFIO mediated devices, all or part of
>guest memory are pinned for long-term.
>
>Auto NUMA balancing will periodically selects VMAs of a process and change
>protections to PROT_NONE even though some or all pages in the selected
>ranges are long-term pinned for DMAs, which is true for VMs with assigned
>devices or VFIO mediated devices.
>
>Though this will not cause real problem because NUMA migration will
>ultimately reject migration of those kind of pages and restore those
>PROT_NONE PTEs, it causes KVM's secondary MMU to be zapped periodically
>with equal SPTEs finally faulted back, wasting CPU cycles and generating
>unnecessary TLB-shootdowns.

In my understanding, NUMA balancing also moves tasks closer to the memory
they are accessing. Can this still work with this series applied?

>
>This series first introduces a new flag MMU_NOTIFIER_RANGE_NUMA in patch 1
>to work with mmu notifier event type MMU_NOTIFY_PROTECTION_VMA, so that
>the subscriber (e.g.KVM) of the mmu notifier can know that an invalidation
>event is sent for NUMA migration purpose in specific.
>
>Patch 2 skips setting PROT_NONE to long-term pinned pages in the primary
>MMU to avoid NUMA protection introduced page faults and restoration of old
>huge PMDs/PTEs in primary MMU.
>
>Patch 3 introduces a new mmu notifier callback .numa_protect(), which
>will be called in patch 4 when a page is ensured to be PROT_NONE protected.
>
>Then in patch 5, KVM can recognize a .invalidate_range_start() notification
>is for NUMA balancing specific and do not do the page unmap in secondary
>MMU until .numa_protect() comes.
>
>
>Changelog:
>RFC v1 --> v2:
>1. added patch 3-4 to introduce a new callback .numa_protect()
>2. Rather than have KVM duplicate logic to check if a page is pinned for
>long-term, let KVM depend on the new callback .numa_protect() to do the
>page unmap in secondary MMU for NUMA migration purpose.
>
>RFC v1:
>https://lore.kernel.org/all/20230808071329.19995-1-yan.y.zhao@xxxxxxxxx/
>
>Yan Zhao (5):
>  mm/mmu_notifier: introduce a new mmu notifier flag
>    MMU_NOTIFIER_RANGE_NUMA
>  mm: don't set PROT_NONE to maybe-dma-pinned pages for NUMA-migrate
>    purpose
>  mm/mmu_notifier: introduce a new callback .numa_protect
>  mm/autonuma: call .numa_protect() when page is protected for NUMA
>    migrate
>  KVM: Unmap pages only when it's indeed protected for NUMA migration
>
> include/linux/mmu_notifier.h | 16 ++++++++++++++++
> mm/huge_memory.c             |  6 ++++++
> mm/mmu_notifier.c            | 18 ++++++++++++++++++
> mm/mprotect.c                | 10 +++++++++-
> virt/kvm/kvm_main.c          | 25 ++++++++++++++++++++++---
> 5 files changed, 71 insertions(+), 4 deletions(-)
>
>--
>2.17.1
>
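
To check my reading of the flow in the cover letter, here is a rough
userspace model of it. This is not kernel code and not taken from the
patches: struct model_page, MODEL_RANGE_NUMA and the model_*() functions
are all hypothetical names, and the "long_term_pinned" boolean only stands
in for whatever pinned-page check patch 2 really uses. It sketches only the
ordering described above: the NUMA-flagged invalidation does not zap the
secondary MMU, pinned pages are skipped, and the unmap happens when the
.numa_protect()-like hook runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for MMU_NOTIFIER_RANGE_NUMA (patch 1). */
#define MODEL_RANGE_NUMA 0x1u

/* Hypothetical per-page state; not a kernel structure. */
struct model_page {
	bool long_term_pinned;   /* e.g. pinned for DMA by VFIO */
	bool primary_prot_none;  /* primary MMU entry set to PROT_NONE */
	bool secondary_mapped;   /* still mapped in the secondary MMU */
};

/*
 * Models the subscriber side of .invalidate_range_start() (patch 5):
 * a NUMA-flagged invalidation leaves the secondary MMU alone, any
 * other invalidation zaps every mapped page in the range.
 * Returns the number of secondary mappings zapped.
 */
size_t model_invalidate_range_start(struct model_page *pages, size_t n,
				    unsigned int flags)
{
	size_t zapped = 0;

	if (flags & MODEL_RANGE_NUMA)
		return 0;

	for (size_t i = 0; i < n; i++) {
		if (pages[i].secondary_mapped) {
			pages[i].secondary_mapped = false;
			zapped++;
		}
	}
	return zapped;
}

/*
 * Models the .numa_protect() callback (patches 3-4): only called for
 * a page that really was made PROT_NONE in the primary MMU.
 */
void model_numa_protect(struct model_page *page)
{
	assert(page->primary_prot_none);
	page->secondary_mapped = false;
}

/*
 * Models one NUMA balancing pass over a range. Pinned pages are
 * skipped (patch 2), so they keep both their primary protection and
 * their secondary mapping. Returns the number of secondary mappings
 * zapped by this pass.
 */
size_t model_numa_scan(struct model_page *pages, size_t n)
{
	size_t zapped = 0;

	model_invalidate_range_start(pages, n, MODEL_RANGE_NUMA);

	for (size_t i = 0; i < n; i++) {
		if (pages[i].long_term_pinned)
			continue;
		pages[i].primary_prot_none = true;
		if (pages[i].secondary_mapped)
			zapped++;
		model_numa_protect(&pages[i]);
	}
	return zapped;
}
```

If this model is right, a fully pinned guest never takes NUMA hinting
faults on its memory under this series, which is what prompted my question
about task placement above.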