On Fri, Jul 14, 2023 at 5:54 PM Raghavendra Rao Ananta <rananta@xxxxxxxxxx> wrote:
>
> In certain code paths, KVM/ARM currently invalidates the entire VM's
> page-tables instead of just invalidating a necessary range. For example,
> when collapsing a table PTE to a block PTE, instead of iterating over
> each PTE and flushing them, KVM uses the 'vmalls12e1is' TLBI operation
> to flush all the entries. This is inefficient since the guest would
> have to refill the TLBs again, even for the addresses that aren't
> covered by the table entry. The performance impact would scale poorly
> if many addresses in the VM are going through this remapping.
>
> For architectures that implement FEAT_TLBIRANGE, KVM can replace such
> inefficient paths by performing the invalidations only on the range of
> addresses that are in scope. This series does so for the stage-2 map,
> unmap, and write-protect paths.
>
> As suggested by Oliver in the original v5 of the series [1], I'm
> reposting the series by including v2 of David Matlack's 'KVM: Add a
> common API for range-based TLB invalidation' series [2].
>
> Patches 1-4 include David M.'s patches 1, 2, 6, and 7 from [2].
>
> Patch-5 refactors the core flush algorithm of arm64's
> __flush_tlb_range() so that it can be used by other entities.
>
> Patches 6 and 7 add a range-based TLBI mechanism for KVM (VHE and
> nVHE).
>
> Patch-8 implements kvm_arch_flush_remote_tlbs_range() for arm64.
>
> Patch-9 aims to flush only the memslot that undergoes a write-protect,
> instead of the entire VM.
>
> Patch-10 operates on stage2_try_break_pte() to use the range-based
> TLBI instructions when collapsing a table entry. The map path is the
> immediate consumer of this when KVM remaps a table entry into a block.
>
> Patch-11 modifies the stage-2 unmap path so that, if the system
> supports FEAT_TLBIRANGE, the TLB invalidations are skipped during the
> page-table walk and instead done in one go after the entire walk is
> finished.
>
> The series is based off of upstream v6.5-rc1.
>
> The performance evaluation was done on hardware that supports
> FEAT_TLBIRANGE, on a VHE configuration, using a modified
> kvm_page_table_test. The modified version updates the guest code in
> the ADJUST_MAPPINGS case to access not only the current page but also
> up to 512 pages backwards for every new page it iterates through (a
> sketch of this access pattern is shown below). This is done to test
> the effect of TLB misses after KVM has handled a fault.
>
> The series captures the impact in the map and unmap paths as described
> above.
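>
> For reference, here is a minimal sketch of the guest access pattern
> described above. It is illustrative only: the function name, the
> 'base'/'page_size'/'nr_pages' parameters, and the shape of the loop
> are assumptions based on the description, not the actual
> kvm_page_table_test code.
>
> #include <stdint.h>
>
> /*
>  * Touch the current page, then re-touch up to 512 pages behind it,
>  * so that any over-invalidation of the TLB by KVM shows up as extra
>  * TLB refills on the re-touched pages.
>  */
> static void guest_touch_pages(uint8_t *base, uint64_t page_size,
>                               uint64_t nr_pages)
> {
>         uint64_t i, page;
>
>         for (i = 0; i < nr_pages; i++) {
>                 uint64_t lo = (i >= 512) ? i - 512 : 0;
>
>                 /* Walk from 512 pages back up to the current page. */
>                 for (page = lo; page <= i; page++)
>                         *(volatile uint64_t *)(base + page * page_size);
>         }
> }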
>
> $ kvm_page_table_test -m 2 -v 128 -s anonymous_hugetlb_2mb -b $i
>
> +--------+------------------------------+------------------------------+
> | mem_sz |     ADJUST_MAPPINGS (s)      |         Unmap VM (s)         |
> |  (GB)  | Baseline | Baseline + series | Baseline | Baseline + series |
> +--------+----------+-------------------+----------+-------------------+
> |   1    |   3.33   |       3.22        |  0.009   |       0.005       |
> |   2    |   7.39   |       7.32        |  0.012   |       0.006       |
> |   4    |  13.49   |      10.50        |  0.017   |       0.008       |
> |   8    |  21.60   |      21.50        |  0.027   |       0.011       |
> |  16    |  57.02   |      43.63        |  0.046   |       0.018       |
> |  32    |  95.92   |      83.26        |  0.087   |       0.030       |
> |  64    | 199.57   |     165.14        |  0.146   |       0.055       |
> | 128    | 423.65   |     349.37        |  0.280   |       0.100       |
> +--------+----------+-------------------+----------+-------------------+
>
> $ kvm_page_table_test -m 2 -b 128G -s anonymous_hugetlb_2mb -v $i
>
> +--------+------------------------------+
> | vCPUs  |     ADJUST_MAPPINGS (s)      |
> |        | Baseline | Baseline + series |
> +--------+----------+-------------------+
> |   1    |  111.44  |      114.63       |
> |   2    |  102.88  |       74.64       |
> |   4    |  134.83  |       98.78       |
> |   8    |   98.81  |       95.01       |
> |  16    |  127.41  |       99.05       |
> |  32    |  105.35  |       91.75       |
> |  64    |  201.13  |      163.63       |
> | 128    |  423.65  |      349.37       |
> +--------+----------+-------------------+
>
> For the ADJUST_MAPPINGS cases, which map the 4K table entries back
> into 2M hugepages, the series sees an average improvement of ~15%. For
> unmapping 2M hugepages, we see a gain of 2x to 3x.
>
> $ kvm_page_table_test -m 2 -b $i
>
> +--------+------------------------------+
> | mem_sz |         Unmap VM (s)         |
> |  (GB)  | Baseline | Baseline + series |
> +--------+----------+-------------------+
> |   1    |   0.54   |       0.13        |
> |   2    |   1.07   |       0.25        |
> |   4    |   2.10   |       0.47        |
> |   8    |   4.19   |       0.92        |
> |  16    |   8.35   |       1.92        |
> |  32    |  16.66   |       3.61        |
> |  64    |  32.36   |       7.62        |
> | 128    |  64.65   |      14.39        |
> +--------+----------+-------------------+
>
> The series sees an average gain of 4x when the guest is backed by
> PAGE_SIZE (4K) pages.
>
> Other testing:
> - Booted on x86_64 and ran KVM selftests.
> - Build tested for MIPS and RISCV architectures against defconfig.
>
> Cc: David Matlack <dmatlack@xxxxxxxxxx>
>
> v6:

This should've been 'v5 (RESEND)' with the link:
https://lore.kernel.org/all/20230621175002.2832640-1-rananta@xxxxxxxxxx/

- Raghavendra

> Thanks, Gavin, for the suggestions:
> - Adjusted the comment on patch-2 to align with the code.
> - Fixed checkpatch.pl warning on patch-5.
>
> v5:
> https://lore.kernel.org/all/20230606192858.3600174-1-rananta@xxxxxxxxxx/
> Thank you, Marc and Oliver, for the comments.
> - Introduced a helper, kvm_tlb_flush_vmid_range(), to handle
>   the decision of using range-based TLBI instructions or
>   invalidating the entire VMID, rather than depending on
>   __kvm_tlb_flush_vmid_range() for it.
> - kvm_tlb_flush_vmid_range() splits the range-based invalidations
>   if the requested range exceeds MAX_TLBI_RANGE_PAGES (see the
>   sketch below).
> - All the users in need of invalidating the TLB upon a range
>   now depend on kvm_tlb_flush_vmid_range() rather than directly
>   on __kvm_tlb_flush_vmid_range().
> - stage2_unmap_defer_tlb_flush() introduces a WARN_ON() to
>   track if there's any change in TLBIRANGE or FWB support
>   during the unmap process, as the features are based on
>   alternative patching and the TLBI operations solely depend
>   on this check.
> - Corrected an incorrect hunk being present on v4's patch-3.
> - Updated the patches' changelogs and code comments as per the
>   suggestions.
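>
> To illustrate the splitting behaviour mentioned above, here is a
> sketch of the kvm_tlb_flush_vmid_range() helper introduced by this
> series. It is a sketch of the intended logic, not a verbatim hunk;
> the patch itself is authoritative:
>
> void kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
>                               phys_addr_t addr, size_t size)
> {
>         unsigned long pages, inval_pages;
>
>         /* Without FEAT_TLBIRANGE, fall back to invalidating the VMID. */
>         if (!system_supports_tlb_range()) {
>                 kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
>                 return;
>         }
>
>         pages = size >> PAGE_SHIFT;
>         while (pages > 0) {
>                 /* Cap each call at MAX_TLBI_RANGE_PAGES and walk
>                  * forward until the whole range is covered.
>                  */
>                 inval_pages = min(pages, MAX_TLBI_RANGE_PAGES);
>                 kvm_call_hyp(__kvm_tlb_flush_vmid_range, mmu, addr,
>                              inval_pages);
>
>                 addr += inval_pages << PAGE_SHIFT;
>                 pages -= inval_pages;
>         }
> }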
>
> v4:
> https://lore.kernel.org/all/20230519005231.3027912-1-rananta@xxxxxxxxxx/
> Thanks again, Oliver, for all the comments.
> - Updated the __kvm_tlb_flush_vmid_range() implementation for
>   nVHE to adjust with the modified __tlb_switch_to_guest() that
>   accepts a new 'bool nsh' arg.
> - Renamed stage2_put_pte() to stage2_unmap_put_pte() and removed
>   the 'skip_flush' argument.
> - Defined stage2_unmap_defer_tlb_flush() to check if the PTE
>   flushes can be deferred during the unmap table walk (see the
>   sketch after this changelog). It's being called from
>   stage2_unmap_put_pte() and kvm_pgtable_stage2_unmap().
> - Got rid of the 'struct stage2_unmap_data'.
>
> v3:
> https://lore.kernel.org/all/20230414172922.812640-1-rananta@xxxxxxxxxx/
> Thanks, Oliver, for all the suggestions.
> - The core flush API (__kvm_tlb_flush_vmid_range()) now checks if
>   the system supports FEAT_TLBIRANGE or not, thus eliminating the
>   redundancy in the upper layers.
> - If FEAT_TLBIRANGE is not supported, the implementation falls
>   back to invalidating all the TLB entries with the VMID, instead
>   of doing an iterative flush for the range.
> - kvm_arch_flush_remote_tlbs_range() doesn't return -EOPNOTSUPP
>   if the system doesn't implement FEAT_TLBIRANGE. It depends on
>   __kvm_tlb_flush_vmid_range() to take care of the decisions
>   and return 0 regardless of the underlying feature support.
> - __kvm_tlb_flush_vmid_range() doesn't take 'level' as input to
>   calculate the 'stride'. Instead, it always assumes PAGE_SIZE.
> - The fast unmap path is eliminated. Instead, the existing unmap
>   walker is modified to skip the TLBIs during the walk, and do it
>   all at once after the walk, using the range-based instructions.
>
> v2:
> https://lore.kernel.org/all/20230206172340.2639971-1-rananta@xxxxxxxxxx/
> - Rebased the series on top of David Matlack's series for a common
>   TLB invalidation API [1].
> - Implemented kvm_arch_flush_remote_tlbs_range() for arm64, by
>   extending the support introduced by [1].
> - Used kvm_flush_remote_tlbs_memslot() introduced by [1] to flush
>   only the current memslot after write-protect.
> - Modified the __kvm_tlb_flush_range() macro to accept 'level' as an
>   argument to calculate the 'stride' instead of just using PAGE_SIZE.
> - Split the patch that introduces the range-based TLBI to KVM and the
>   implementation of IPA-based invalidation into separate patches.
> - Dropped the patch that tries to optimize the mmu notifiers paths.
> - Renamed the function kvm_table_pte_flush() to
>   kvm_pgtable_stage2_flush_range(), and made it accept the range of
>   addresses to flush. [Oliver]
> - Dropped the 'tlb_level' argument for stage2_try_break_pte() and
>   directly pass '0' as 'tlb_level' to
>   kvm_pgtable_stage2_flush_range(). [Oliver]
>
> v1:
> https://lore.kernel.org/all/20230109215347.3119271-1-rananta@xxxxxxxxxx/
>
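> For reference, a minimal sketch of the deferral check referred to in
> the v4/v3 changelog entries above. It mirrors the intended shape of
> stage2_unmap_defer_tlb_flush(), assuming the existing arm64 helpers
> system_supports_tlb_range() and stage2_has_fwb(); the patch itself is
> authoritative:
>
> static bool stage2_unmap_defer_tlb_flush(struct kvm_pgtable *pgt)
> {
>         /*
>          * If FEAT_TLBIRANGE is implemented, defer the individual
>          * PTE-level TLB invalidations until the entire walk is
>          * finished, and then use the range-based instructions to
>          * invalidate in one go. Condition this on FWB, as deferring
>          * is pointless when the unmap walker needs to perform CMOs.
>          */
>         return system_supports_tlb_range() && stage2_has_fwb(pgt);
> }
>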
> Thank you.
> Raghavendra
>
> [1]: https://lore.kernel.org/all/ZIrONR6cSegiK1e2@xxxxxxxxx/
> [2]: https://lore.kernel.org/linux-arm-kernel/20230126184025.2294823-1-dmatlack@xxxxxxxxxx/
>
> David Matlack (4):
>   KVM: Rename kvm_arch_flush_remote_tlb() to
>     kvm_arch_flush_remote_tlbs()
>   KVM: arm64: Use kvm_arch_flush_remote_tlbs()
>   KVM: Allow range-based TLB invalidation from common code
>   KVM: Move kvm_arch_flush_remote_tlbs_memslot() to common code
>
> Raghavendra Rao Ananta (7):
>   arm64: tlb: Refactor the core flush algorithm of __flush_tlb_range
>   KVM: arm64: Implement __kvm_tlb_flush_vmid_range()
>   KVM: arm64: Define kvm_tlb_flush_vmid_range()
>   KVM: arm64: Implement kvm_arch_flush_remote_tlbs_range()
>   KVM: arm64: Flush only the memslot after write-protect
>   KVM: arm64: Invalidate the table entries upon a range
>   KVM: arm64: Use TLBI range-based instructions for unmap
>
>  arch/arm64/include/asm/kvm_asm.h     |   3 +
>  arch/arm64/include/asm/kvm_host.h    |   6 ++
>  arch/arm64/include/asm/kvm_pgtable.h |  10 +++
>  arch/arm64/include/asm/tlbflush.h    | 109 ++++++++++++++-------------
>  arch/arm64/kvm/Kconfig               |   1 -
>  arch/arm64/kvm/arm.c                 |   6 --
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c   |  11 +++
>  arch/arm64/kvm/hyp/nvhe/tlb.c        |  30 ++++++++
>  arch/arm64/kvm/hyp/pgtable.c         |  90 +++++++++++++++++++---
>  arch/arm64/kvm/hyp/vhe/tlb.c         |  23 ++++++
>  arch/arm64/kvm/mmu.c                 |  15 +++-
>  arch/mips/include/asm/kvm_host.h     |   4 +-
>  arch/mips/kvm/mips.c                 |  12 +--
>  arch/riscv/kvm/mmu.c                 |   6 --
>  arch/x86/include/asm/kvm_host.h      |   7 +-
>  arch/x86/kvm/mmu/mmu.c               |  25 ++----
>  arch/x86/kvm/mmu/mmu_internal.h      |   3 -
>  arch/x86/kvm/x86.c                   |   2 +-
>  include/linux/kvm_host.h             |  20 +++--
>  virt/kvm/Kconfig                     |   3 -
>  virt/kvm/kvm_main.c                  |  35 +++++++--
>  21 files changed, 290 insertions(+), 131 deletions(-)
>
> --
> 2.41.0.455.g037347b96a-goog
>