Today, KVM does not zap only the memslot's leaf SPTEs when a memslot is
moved or deleted. Instead, it invalidates all page tables and builds
fresh ones based on the new memslot layout (referred to as "zap all" for
short). This "zap all" behavior has low overhead for most use cases, and
was adopted primarily because of a bug that caused VM instability when an
Nvidia GeForce GPU was assigned to the VM (see link in patch 1).

However, the "zap all" behavior is not desirable in certain scenarios,
e.g.
- It's not viable for TDX:
  a) TDX requires that the root page of the private page table remain
     unaltered throughout the TD life cycle.
  b) TDX mandates that leaf entries in the private page table be zapped
     before non-leaf entries.
  c) TDX requires private pages to be re-accepted after they are dropped.
- It's not performant for scenarios involving frequent deletion and
  re-adding of numerous small memslots.

This series therefore introduces the KVM_X86_QUIRK_SLOT_ZAP_ALL quirk,
which lets users control the memslot zapping behavior when a memslot is
moved/deleted.

The quirk is turned on by default, in which case all SPTEs are
invalidated/zapped when a memslot is moved/deleted.

Users have the option to turn off the quirk. Doing so limits the zapping
to leaf SPTEs within the range of the memslot being moved/deleted.

This series has been tested with
- Normal VMs w/ and w/o device assignment, and kvm selftests
- TDX guests.
  Memslot deletion typically does not occur without device assignment for
  a TD. Therefore, it is tested with shared device assignment.

Note: For TDX integration, the quirk is currently disabled via TDX code
in QEMU rather than being automatically disabled based on VM type in KVM,
which is not safe. A malfunctioning QEMU that fails to disable the quirk
could result in the shared EPT being invalidated while the private EPT
remains unaffected, as kvm_mmu_zap_all_fast() only targets the shared
EPT.

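For readers less familiar with how quirks are gated, the default-on,
opt-out semantics described above can be modeled in plain userspace C.
This is only a sketch: struct model_vm, the MODEL_* names and the bit
position are hypothetical stand-ins for illustration, not the code in
patch 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit only; the real value is defined by the uapi header. */
#define MODEL_QUIRK_SLOT_ZAP_ALL	(1ULL << 7)

/* Hypothetical stand-in for the per-VM quirk state. */
struct model_vm {
	uint64_t disabled_quirks;	/* bits userspace has turned off */
};

enum model_zap {
	MODEL_ZAP_ALL,		/* invalidate all page tables */
	MODEL_ZAP_SLOT_LEAVES,	/* zap only leaf SPTEs in the slot's range */
};

/* A quirk stays in effect until userspace disables its bit. */
static inline bool model_has_quirk(const struct model_vm *vm, uint64_t quirk)
{
	return !(vm->disabled_quirks & quirk);
}

/* Pick the zap behavior for a memslot move/delete. */
static inline enum model_zap model_flush_policy(const struct model_vm *vm)
{
	return model_has_quirk(vm, MODEL_QUIRK_SLOT_ZAP_ALL) ?
	       MODEL_ZAP_ALL : MODEL_ZAP_SLOT_LEAVES;
}

/* Walk through the default case and the opt-out case. */
static inline void model_check(void)
{
	struct model_vm vm = { .disabled_quirks = 0 };

	assert(model_flush_policy(&vm) == MODEL_ZAP_ALL);
	vm.disabled_quirks |= MODEL_QUIRK_SLOT_ZAP_ALL;
	assert(model_flush_policy(&vm) == MODEL_ZAP_SLOT_LEAVES);
}
```

In this model the only input is the userspace-owned bitmask, so a VMM
that never sets the bit keeps the "zap all" behavior.
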
However, kvm->arch.disabled_quirks is currently entirely user-controlled,
and there is no mechanism for users to verify whether a quirk has been
disabled by the kernel.
We are therefore wondering which of the options below is better for TDX:

a) Add a condition for the TDX VM type in kvm_arch_flush_shadow_memslot(),
   in addition to the kvm_check_has_quirk() test. It is similar to
   "all new VM types have the quirk disabled". e.g.

   static inline bool kvm_memslot_flush_zap_all(struct kvm *kvm)
   {
           return kvm->arch.vm_type != KVM_X86_TDX_VM &&
                  kvm_check_has_quirk(kvm, KVM_X86_QUIRK_SLOT_ZAP_ALL);
   }

b) Initialize disabled_quirks based on VM type in the kernel, and extend
   the disabled_quirks querying/setting interface to enforce that the
   quirk is disabled for TDX.

Patch 1:   KVM changes.
Patch 2-5: Selftests updates. Verify memslot move/deletion functionality
           with the quirk enabled/disabled.

Yan Zhao (5):
  KVM: x86/mmu: Introduce a quirk to control memslot zap behavior
  KVM: selftests: Test slot move/delete with slot zap quirk
    enabled/disabled
  KVM: selftests: Allow slot modification stress test with quirk
    disabled
  KVM: selftests: Test memslot move in memslot_perf_test with quirk
    disabled
  KVM: selftests: Test private access to deleted memslot with quirk
    disabled

 Documentation/virt/kvm/api.rst                |  6 ++++
 arch/x86/include/asm/kvm_host.h               |  3 +-
 arch/x86/include/uapi/asm/kvm.h               |  1 +
 arch/x86/kvm/mmu/mmu.c                        | 36 ++++++++++++++++++-
 .../kvm/memslot_modification_stress_test.c    | 19 ++++++++--
 .../testing/selftests/kvm/memslot_perf_test.c | 12 ++++++-
 .../selftests/kvm/set_memory_region_test.c    | 29 ++++++++++-----
 .../kvm/x86_64/private_mem_kvm_exits_test.c   | 11 ++++--
 8 files changed, 102 insertions(+), 15 deletions(-)

base-commit: dd5a440a31fae6e459c0d6271dddd62825505361
-- 
2.43.2