Re: [PATCH v3 17/17] KVM: x86/tdp_mmu: Take root types for kvm_tdp_mmu_invalidate_all_roots()

Yan Zhao <yan.y.zhao@xxxxxxxxx> · Fri, 21 Jun 2024 15:10:42 +0800

On Wed, Jun 19, 2024 at 03:36:14PM -0700, Rick Edgecombe wrote:
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 630e6b6d4bf2..a1ab67a4f41f 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -37,7 +37,7 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
>  	 * for zapping and thus puts the TDP MMU's reference to each root, i.e.
>  	 * ultimately frees all roots.
>  	 */
> -	kvm_tdp_mmu_invalidate_all_roots(kvm);
> +	kvm_tdp_mmu_invalidate_roots(kvm, KVM_VALID_ROOTS);
all roots (mirror + direct) are invalidated here.

>  	kvm_tdp_mmu_zap_invalidated_roots(kvm);
kvm_tdp_mmu_zap_invalidated_roots() will zap the invalidated mirror root with
mmu_lock held for read, which should trigger the KVM_BUG_ON() in
__tdp_mmu_set_spte_atomic(), which assumes "atomic zapping don't operate on
mirror roots".

So far, however, the KVM_BUG_ON() has not been triggered, because
kvm_mmu_notifier_release() is called earlier than kvm_destroy_vm() (see the
call traces below), and kvm_arch_flush_shadow_all() in
kvm_mmu_notifier_release() has already zapped all mirror SPTEs before
kvm_mmu_uninit_vm() is called in kvm_destroy_vm().

kvm_mmu_notifier_release
  kvm_flush_shadow_all
    kvm_arch_flush_shadow_all
      static_call_cond(kvm_x86_flush_shadow_all_private)(kvm);
      kvm_mmu_zap_all  ==>hold mmu_lock for write
        kvm_tdp_mmu_zap_all ==>zap KVM_ALL_ROOTS with mmu_lock held for write

kvm_destroy_vm
  kvm_arch_destroy_vm
    kvm_mmu_uninit_vm
      kvm_mmu_uninit_tdp_mmu
        kvm_tdp_mmu_invalidate_roots ==>invalidate all KVM_VALID_ROOTS
        kvm_tdp_mmu_zap_invalidated_roots ==> zap all roots with mmu_lock held for read

One question: why does kvm_mmu_notifier_release(), as a callback of the primary
MMU notifier, zap the mirrored TDP when all other callbacks use
KVM_FILTER_SHARED?

Could we just zap all KVM_DIRECT_ROOTS (valid | invalid) in
kvm_mmu_notifier_release() and move the mirrored-TDP-related handling from
kvm_arch_flush_shadow_all() to kvm_mmu_uninit_tdp_mmu(), ensuring mmu_lock is
held for write?
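
To make the ordering concern concrete, here is a small userspace toy model. It
is not kernel code: every toy_* name is hypothetical, and it assumes (as
described above) that the check only fires when a mirror root that still has
present SPTEs is zapped under the read lock.

```c
/* Userspace toy model, NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

enum toy_lock_mode { TOY_LOCK_READ, TOY_LOCK_WRITE };

struct toy_root {
	bool is_mirror;
	bool invalid;
	int nr_sptes;	/* present SPTEs still mapped under this root */
};

struct toy_vm {
	struct toy_root direct;
	struct toy_root mirror;
	int bugs;	/* counts would-be KVM_BUG_ON() hits */
};

/*
 * Zap one root. Assumption: a violation is recorded only when a mirror
 * root with present SPTEs is zapped while the lock is held for read.
 */
static void toy_zap_root(struct toy_vm *vm, struct toy_root *root,
			 enum toy_lock_mode mode)
{
	assert(vm && root);
	if (root->is_mirror && mode == TOY_LOCK_READ && root->nr_sptes > 0)
		vm->bugs++;
	root->nr_sptes = 0;
}

/* Stand-in for the release path; zap_mirror selects current vs. proposed. */
static void toy_notifier_release(struct toy_vm *vm, bool zap_mirror)
{
	toy_zap_root(vm, &vm->direct, TOY_LOCK_WRITE);
	if (zap_mirror)
		toy_zap_root(vm, &vm->mirror, TOY_LOCK_WRITE);
}

/* Stand-in for the uninit path: invalidate all roots, then zap them. */
static void toy_uninit_tdp_mmu(struct toy_vm *vm,
			       enum toy_lock_mode mirror_mode)
{
	vm->direct.invalid = true;
	vm->mirror.invalid = true;
	if (vm->direct.invalid)
		toy_zap_root(vm, &vm->direct, TOY_LOCK_READ);
	if (vm->mirror.invalid)
		toy_zap_root(vm, &vm->mirror, mirror_mode);
}

/* Run release followed by uninit; return the number of violations. */
static int toy_run(bool release_zaps_mirror,
		   enum toy_lock_mode uninit_mirror_mode)
{
	struct toy_vm vm = {
		.direct = { .is_mirror = false, .nr_sptes = 4 },
		.mirror = { .is_mirror = true, .nr_sptes = 4 },
	};

	toy_notifier_release(&vm, release_zaps_mirror);
	toy_uninit_tdp_mmu(&vm, uninit_mirror_mode);
	return vm.bugs;
}
```

In this model, toy_run(true, TOY_LOCK_READ) records no violation (today's
ordering, where the release path has already emptied the mirror root),
toy_run(false, TOY_LOCK_READ) records one, and toy_run(false, TOY_LOCK_WRITE)
records none, which corresponds to the suggestion above.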

>  
>  	WARN_ON(atomic64_read(&kvm->arch.tdp_mmu_pages));